bigdata - Apache Spark vs. Apache Storm
What are the differences between Apache Spark and Apache Storm? What are suitable use cases for each one?
Apache Spark is an in-memory distributed data analysis platform, primarily targeted at speeding up batch analysis jobs, iterative machine learning jobs, interactive queries, and graph processing.
One of Spark's primary distinctions is its use of RDDs, or Resilient Distributed Datasets. RDDs are great for pipelining parallel operators for computation and are, by definition, immutable, which allows Spark a unique form of fault tolerance based on lineage information: a lost partition can be recomputed from the recorded chain of transformations instead of being restored from a replica. If you are interested in, for example, executing a Hadoop MapReduce job much faster, Spark is a great option (although memory requirements must be considered).
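To make the lineage idea concrete, here is a minimal pure-Python sketch. It is not Spark's API: `ToyRDD`, its fields, and its `map`/`compute` methods are hypothetical names invented for illustration, and the model ignores partitioning and distribution entirely. It only shows how an immutable dataset that records its parent and transformation can be rebuilt by replaying that record.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical immutable dataset that remembers how it was derived."""
    source: Optional[Tuple[int, ...]] = None   # set only for a root dataset
    parent: Optional["ToyRDD"] = None          # set only for a derived dataset
    transform: Optional[Callable[[int], int]] = None

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A transformation never mutates; it returns a new dataset
        # that points back at its parent (the lineage).
        return ToyRDD(parent=self, transform=fn)

    def compute(self) -> Tuple[int, ...]:
        # Results are rebuilt by replaying the lineage from the root.
        if self.parent is None:
            return self.source
        return tuple(self.transform(x) for x in self.parent.compute())


base = ToyRDD(source=(1, 2, 3))
doubled = base.map(lambda x: x * 2)
shifted = doubled.map(lambda x: x + 1)

first_result = shifted.compute()
# Pretend the computed values were lost with a failed node: replaying
# the lineage produces them again without any replicated copy.
recovered = shifted.compute()
```

Because nothing is ever modified in place, replaying the same lineage always gives the same answer, which is what makes this style of recovery possible.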
Apache Storm is focused on stream processing, or what some call complex event processing. Storm implements a fault-tolerant method for performing a computation, or pipelining multiple computations, on an event as it flows into a system. One might use Storm to transform unstructured data into a desired format as it flows into a system.
Storm and Spark are focused on fairly different use cases. The more "apples-to-apples" comparison would be between Storm Trident and Spark Streaming. Since Spark's RDDs are inherently immutable, Spark Streaming implements a method for "batching" incoming updates in user-defined time intervals that get transformed into their own RDDs. Spark's parallel operators can then perform computations on these RDDs. This is different from Storm, which deals with each event individually.
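The contrast between batching by time interval and handling each event individually can be sketched in plain Python. This is only an illustration under simplifying assumptions: `micro_batches`, `per_event`, and the `(timestamp, payload)` event shape are hypothetical and do not correspond to any Spark Streaming or Storm API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload); assumed shape


def micro_batches(events: Iterable[Event], interval: float) -> List[List[str]]:
    """Group events into fixed time windows, micro-batch style."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        # Every event with a timestamp in the same window lands in one batch.
        buckets[int(ts // interval)].append(payload)
    return [buckets[k] for k in sorted(buckets)]


def per_event(events: Iterable[Event]) -> List[str]:
    """Handle each event on its own, in arrival order."""
    return [payload.upper() for _, payload in events]


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]

# Batch style: the same operation is applied to a whole window at once.
batched = [[p.upper() for p in batch] for batch in micro_batches(events, 1.0)]

# Event style: each event is transformed as soon as it is seen.
streamed = per_event(events)
```

The final values are the same either way; what differs is granularity. In the batched version nothing is processed until a window closes, so the interval sets a floor on latency, while the per-event version can react to each event immediately.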
One key difference between these two technologies is that Spark performs data-parallel computations while Storm performs task-parallel computations. Either design makes tradeoffs that are worth knowing. I would suggest checking out these links.
edit: discovered this today
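As a rough illustration of the data-parallel versus task-parallel distinction mentioned above, the following sketch uses only the Python standard library. It is not how Spark or Storm are implemented; `square_partition`, `stage`, and `SENTINEL` are hypothetical names. In the first half, one function runs over several partitions of the data at once; in the second, different functions run as concurrent stages and events flow between them through queues.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread


# Data-parallel: the SAME operation runs on different slices of the data.
def square_partition(part):
    return [x * x for x in part]


partitions = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor() as pool:
    # pool.map returns results in the order of the input partitions.
    data_parallel = [y for part in pool.map(square_partition, partitions)
                     for y in part]


# Task-parallel: DIFFERENT operations run concurrently as pipeline stages.
SENTINEL = object()  # marks the end of the stream


def stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the end marker downstream
            return
        outbox.put(fn(item))


q_in, q_mid, q_out = Queue(), Queue(), Queue()
workers = [
    Thread(target=stage, args=(int, q_in, q_mid)),                 # parse
    Thread(target=stage, args=(lambda x: x * x, q_mid, q_out)),    # square
]
for w in workers:
    w.start()
for raw in ["1", "2", "3"]:
    q_in.put(raw)
q_in.put(SENTINEL)
for w in workers:
    w.join()

task_parallel = []
while True:
    item = q_out.get()
    if item is SENTINEL:
        break
    task_parallel.append(item)
```

In the first half, adding workers splits the data further; in the second, each stage can be scaled or replaced independently, and an event starts moving through the pipeline as soon as it arrives.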