bigdata - Apache Spark vs. Apache Storm -

- April 15, 2010

What are the differences between Apache Spark and Apache Storm? What are suitable use cases for each one?

Apache Spark is an in-memory distributed data analysis platform, primarily targeted at speeding up batch analysis jobs, iterative machine learning jobs, interactive queries, and graph processing.

One of Spark's primary distinctions is its use of RDDs, or Resilient Distributed Datasets. RDDs are great for pipelining parallel operators for computation and are, by definition, immutable, which allows Spark a unique form of fault tolerance based on lineage information. If you are interested in, for example, executing a Hadoop MapReduce job much faster, Spark is a great option (although memory requirements must be considered).
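To make the lineage idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the `ToyRDD` class and its `lose_cache` method are hypothetical names invented for illustration. Each dataset is immutable and records the parent and the function it was derived from, so a lost result can be rebuilt by replaying those steps.

```python
# Toy illustration of lineage-based recovery; this is NOT Spark's API.
class ToyRDD:
    """An immutable dataset that remembers how it was derived."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = tuple(source) if source is not None else None
        self._parent = parent   # upstream ToyRDD, if any
        self._fn = fn           # transformation applied to the parent's data
        self._cache = None      # materialized result; may be lost

    def map(self, f):
        # Return a new dataset instead of mutating this one.
        return ToyRDD(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                # Recompute from lineage: parent data plus the recorded function.
                self._cache = self._fn(self._parent.collect())
        return list(self._cache)

    def lose_cache(self):
        # Simulate losing the in-memory copy (for example, a failed worker).
        self._cache = None


base = ToyRDD(source=range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = evens_squared.collect()
evens_squared.lose_cache()            # the data is gone, the lineage is not
recovered = evens_squared.collect()   # rebuilt by replaying the recorded map
```

Because nothing is ever modified in place, replaying the recorded steps always yields the same result, which is why immutability and lineage go together.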

Apache Storm is focused on stream processing, or what some call complex event processing. Storm implements a fault-tolerant method for performing a computation, or pipelining multiple computations, on an event as it flows into a system. One might use Storm to transform unstructured data into a desired format as it flows into a system.

Storm and Spark are focused on fairly different use cases. The more "apples-to-apples" comparison would be between Storm Trident and Spark Streaming. Since Spark's RDDs are inherently immutable, Spark Streaming implements a method for "batching" incoming updates in user-defined time intervals that get transformed into their own RDDs. Spark's parallel operators can then perform computations on these RDDs. This is different from Storm, which deals with each event individually.
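The contrast between per-event handling and interval batching can be sketched in plain Python. This is only an illustration under stated assumptions: `per_event`, `micro_batches`, and the sample event stream are hypothetical, and neither function uses the Storm or Spark Streaming APIs.

```python
def per_event(events, handle):
    """Handle each event individually, in arrival order."""
    return [handle(e) for e in events]


def micro_batches(events, interval):
    """Group (timestamp, value) events into fixed windows of `interval` seconds."""
    batches = {}
    for ts, value in events:
        window = int(ts // interval)
        batches.setdefault(window, []).append(value)
    return [batches[w] for w in sorted(batches)]


# Hypothetical event stream: (arrival time in seconds, payload).
events = [(0.1, 3), (0.4, 5), (1.2, 7), (1.9, 1), (2.5, 4)]

# Per-event model: one result per incoming event.
doubled = per_event(events, lambda e: e[1] * 2)

# Micro-batch model: one small batch per 1-second interval, then a
# batch-level computation (here, a sum) applied to each batch.
batch_sums = [sum(b) for b in micro_batches(events, interval=1.0)]
```

With a 1-second interval the five events collapse into three batches, so a result only becomes available once its interval closes, whereas the per-event version produces a result for every event.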

One key difference between these two technologies is that Spark performs data-parallel computations while Storm performs task-parallel computations. Either design makes tradeoffs that are worth knowing. I would suggest checking out these links.
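As a rough sketch of the two styles, the plain-Python example below uses only the standard library; it does not use Spark or Storm, and the names `data_parallel`, `task_parallel`, and `stage` are hypothetical. In the data-parallel version the same operation runs on every partition of the data at once. In the task-parallel version each worker performs a different task and events flow from one worker to the next.

```python
from concurrent.futures import ThreadPoolExecutor
import queue
import threading


def data_parallel(partitions):
    """Apply the SAME operation to every partition concurrently."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(lambda part: [x * x for x in part], partitions))


_DONE = object()  # sentinel marking the end of the event stream


def stage(fn, inbox, outbox):
    """A worker that runs one task on every event it receives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def task_parallel(events):
    """Run DIFFERENT tasks concurrently, each event passing through all of them."""
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(str.strip, q1, q2)),
        threading.Thread(target=stage, args=(str.upper, q2, q3)),
    ]
    for w in workers:
        w.start()
    for e in events:
        q1.put(e)
    q1.put(_DONE)

    results = []
    while True:
        item = q3.get()
        if item is _DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


squares = data_parallel([[1, 2], [3, 4], [5, 6]])
cleaned = task_parallel([" spark ", " storm "])
```

The first design scales by splitting the data; the second scales by splitting the work into stages, which lets the first stage start on a new event while the second is still handling the previous one.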

Edit: I discovered this today.
