Real-Time Computing Framework Spark: Lightning-fast cluster computing

大兔子大兔子, submitted on 2019-12-15 21:38:25


Real-time processing framework

Real-time processing means processing, transforming, and analyzing data on the fly, as it arrives.

  1. Spark: Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.
  2. Spark Streaming: a real-time stream data processor (comparable to Apache Storm, a "distributed realtime computation system"?). Related: an article comparing Spark Streaming and Apache Storm.

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.
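To make the map, reduce, and window functions mentioned above more concrete, here is a minimal sketch in plain Python. It does not use the Spark API; the function names `word_counts` and `windowed_counts` and the sample data are assumptions made for illustration only. It counts words per batch and then merges the counts of the most recent batches into a sliding window.

```python
from collections import Counter
from functools import reduce

def word_counts(batch):
    """Count the words in one batch of text lines."""
    # "map" step: split every line into words
    words = [word for line in batch for word in line.split()]
    # "reduce by key" step: sum the occurrences of each word
    return Counter(words)

def windowed_counts(batches, window_length):
    """For each batch, merge the counts of the last `window_length` batches."""
    per_batch = [word_counts(batch) for batch in batches]
    results = []
    for i in range(len(per_batch)):
        # The window covers the current batch and up to window_length - 1 earlier ones
        window = per_batch[max(0, i - window_length + 1): i + 1]
        results.append(reduce(lambda a, b: a + b, window, Counter()))
    return results

# Hypothetical input: three small batches of text lines
batches = [["a b", "a"], ["b c"], ["c c a"]]
results = windowed_counts(batches, window_length=2)
for index, counts in enumerate(results):
    print(index, dict(counts))
```

In Spark Streaming itself the same idea would be expressed with operations on a stream rather than on Python lists; this sketch only shows the shape of the computation.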


Internally, it works as follows. Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.
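The batching idea described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not Spark's actual implementation: the helper names `to_micro_batches` and `process_stream`, the `(timestamp, value)` event format, and the one-second interval are all hypothetical.

```python
from collections import defaultdict

def to_micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp, value in events:
        # Events whose timestamps fall in the same interval share a batch index
        batches[int(timestamp // batch_interval)].append(value)
    # Return batches in time order; intervals without events are skipped here
    return [batches[index] for index in sorted(batches)]

def process_stream(events, batch_interval, batch_fn):
    """Apply the same batch function to every micro-batch, one result per batch."""
    return [batch_fn(batch) for batch in to_micro_batches(events, batch_interval)]

# Hypothetical events: (arrival time in seconds, numeric payload)
events = [(0.2, 3), (0.9, 4), (1.1, 10), (2.5, 1), (2.7, 2)]
batch_sums = process_stream(events, batch_interval=1.0, batch_fn=sum)
print(batch_sums)
```

The output is itself a sequence of per-batch results, which mirrors the statement that the final stream of results is also produced in batches.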


 

Differences between Spark Streaming and Apache Storm:

One key difference between these two frameworks is that Spark performs Data-Parallel computations while Storm performs Task-Parallel computations. More similarities and differences are given in the table below.

[Table 1: Storm vs. Spark (image)]
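The distinction between data-parallel and task-parallel computation can be illustrated with a toy sketch in plain Python. It is a generic illustration of the two styles, not a model of how Spark or Storm are implemented; the stage functions `parse` and `square`, the partition count, and the sample records are assumptions. In the data-parallel version every worker runs the whole pipeline on its own slice of the data; in the task-parallel version each worker runs one stage and records flow between stages through queues.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def parse(record):
    return int(record)

def square(number):
    return number * number

def data_parallel(records, partitions=2):
    """Every worker runs the same pipeline on a different partition of the data."""
    chunks = [records[i::partitions] for i in range(partitions)]

    def run_pipeline(chunk):
        return [square(parse(record)) for record in chunk]

    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partial_results = list(pool.map(run_pipeline, chunks))
    return sorted(value for chunk in partial_results for value in chunk)

def task_parallel(records):
    """Each worker runs one stage; records flow from stage to stage via queues."""
    parsed_q, squared_q = queue.Queue(), queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def parse_task():
        for record in records:
            parsed_q.put(parse(record))
        parsed_q.put(done)

    def square_task():
        while True:
            item = parsed_q.get()
            if item is done:
                squared_q.put(done)
                break
            squared_q.put(square(item))

    workers = [threading.Thread(target=parse_task),
               threading.Thread(target=square_task)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    output = []
    while True:
        item = squared_q.get()
        if item is done:
            break
        output.append(item)
    return sorted(output)

records = ["1", "2", "3", "4"]
print(data_parallel(records))
print(task_parallel(records))
```

Both functions compute the same result; what differs is how the work is divided among workers, by data partition in one case and by processing stage in the other.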

 

References

Stream Processing and Lambda Architecture Challenges

S4, Storm – When, What and How to choose.
