Spark Streaming: How does communication between Spark and Kafka happen?

痴心易碎 submitted on 2021-02-11 07:46:09

Question


I would like to understand how communication between Kafka and Spark (Streaming) nodes takes place. I have the following questions.

  1. If the Kafka servers and Spark nodes are in two separate clusters, how does communication take place? What steps are needed to configure them?
  2. If both are in the same cluster but on different nodes, how does communication happen?

By communication I mean whether it is RPC or socket communication. I would like to understand the internal anatomy.

Any help appreciated.

Thanks in Advance.


Answer 1:


First of all, it doesn't matter whether the Kafka nodes and Spark nodes are in the same cluster or not, but they must be able to connect to each other (the relevant ports must be open in the firewall).
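The connectivity requirement above can be checked with a plain TCP connection attempt. The following is only a minimal sketch, not part of Spark or Kafka: the class name `BrokerReachability`, the `canConnect` helper, and the `localhost:9092` defaults are assumptions made for illustration (9092 is Kafka's default port, but your brokers may listen elsewhere).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Spark or Kafka.
class BrokerReachability {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the real broker host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Running this from a Spark worker node against each broker address gives a quick first check. A successful connect only shows that the port is open; the Kafka client must also be able to reach whatever addresses the brokers advertise.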

There are two ways to read from Kafka with Spark Streaming: the older KafkaUtils.createStream() API and the newer KafkaUtils.createDirectStream() method.

I don't want to get into the differences between them; they are well documented here (in short, the direct stream is better).

To address your question of how the communication happens (the internal anatomy): the best way to find out is to look at the Spark source code.

The createStream() API uses a set of Kafka consumers, taken directly from the official org.apache.kafka packages. These Kafka consumers have their own network client, called NetworkClient, which you can check here. In short, NetworkClient communicates over sockets.

The createDirectStream() API uses the Kafka SimpleConsumer from the same org.apache.kafka package. The SimpleConsumer class reads from Kafka through a java.nio.channels.ReadableByteChannel, an interface that java.nio.channels.SocketChannel implements, so in the end it is done with sockets as well, just a bit more indirectly through Java's NIO (New I/O) channel APIs.
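To make the NIO part concrete, here is a minimal sketch of reading bytes from a socket through the ReadableByteChannel interface. It is not Kafka's or Spark's actual code: the class `ChannelReadSketch`, its `readAll` and `roundTrip` methods, and the loopback "fake broker" thread are all assumptions made for illustration. It uses only the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not taken from the Kafka or Spark code base.
class ChannelReadSketch {

    // Drains a channel until end-of-stream and decodes the bytes as UTF-8.
    static String readAll(ReadableByteChannel channel) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (channel.read(buffer) != -1) {
            buffer.flip();
            out.write(buffer.array(), 0, buffer.limit());
            buffer.clear();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Sends a message over a loopback TCP connection and reads it back
    // on the client side through the ReadableByteChannel interface.
    static String roundTrip(String message) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free port
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            // Stand-in for a broker: accept one connection, write, close.
            Thread sender = new Thread(() -> {
                try (SocketChannel accepted = server.accept()) {
                    ByteBuffer payload = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                    while (payload.hasRemaining()) {
                        accepted.write(payload);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            try (SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                // A SocketChannel can be used wherever a ReadableByteChannel is expected.
                ReadableByteChannel readable = client;
                String received = readAll(readable);
                sender.join();
                return received;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from a fake broker"));
    }
}
```

The reading code only sees a ReadableByteChannel, yet every byte still travels over an ordinary TCP socket, which is the sense in which the consumer's reads are socket-based.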

So to answer your question: it is done with sockets.



Source: https://stackoverflow.com/questions/36027963/spark-streaming-how-spark-and-kafka-communication-happens
