Can I run a Kafka Streams application on the same machine as a Kafka broker?

℡╲_俬逩灬. submitted on 2019-12-06 13:22:23

It is technically possible to run your Kafka Streams application on the same servers as your brokers, but it is not recommended. Both would have to share the same resources (CPU, memory, disk, and network), and you would end up with resource contention.

Whenever I take any kafka broker down, it goes into rebalancing

Not sure why this is happening. What version of Kafka or the Streams API are you using? If your brokers are on 0.10.1 or newer, I would highly recommend upgrading your Streams application to 0.11 (note that you can do this without upgrading the brokers).

Depending on the details of the issue you are facing, standby tasks (StandbyTask) might help with long rebalance times. You can simply set the parameter num.standby.replicas = 1 to enable them.
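As a rough illustration of that setting, here is a minimal sketch in Java of the properties one might pass to a Streams application. The application id and broker address are placeholders (assumptions, not taken from the question). The keys are written as plain strings, which correspond to the constants `StreamsConfig.APPLICATION_ID_CONFIG`, `StreamsConfig.BOOTSTRAP_SERVERS_CONFIG`, and `StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG`, so the snippet compiles without the Kafka libraries on the classpath.

```java
import java.util.Properties;

class StandbyConfigSketch {

    // Builds a Streams configuration that asks for one standby replica
    // of each stateful task.
    static Properties buildConfig() {
        Properties props = new Properties();
        // Placeholder values (assumptions): replace with your own.
        props.setProperty("application.id", "my-streams-app");
        props.setProperty("bootstrap.servers", "broker1:9092");
        // Keep one warm copy of each task's state on another instance,
        // so less state has to be restored after a rebalance.
        props.setProperty("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println("num.standby.replicas = "
                + props.getProperty("num.standby.replicas"));
    }
}
```

In a real application these properties would be handed to the `KafkaStreams` constructor together with your topology. Standby replicas only help when more than one instance of the application is running, since the copy has to live on a different instance.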

Answering the question in the title:

Coming from a Spark/HDFS background, I think this requires a change of thinking, since you are used to the idea that it is good to run your processing where your data is, to take advantage of data locality. Here, the brokers hold the data but have to send it over the network to the Kafka Streams cluster for processing, so some of the locality benefit is lost. However, keeping them separate allows you to manage both clusters independently.

Think of a cluster that runs high-latency processing jobs and hosts both data and processing (e.g. an HDFS + YARN cluster): there you bring the processing to where the data is, not the other way around. You can allocate resources for your data processing, but the idea is that your processing needs do not depend on temporary data spikes (as they do with streaming) but on total data volume. If your data grows, your calculations take longer and you can allocate more resources, but storage and processing grow together. In a streaming application, however, the processing power you need depends on data spikes (and on your low-latency requirements), not on total data volume. It therefore makes sense to size and manage storage and processing separately, since their elasticity demands are not driven by the same dimension.

This is in addition to the obvious fact that running both data handling (the Kafka broker) and data processing (Kafka Streams) on the same node puts more load on that node, but we are assuming here that this has been taken into account when sizing your nodes.
