Designing a component both producer and consumer in Kafka

Submitted by 有些话、适合烂在心里 on 2019-12-07 18:19:45

Question


I am using Kafka and Zookeeper as the main components of my data pipeline, which processes thousands of requests each second. I am using Samza as the real-time data processing tool for the small transformations I need to make on the data.

My problem is that one of my consumers (let's say ConsumerA) consumes several topics from Kafka and processes them. Essentially, it creates a summary of the topics it digests. I then want to push this data back to Kafka as a separate topic, but that forms a loop between Kafka and my component.
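For concreteness, a minimal sketch of this consume-transform-produce pattern with the plain Kafka Java clients might look like the following. The topic names (topic-a, topic-b, summary-topic) and the summarize helper are hypothetical placeholders, not anything from the question:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

public class ConsumerA {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "consumer-a");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Input topics and the output topic are disjoint, so no feedback loop forms.
            consumer.subscribe(Arrays.asList("topic-a", "topic-b"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String summary = summarize(record.value());
                    producer.send(new ProducerRecord<>("summary-topic", record.key(), summary));
                }
            }
        }
    }

    // Placeholder for the heavy per-record processing described in the question.
    private static String summarize(String value) {
        return "summary:" + value;
    }
}
```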

This is what bothers me: is this a desirable architecture in Kafka?

Should I instead do all the processing in Samza and store only the digested (summary) information in Kafka from Samza? The amount of processing I am going to do is quite heavy, though, which is why I want to use a separate component for it (ComponentA). I guess my question can be generalized to all kinds of data pipelines.

So is it a good practice for a component to be a consumer and a producer in a data pipeline?


Answer 1:


As long as Samza is writing to different topics than it is consuming from, no, there will be no problem. Samza jobs that read from and write to Kafka are the norm and intended by the architecture. One can also have Samza jobs that bring some data in from another system, or jobs that write some data from Kafka out to a different system (or even jobs that don't use Kafka at all).
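In Samza's low-level API, that amounts to a StreamTask whose output SystemStream differs from the job's configured input streams. Here is a minimal sketch; the topic name summary-topic and the string transformation are assumptions for illustration:

```java
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class SummaryTask implements StreamTask {
    // The output stream differs from every input stream listed in the
    // job's task.inputs config, so no loop is formed.
    private static final SystemStream OUTPUT = new SystemStream("kafka", "summary-topic");

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Hypothetical transformation of the incoming message.
        String summary = "summary:" + envelope.getMessage();
        collector.send(new OutgoingMessageEnvelope(OUTPUT, summary));
    }
}
```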

Having a job read from and write to the same topic is, however, where you'd get a loop, and it should be avoided. This has the potential to fill up your Kafka brokers' disks very fast.
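One cheap safeguard is to verify at startup that the output topics are disjoint from the input topics. This is not a Kafka or Samza API, just a hypothetical guard you could add to your own component:

```java
import java.util.List;
import java.util.Set;

public final class TopicGuard {
    // Fails fast at startup if any input topic is also an output topic,
    // which would otherwise create a feedback loop on the brokers.
    public static void requireDisjoint(List<String> inputs, Set<String> outputs) {
        for (String topic : inputs) {
            if (outputs.contains(topic)) {
                throw new IllegalStateException(
                    "Topic '" + topic + "' is both consumed and produced; this creates a loop");
            }
        }
    }
}
```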



Source: https://stackoverflow.com/questions/29823592/designing-a-component-both-producer-and-consumer-in-kafka
