Streaming from particular partition within a topic (Kafka Streams)

Posted by 徘徊边缘 on 2019-12-10 21:54:47

Question


As far as I understand from the Kafka Streams documentation, it is not possible to stream data from only one partition of a given topic; one always has to read the whole topic.

Is that correct?

If so, are there any plans to add such an option to the API in the future?


Answer 1:


No, you can't do that. The internal consumer subscribes to the topic and joins a consumer group identified by the application ID (application.id), so partitions are assigned automatically. By the way, why do you want to do that? Without rebalancing you lose the scalability that Kafka Streams provides: because partitions are rebalanced across instances, you can scale the whole application simply by adding or removing instances.
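To make the rebalancing point concrete, here is a minimal sketch that simulates how a group might spread a topic's partitions over its members. It is a plain round-robin model and not Kafka's actual assignor; the names AssignmentSketch and assign, and the instance names, are made up for illustration.

```scala
// Illustrative model only: Kafka's real partition assignors are more involved.
object AssignmentSketch {

  /** Spread partitions 0 until numPartitions over the instances round-robin. */
  def assign(numPartitions: Int, instances: Seq[String]): Map[String, Seq[Int]] = {
    require(instances.nonEmpty, "at least one instance is needed")
    // Start with every instance owning no partitions.
    val empty = instances.map(_ -> Seq.empty[Int]).toMap
    (0 until numPartitions).foldLeft(empty) { (acc, p) =>
      val owner = instances(p % instances.size)
      acc.updated(owner, acc(owner) :+ p)
    }
  }

  def main(args: Array[String]): Unit = {
    // A single instance owns every partition of a 6-partition topic.
    println(assign(6, Seq("app-1")))
    // Adding a second instance leads to a new assignment that splits the work.
    println(assign(6, Seq("app-1", "app-2")))
  }
}
```

In this model the group, not the individual instance, decides who owns which partition, which is why an instance cannot pin itself to a single partition.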




Answer 2:


You can do something close to what you need using PartitionGrouper. A partition grouper decides how stream tasks are created from the given topic partitions.

For an example, refer to the DefaultPartitionGrouper implementation. Note that this approach would require customization.

Therefore, as @ppatierno suggested, please look at your use case first and then design the topology so that you do not have to deviate from standard practice.




Answer 3:


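You can do this by specifying the topic, partition number, and starting offset. Note that this example uses Spark Streaming's Kafka integration (spark-streaming-kafka-0-10), not Kafka Streams:

    import org.apache.kafka.common.TopicPartition
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // Starting offset for each partition of interest
    val offsets = Map(new TopicPartition(topic, partition) -> 2L)
    val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          PreferConsistent,
          Subscribe[String, String](topics, kafkaParams, offsets))

where partition is the partition number and 2L is the starting offset for that partition. Here ssc, topic, partition, topics, and kafkaParams are assumed to be defined elsewhere.

Subscribe still consumes every partition of the listed topics and uses the map only for starting positions. If only the listed partitions should be read, ConsumerStrategies.Assign accepts an explicit collection of TopicPartition values instead.

Refer to streaming_from_specific_partiton for more details.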




Answer 4:


You cannot pick a partition when consuming through a consumer group, because automatic partition assignment is how Kafka scales; this is simply how a distributed system of this kind works. As an alternative, you can segment your data, allocate each segment to its own topic, and give each topic only one partition.

Since topics are registered in ZooKeeper, you might run into issues if you try to add too many of them, e.g. if you have a million users and decide to create a topic per user.
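One way to keep the number of topics bounded is to hash users into a fixed number of segments instead of creating a topic per user. The sketch below assumes a hypothetical naming scheme (events-segment-N) and a segment count of 16; neither value comes from Kafka, and both would need to be chosen for the actual workload.

```scala
object SegmentTopics {
  // Assumed values, for illustration only.
  val SegmentCount = 16
  val TopicPrefix = "events-segment-"

  /** Map a user id to one of SegmentCount single-partition topics. */
  def topicFor(userId: String): String = {
    // floorMod keeps the index non-negative even when hashCode is negative.
    val segment = Math.floorMod(userId.hashCode, SegmentCount)
    s"$TopicPrefix$segment"
  }

  def main(args: Array[String]): Unit = {
    // The same user always lands in the same segment topic.
    println(topicFor("alice"))
  }
}
```

With this scheme the number of topics stays at SegmentCount no matter how many users exist, at the cost of several users sharing one topic.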



Source: https://stackoverflow.com/questions/44657521/streaming-from-particular-partition-within-a-topic-kafka-streams
