I am using Kafka 0.8.1 and kafka-python 0.9.0. In my setup, I have two Kafka brokers. When I run my Kafka consumer, I can see it retrieving messages from the queue and k
A Kafka consumer can store its offsets in ZooKeeper. The Java API gives us two options: the high-level consumer, which manages state for us and resumes where it left off after a restart, and the stateless low-level consumer, which does not.
From what I understand of the Python consumer code (https://github.com/mumrah/kafka-python/blob/master/kafka/consumer.py), both SimpleConsumer and MultiProcessConsumer are stateful and keep track of current offsets in ZooKeeper, so it is strange that you are seeing messages consumed again.
Make sure you use the same consumer group ID across restarts (maybe you generate it randomly?) and check the following options:
auto_commit: default True. Whether or not to auto-commit the offsets.
auto_commit_every_n: default 100. How many messages to consume before a commit.
auto_commit_every_t: default 5000. How much time (in milliseconds) to wait before a commit.
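Maybe you consume fewer than 100 messages, or run for less than 5000 ms, before the consumer stops? In that case neither threshold is reached and the offsets are never committed.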
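To illustrate that last point, here is a toy model of count/time-based auto-commit. It is not kafka-python code: the class AutoCommitModel and the function run are hypothetical names made up for this sketch, and it checks the time threshold on each consumed message, whereas a real consumer may use a background timer. It only assumes the default thresholds quoted above (100 messages, 5000 ms).

```python
class AutoCommitModel:
    """Toy model of count/time-based auto-commit (hypothetical, not kafka-python)."""

    def __init__(self, every_n=100, every_t_ms=5000):
        self.every_n = every_n
        self.every_t_ms = every_t_ms
        self.consumed = 0        # messages read so far
        self.committed = 0       # offset that would survive a restart
        self.since_commit = 0    # messages read since the last commit
        self.last_commit_ms = 0  # simulated clock value at the last commit

    def consume(self, now_ms):
        """Read one message and commit if either threshold is reached."""
        self.consumed += 1
        self.since_commit += 1
        if (self.since_commit >= self.every_n
                or now_ms - self.last_commit_ms >= self.every_t_ms):
            self.commit(now_ms)

    def commit(self, now_ms):
        """Persist the current position and reset both counters."""
        self.committed = self.consumed
        self.since_commit = 0
        self.last_commit_ms = now_ms


def run(messages, ms_per_message, commit_on_exit=False):
    """Simulate one consumer session; return (consumed, committed)."""
    model = AutoCommitModel()
    now = 0
    for _ in range(messages):
        now += ms_per_message
        model.consume(now)
    if commit_on_exit:
        model.commit(now)  # explicit commit before shutting down
    return model.consumed, model.committed


if __name__ == "__main__":
    print(run(50, 10))                       # (50, 0): nothing committed
    print(run(50, 10, commit_on_exit=True))  # (50, 50)
    print(run(250, 10))                      # (250, 200): last 50 are re-read
```

In the first case the session ends after 50 messages and 500 ms, so no commit happens and a restart would begin from offset 0 again. If I read consumer.py correctly, the consumer exposes a commit() method, so calling it before exiting (or lowering auto_commit_every_n) should avoid that, but verify this against the version you have installed.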