Kafka replays messages over and over - Heartbeat session expired - marking coordinator dead

Question

I'm using the Python Kafka API to read messages from a topic that holds only a handful of messages. The consumer keeps replaying those messages over and over again.

The consumer receives the messages from my topic (the content of each message comes back), then logs ERROR - Heartbeat session expired - marking coordinator dead, carries on looping through the rest of the messages, and keeps replaying them. More logs:

kafka.coordinator - ERROR - Heartbeat session expired - marking coordinator dead
kafka.coordinator - WARNING - Marking the coordinator dead (node 1) for group GROUPID1: Heartbeat session expired.
kafka.coordinator.consumer - WARNING - Auto offset commit failed for group GROUPID1: CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
kafka.cluster - INFO - Group coordinator for GROUPID1 is BrokerMetadata(nodeId=1, host='HOST', port=PORT, rack=None)
kafka.coordinator - INFO - Discovered coordinator 1 for group GROUPID1
kafka.coordinator - INFO - Skipping heartbeat: no auto-assignment or waiting on rebalance
kafka.coordinator.consumer - ERROR - Offset commit failed: This is likely to cause duplicate message delivery
Traceback (most recent call last):
  File "/path/python3.5/site-packages/kafka/coordinator/consumer.py", line 407, in _maybe_auto_commit_offsets_sync
    self.commit_offsets_sync(self._subscription.all_consumed_offsets())
  File "/path/python3.5/site-packages/kafka/coordinator/consumer.py", line 398, in commit_offsets_sync
    raise future.exception # pylint: disable-msg=raising-bad-type
kafka.errors.CommitFailedError: CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
kafka.coordinator.consumer - INFO - Revoking previously assigned partitions {TopicPartition(topic='TOPIC1', partition=0)} for group GROUPID1

Answer 1:

It looks like you need to tune your consumer configuration. Judging by the logs, the consumer's heartbeat session is expiring, so it cannot commit the offsets of the records it last polled. The expired session triggers a rebalance, and after the rebalance the consumer polls again from the last committed offset, which is why the same messages keep coming back.
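To make the replay mechanism concrete, here is a toy model in plain Python. It does not talk to Kafka at all; the function name and the numbers are made up for illustration, and it assumes the older client behaviour in which the session is only kept alive between calls to poll(). If processing one batch takes longer than the session timeout, the commit is rejected, the committed offset never moves, and the next poll returns the same records.

```python
def simulate_consumer(messages, processing_ms, session_timeout_ms, max_rounds=3):
    """Toy model of a poll loop (hypothetical helper, not a Kafka API).

    Returns the list of delivered messages and the final committed offset.
    """
    committed = 0      # offset stored for the group
    delivered = []     # everything the application saw, including replays

    for _ in range(max_rounds):
        if committed >= len(messages):
            break  # nothing left to read

        # poll(): fetch everything after the last committed offset
        batch = messages[committed:]
        delivered.extend(batch)

        # Time spent processing the batch before the next poll()
        elapsed_ms = processing_ms * len(batch)

        if elapsed_ms <= session_timeout_ms:
            # Session still alive: the offset commit succeeds
            committed += len(batch)
        # Otherwise the session expired, the group rebalanced, and the
        # commit is rejected, so `committed` stays where it was.

    return delivered, committed


if __name__ == "__main__":
    slow = simulate_consumer(["a", "b", "c"], processing_ms=5000, session_timeout_ms=10000)
    fast = simulate_consumer(["a", "b", "c"], processing_ms=1000, session_timeout_ms=10000)
    print("slow processing:", slow)
    print("fast processing:", fast)
```

In the slow case every round redelivers the whole batch and the committed offset stays at 0, which matches the "This is likely to cause duplicate message delivery" line in the log.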

Configuration settings to check:

heartbeat.interval.ms
session.timeout.ms
max.poll.interval.ms
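As a rough sketch of how these settings could be passed to kafka-python: the library takes them as keyword arguments with underscores instead of dots. The values below are placeholders rather than recommendations, the helper names are hypothetical, and max.poll.records is added because the error message in the question mentions it. As far as I know, max_poll_interval_ms is only accepted by newer kafka-python releases (1.4.0 onward), so drop it if your version rejects it.

```python
# Placeholder values; tune them for your own workload.
JAVA_STYLE_SETTINGS = {
    "session.timeout.ms": 30000,
    "heartbeat.interval.ms": 10000,
    "max.poll.interval.ms": 300000,
    "max.poll.records": 50,
}


def to_kafka_python_kwargs(settings):
    """Convert dotted Kafka property names to kafka-python keyword names."""
    return {name.replace(".", "_"): value for name, value in settings.items()}


def check_timeouts(kwargs):
    """Return warnings for the usual rule of thumb between the two timeouts."""
    problems = []
    # Kafka's documentation suggests keeping the heartbeat interval
    # no higher than one third of the session timeout.
    if kwargs["heartbeat_interval_ms"] * 3 > kwargs["session_timeout_ms"]:
        problems.append(
            "heartbeat_interval_ms should be at most 1/3 of session_timeout_ms"
        )
    return problems


def make_consumer(topic, group_id, bootstrap_servers, settings=JAVA_STYLE_SETTINGS):
    """Build a KafkaConsumer with the tuned settings (needs kafka-python)."""
    from kafka import KafkaConsumer  # imported lazily so the helpers above work without it

    kwargs = to_kafka_python_kwargs(settings)
    for problem in check_timeouts(kwargs):
        print("warning:", problem)
    return KafkaConsumer(
        topic,
        group_id=group_id,
        bootstrap_servers=bootstrap_servers,
        **kwargs
    )
```

Usage would look like make_consumer('TOPIC1', 'GROUPID1', 'HOST:PORT'), with the topic and group names taken from the logs in the question. Lowering max_poll_records or raising session_timeout_ms both give the loop more headroom to finish a batch before the session expires.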

Source: https://stackoverflow.com/questions/46497352/kafka-replays-messages-over-and-over-heartbeat-session-expired-marking-coord

Tags

python

apache-kafka

kafka-consumer-api