I looked around hard but didn't find a satisfactory answer to this. Maybe I'm missing something. Please help.
We have a Spark streaming application consuming a Kaf
The article below is a good starting point for understanding the approach.
spark-kafka-achieving-zero-data-loss
Furthermore, the article suggests using the ZooKeeper client directly, which can also be replaced by something like KafkaSimpleConsumer. The advantage of using ZooKeeper/KafkaSimpleConsumer is that monitoring tools that depend on offsets saved in ZooKeeper keep working. The offsets can also be saved on HDFS or any other reliable service.
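To make the "save the offsets somewhere reliable" idea concrete, here is a minimal sketch in plain Python. It is not the article's code: the helper names `save_offsets` and `load_offsets` are hypothetical, and a local JSON file stands in for ZooKeeper, HDFS, or whatever store you choose. The point is only the pattern: write the offsets atomically after a batch is processed, and read them back on restart.

```python
import json
import os
import tempfile


def save_offsets(path, offsets):
    """Atomically persist a {(topic, partition): offset} mapping to `path`.

    Hypothetical helper: a local file stands in for ZooKeeper/HDFS here.
    """
    # JSON object keys must be strings, so encode each key as "topic:partition".
    payload = {
        f"{topic}:{partition}": offset
        for (topic, partition), offset in offsets.items()
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target,
    # so a crash mid-write never leaves a half-written offsets file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, path)


def load_offsets(path):
    """Return the saved offsets, or an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        payload = json.load(handle)
    offsets = {}
    for key, offset in payload.items():
        # Split on the last ":" so topic names containing ":" still round-trip.
        topic, partition = key.rsplit(":", 1)
        offsets[(topic, int(partition))] = offset
    return offsets
```

In a streaming job you would presumably call `load_offsets` once at startup to decide where each partition should resume, and call `save_offsets` only after a batch's output has been written successfully, so that a restart replays the last batch rather than skipping it.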