I want to ensure that, if my eventhub client crashes (currently a console application), it only picks up events it has not yet taken from the eventhub. One way to achieve th
Lemme put forward a few basic terminology before answering:
EventHubs is high-thruput durable event ingestion pipeline. Simply put - its a reliable stream of events on Cloud.
Offset on EventData (one Event in the stream) is literally a Cursor on the Stream. Having this Cursor - will enable operations like - restart reading from this cursor (aka Offset) - inclusive or exclusive.
EventProcessor library is a framework that EventHubs team built, on-Top-of ServiceBus SDK to make "eventhub receiver gu" - look easier. ZooKeeper for Kafka <-> EPH for Event Hub. It will make sure when the process running EventProcessor on a specific partition dies/crashes - it will be resumed from last Checkpointed offset - in other available EventProcessorHost instance.
CheckPoint : as of today - EventHubs only supports client-side check-pointing. When you call Checkpoint from your Client-code:
await context.CheckpointAsync();
- it will translate to a Storage call (directly from Client) - which will store the current offset in the storage account you provided. EventHubs Service will not talk to Storage for Check-pointing.
THE ANSWER
EventProcessor framework is meant to achieve exactly what you are looking for.
Checkpoints are not persisted via Server (aka EVENTHUBS Service). Its purely client-side. You are talking to Azure storage. That's the reason EventProcessor library brings in a new additional dependency - AzureStorageClient. You can connect to the storage account & the container to which the checkpoints are written to - we maintain the ownership information - EPH instances (name) to Partitions of EventHubs they own and at what checkpoint they currently read/processed until.
As per the timer based checkpoint'ing pattern - you originally had - if the Process goes down - you will re-do the events in last 5 minute window. This is a healthy pattern as:
if you want absolutely no-events to repeat - you will need to build this de-duplication logic in the down-stream pipeline.
here's more general reading on Event Hubs...