Understanding checkpointing in Event Hub

面向向阳花 2020-12-13 15:30

I want to ensure that, if my Event Hub client (currently a console application) crashes, it only picks up events it has not yet taken from the Event Hub. One way to achieve this would be to checkpoint on a timer (e.g. every 5 minutes).

1 Answer
  • 2020-12-13 16:25

    Let me put forward a few basic terms before answering:

    EventHubs is a high-throughput, durable event ingestion pipeline. Simply put - it's a reliable stream of events in the Cloud.

    Offset on EventData (one event in the stream) is literally a cursor on the stream. Holding on to this cursor enables operations like restarting a read from that cursor (aka Offset) - inclusive or exclusive.
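    For illustration, here is a minimal sketch of resuming a plain receiver from a saved offset (my own example using the Microsoft.Azure.EventHubs package; the connection string, partition id and offset are placeholders, not from the question):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;

    class OffsetResumeSample
    {
        const string EventHubConnectionString = "<event-hub-connection-string>";
        const string SavedOffset = "<offset-you-persisted-earlier>";

        static async Task Main()
        {
            var client = EventHubClient.CreateFromConnectionString(EventHubConnectionString);

            // Resume reading partition "0" from a previously saved offset.
            // The second argument (false) makes the restart exclusive of that event.
            var receiver = client.CreateReceiver(
                PartitionReceiver.DefaultConsumerGroupName,
                "0",
                EventPosition.FromOffset(SavedOffset, false));

            var events = await receiver.ReceiveAsync(10);
            foreach (var eventData in events ?? Array.Empty<EventData>())
            {
                Console.WriteLine($"Offset={eventData.SystemProperties.Offset}, " +
                                  $"Seq={eventData.SystemProperties.SequenceNumber}");
            }

            await receiver.CloseAsync();
            await client.CloseAsync();
        }
    }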

    EventProcessor library is a framework the EventHubs team built on top of the ServiceBus SDK to make the "eventhub receiver" side easier to get right. ZooKeeper is to Kafka what EPH is to Event Hub. It makes sure that when the process running the EventProcessor for a specific partition dies/crashes, processing resumes from the last checkpointed offset on another available EventProcessorHost instance.
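    As a rough sketch of how that framework is wired up (the names and connection strings below are placeholders of mine, not from the question):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.EventHubs.Processor;

    class Program
    {
        static async Task Main()
        {
            // Arguments: eventHubPath, consumerGroupName, eventHubConnectionString,
            // storageConnectionString, leaseContainerName.
            var host = new EventProcessorHost(
                "my-event-hub",
                PartitionReceiver.DefaultConsumerGroupName,
                "<event-hub-connection-string>",
                "<storage-connection-string>",
                "eph-leases");

            // SampleEventProcessor implements IEventProcessor (sketched further below).
            await host.RegisterEventProcessorAsync<SampleEventProcessor>();

            Console.WriteLine("Receiving. Press Enter to stop.");
            Console.ReadLine();

            await host.UnregisterEventProcessorAsync();
        }
    }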

    CheckPoint: as of today, EventHubs only supports client-side check-pointing. When you call Checkpoint from your client code:

    await context.CheckpointAsync();
    

    - it translates to a Storage call (made directly from the client), which stores the current offset in the storage account you provided. The EventHubs service does not talk to Storage for check-pointing.
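    For context, a minimal IEventProcessor sketch showing where that call usually sits (my own illustrative code; the class name and processing body are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.EventHubs.Processor;

    class SampleEventProcessor : IEventProcessor
    {
        public Task OpenAsync(PartitionContext context)
        {
            Console.WriteLine($"Opened partition {context.PartitionId}");
            return Task.CompletedTask;
        }

        public Task CloseAsync(PartitionContext context, CloseReason reason)
        {
            Console.WriteLine($"Closed partition {context.PartitionId}: {reason}");
            return Task.CompletedTask;
        }

        public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
        {
            foreach (var eventData in messages)
            {
                // Your processing logic goes here.
            }

            // Client-side checkpoint: writes the current offset to the storage
            // account/container that was handed to EventProcessorHost.
            await context.CheckpointAsync();
        }

        public Task ProcessErrorAsync(PartitionContext context, Exception error)
        {
            Console.WriteLine($"Error on partition {context.PartitionId}: {error.Message}");
            return Task.CompletedTask;
        }
    }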

    THE ANSWER

    EventProcessor framework is meant to achieve exactly what you are looking for.

    Checkpoints are not persisted via the server (aka the EventHubs service). It is purely client-side: you are talking to Azure Storage. That's why the EventProcessor library brings in an additional dependency - the AzureStorageClient. You can connect to the storage account and the container the checkpoints are written to; there we maintain the ownership information - which EPH instance (by name) owns which EventHubs partitions, and the checkpoint up to which each has read/processed so far.
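    If you want to see this for yourself, you can list the blobs in that lease container with any Azure Storage client. A sketch using the WindowsAzure.Storage package (the container name and connection string are placeholders, and the exact blob layout/JSON shape depends on the EPH version):

    using System;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    class LeaseInspector
    {
        static async Task Main()
        {
            var account = CloudStorageAccount.Parse("<storage-connection-string>");
            var container = account.CreateCloudBlobClient().GetContainerReference("eph-leases");

            BlobContinuationToken token = null;
            do
            {
                // Flat listing so blobs under the consumer-group folder are included.
                var segment = await container.ListBlobsSegmentedAsync(
                    null, true, BlobListingDetails.None, null, token, null, null);

                foreach (var item in segment.Results)
                {
                    if (item is CloudBlockBlob blob)
                    {
                        // Each blob typically corresponds to one partition and holds
                        // ownership + checkpoint info (owner, offset, sequence number).
                        Console.WriteLine($"{blob.Name}: {await blob.DownloadTextAsync()}");
                    }
                }
                token = segment.ContinuationToken;
            } while (token != null);
        }
    }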

    As per the timer-based check-pointing pattern you originally had: if the process goes down, you will re-do the events from the last 5-minute window. This is a healthy pattern because:

    1. the fundamental assumption is that faults are rare events, so you will deal with duplicate events only rarely
    2. you will end up making fewer calls to the Storage service (which you could easily overwhelm by check-pointing too frequently). I would go one step further and fire the checkpoint call asynchronously - OnProcessEvents need not fail if the checkpoint fails! (see the sketch after this list)
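    A sketch of that pattern (my own interpretation; the 5-minute interval mirrors the question's original timer, and the try/catch keeps processing alive if the Storage call fails):

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.EventHubs.Processor;

    class TimedCheckpointProcessor : IEventProcessor
    {
        static readonly TimeSpan CheckpointInterval = TimeSpan.FromMinutes(5);
        DateTime _lastCheckpointUtc = DateTime.UtcNow;

        public Task OpenAsync(PartitionContext context) => Task.CompletedTask;
        public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;
        public Task ProcessErrorAsync(PartitionContext context, Exception error) => Task.CompletedTask;

        public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
        {
            foreach (var eventData in messages)
            {
                // Process the event; after a crash, at most the last window is replayed.
            }

            if (DateTime.UtcNow - _lastCheckpointUtc >= CheckpointInterval)
            {
                _lastCheckpointUtc = DateTime.UtcNow;
                try
                {
                    await context.CheckpointAsync();
                }
                catch (Exception ex)
                {
                    // A failed checkpoint should not fail event processing; the worst
                    // case is re-reading the last window after a restart.
                    Console.WriteLine($"Checkpoint failed, will retry next window: {ex.Message}");
                }
            }
        }
    }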

    If you want absolutely no events to repeat, you will need to build this de-duplication logic into the downstream pipeline.

    • every time the EventProcessorImpl starts - query your downstream for the last sequence number it got and keep discarding events until you reach that sequence number (a sketch follows below).
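    A sketch of that de-duplication step (a hypothetical helper of mine; where the last processed sequence number is stored is up to your downstream system):

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Azure.EventHubs;

    static class Deduplicator
    {
        // SequenceNumber increases monotonically per partition, so anything at or
        // below the last value your downstream recorded has already been handled.
        public static IEnumerable<EventData> DropAlreadyProcessed(
            IEnumerable<EventData> batch, long lastProcessedSequenceNumber)
        {
            return batch.Where(e => e.SystemProperties.SequenceNumber > lastProcessedSequenceNumber);
        }
    }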

    here's more general reading on Event Hubs...
