Understanding checkpointing in Event Hub

面向向阳花 2020-12-13 15:30

I want to ensure that, if my Event Hub client (currently a console application) crashes, it only picks up events it has not yet taken from the Event Hub. One way to achieve this would be to checkpoint on a timer (e.g. every 5 minutes).

1 Answer
  • 2020-12-13 16:25

    Let me put forward a few basic terms before answering:

    EventHubs is a high-throughput, durable event ingestion pipeline. Simply put - it's a reliable stream of events in the Cloud.

    Offset on EventData (one event in the stream) is literally a cursor on the stream. Holding on to this cursor enables operations like restarting a read from that cursor (aka Offset) - inclusive or exclusive.
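    For illustration, here is a minimal sketch of resuming a plain receiver from a saved offset (my own example using the Microsoft.Azure.EventHubs package; the connection string, partition id and offset are placeholders, not from the question):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;

    class OffsetResumeSample
    {
        const string EventHubConnectionString = "<event-hub-connection-string>";
        const string SavedOffset = "<offset-you-persisted-earlier>";

        static async Task Main()
        {
            var client = EventHubClient.CreateFromConnectionString(EventHubConnectionString);

            // Resume reading partition "0" from a previously saved offset.
            // The second argument (false) makes the restart exclusive of that event.
            var receiver = client.CreateReceiver(
                PartitionReceiver.DefaultConsumerGroupName,
                "0",
                EventPosition.FromOffset(SavedOffset, false));

            var events = await receiver.ReceiveAsync(10);
            foreach (var eventData in events ?? Array.Empty<EventData>())
            {
                Console.WriteLine($"Offset={eventData.SystemProperties.Offset}, " +
                                  $"Seq={eventData.SystemProperties.SequenceNumber}");
            }

            await receiver.CloseAsync();
            await client.CloseAsync();
        }
    }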

    EventProcessor library is a framework the EventHubs team built on top of the ServiceBus SDK to make the "eventhub receiver" side easier to get right. ZooKeeper is to Kafka what EPH is to Event Hub. It makes sure that when the process running the EventProcessor for a specific partition dies/crashes, processing resumes from the last checkpointed offset on another available EventProcessorHost instance.
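    As a rough sketch of how that framework is wired up (the names and connection strings below are placeholders of mine, not from the question):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.EventHubs.Processor;

    class Program
    {
        static async Task Main()
        {
            // Arguments: eventHubPath, consumerGroupName, eventHubConnectionString,
            // storageConnectionString, leaseContainerName.
            var host = new EventProcessorHost(
                "my-event-hub",
                PartitionReceiver.DefaultConsumerGroupName,
                "<event-hub-connection-string>",
                "<storage-connection-string>",
                "eph-leases");

            // SampleEventProcessor implements IEventProcessor (sketched further below).
            await host.RegisterEventProcessorAsync<SampleEventProcessor>();

            Console.WriteLine("Receiving. Press Enter to stop.");
            Console.ReadLine();

            await host.UnregisterEventProcessorAsync();
        }
    }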

    CheckPoint: as of today, EventHubs only supports client-side check-pointing. When you call Checkpoint from your client code:

    await context.CheckpointAsync();
    

    - it translates to a Storage call (made directly from the client), which stores the current offset in the storage account you provided. The EventHubs service does not talk to Storage for check-pointing.
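    For context, a minimal IEventProcessor sketch showing where that call usually sits (my own illustrative code; the class name and processing body are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.EventHubs.Processor;

    class SampleEventProcessor : IEventProcessor
    {
        public Task OpenAsync(PartitionContext context)
        {
            Console.WriteLine($"Opened partition {context.PartitionId}");
            return Task.CompletedTask;
        }

        public Task CloseAsync(PartitionContext context, CloseReason reason)
        {
            Console.WriteLine($"Closed partition {context.PartitionId}: {reason}");
            return Task.CompletedTask;
        }

        public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
        {
            foreach (var eventData in messages)
            {
                // Your processing logic goes here.
            }

            // Client-side checkpoint: writes the current offset to the storage
            // account/container that was handed to EventProcessorHost.
            await context.CheckpointAsync();
        }

        public Task ProcessErrorAsync(PartitionContext context, Exception error)
        {
            Console.WriteLine($"Error on partition {context.PartitionId}: {error.Message}");
            return Task.CompletedTask;
        }
    }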

    THE ANSWER

    EventProcessor framework is meant to achieve exactly what you are looking for.

    Checkpoints are not persisted via the server (aka the EventHubs service). It is purely client-side: you are talking to Azure Storage. That's why the EventProcessor library brings in an additional dependency - the AzureStorageClient. You can connect to the storage account and the container the checkpoints are written to; there we maintain the ownership information - which EPH instance (by name) owns which EventHubs partitions, and the checkpoint up to which each has read/processed so far.
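    If you want to see this for yourself, you can list the blobs in that lease container with any Azure Storage client. A sketch using the WindowsAzure.Storage package (the container name and connection string are placeholders, and the exact blob layout/JSON shape depends on the EPH version):

    using System;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    class LeaseInspector
    {
        static async Task Main()
        {
            var account = CloudStorageAccount.Parse("<storage-connection-string>");
            var container = account.CreateCloudBlobClient().GetContainerReference("eph-leases");

            BlobContinuationToken token = null;
            do
            {
                // Flat listing so blobs under the consumer-group folder are included.
                var segment = await container.ListBlobsSegmentedAsync(
                    null, true, BlobListingDetails.None, null, token, null, null);

                foreach (var item in segment.Results)
                {
                    if (item is CloudBlockBlob blob)
                    {
                        // Each blob typically corresponds to one partition and holds
                        // ownership + checkpoint info (owner, offset, sequence number).
                        Console.WriteLine($"{blob.Name}: {await blob.DownloadTextAsync()}");
                    }
                }
                token = segment.ContinuationToken;
            } while (token != null);
        }
    }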

    As per the timer-based check-pointing pattern you originally had: if the process goes down, you will re-do the events from the last 5-minute window. This is a healthy pattern because:

    1. the fundamental assumption is that faults are rare events, so you will deal with duplicate events only rarely
    2. you will end up making fewer calls to the Storage service (which you could easily overwhelm by check-pointing too frequently). I would go one step further and fire the checkpoint call asynchronously - OnProcessEvents need not fail if the checkpoint fails! (see the sketch after this list)
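    A sketch of that pattern (my own interpretation; the 5-minute interval mirrors the question's original timer, and the try/catch keeps processing alive if the Storage call fails):

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.EventHubs.Processor;

    class TimedCheckpointProcessor : IEventProcessor
    {
        static readonly TimeSpan CheckpointInterval = TimeSpan.FromMinutes(5);
        DateTime _lastCheckpointUtc = DateTime.UtcNow;

        public Task OpenAsync(PartitionContext context) => Task.CompletedTask;
        public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;
        public Task ProcessErrorAsync(PartitionContext context, Exception error) => Task.CompletedTask;

        public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
        {
            foreach (var eventData in messages)
            {
                // Process the event; after a crash, at most the last window is replayed.
            }

            if (DateTime.UtcNow - _lastCheckpointUtc >= CheckpointInterval)
            {
                _lastCheckpointUtc = DateTime.UtcNow;
                try
                {
                    await context.CheckpointAsync();
                }
                catch (Exception ex)
                {
                    // A failed checkpoint should not fail event processing; the worst
                    // case is re-reading the last window after a restart.
                    Console.WriteLine($"Checkpoint failed, will retry next window: {ex.Message}");
                }
            }
        }
    }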

    If you want absolutely no events to repeat, you will need to build this de-duplication logic into the downstream pipeline.

    • every time the EventProcessorImpl starts - query your downstream for the last sequence number it got and keep discarding events until you reach that sequence number (a sketch follows below).
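    A sketch of that de-duplication step (a hypothetical helper of mine; where the last processed sequence number is stored is up to your downstream system):

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Azure.EventHubs;

    static class Deduplicator
    {
        // SequenceNumber increases monotonically per partition, so anything at or
        // below the last value your downstream recorded has already been handled.
        public static IEnumerable<EventData> DropAlreadyProcessed(
            IEnumerable<EventData> batch, long lastProcessedSequenceNumber)
        {
            return batch.Where(e => e.SystemProperties.SequenceNumber > lastProcessedSequenceNumber);
        }
    }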

    here's more general reading on Event Hubs...
