Question
When building a web application, it is a standard practice to use one Unit of Work per HTTP request, and flush all the changes and commit the transaction once after handling the request.
What approach is used when building a batch process? Is one Unit of Work instance used for the whole execution of the process, with transactions committed periodically at logical points?
This may be more of a discussion than a factual question, but I'm hoping to find out if there is a commonly-accepted "best practice" analogous to Session-per-Request.
Answer 1:
A Unit of Work is your business transaction. It is defined by the scope of an ISession. It should be short, but not too short; that is why Session-per-Request is recommended. With it, you can take advantage of various UoW features such as change tracking, the first-level cache, auto-flushing, etc. This can avoid some round trips to the database and improve performance.
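For context, a minimal sketch of what Session-per-Request can look like with NHibernate; the `RequestHandler` wrapper and the way the per-request action is passed in are illustrative assumptions, not anything stated in the question or answer:

```csharp
using System;
using NHibernate;

// Sketch only: one ISession (one Unit of Work) per web request.
// HandleRequest stands in for whatever your web framework invokes per request.
public class RequestHandler
{
    private readonly ISessionFactory _sessionFactory;

    public RequestHandler(ISessionFactory sessionFactory)
    {
        _sessionFactory = sessionFactory;
    }

    public void HandleRequest(Action<ISession> requestAction)
    {
        using (var session = _sessionFactory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            // Everything in the request shares this session's first-level cache
            // and change tracking, and is flushed/committed once at the end.
            requestAction(session);
            tx.Commit();
        }
    }
}
```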
With a very short ISession scope, such as Session-Per-Operation (no UoW at all), you miss all the benefits mentioned above.
With an unnecessarily large ISession scope, such as Session-Per-Application, or by grouping unrelated operations, you create many problems, such as invalid proxy state, increased memory usage, etc.
Considering the above, for batch processing, try to identify smaller UoWs within your batch. If you can split the batch into small UoW parts, go ahead and do so. If you cannot split the batch, you have two ways to go:
1. Single ISession for the entire batch: if your batch processes the same records over and over, this may be useful; with delayed flushing you get some performance benefit. Even if your batch processes each record only once, it may still benefit from reduced flushes and saved round trips to the database. Refer to point 2 below. (A sketch of this option follows the list.)
2. New ISession for each operation in the batch: if your batch processes each record only once, this may be better. I cannot say for sure, as the complete scenario is unknown.
Both have the drawbacks mentioned above; it is better to try to find smaller UoWs inside your batch.
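To make option 1 concrete, here is a minimal sketch of a single-ISession batch that flushes and clears at periodic logical points, in the spirit of the flush-and-clear pattern from the NHibernate batch-processing documentation; the `Order` entity and the interval of 100 are assumptions for illustration:

```csharp
using System.Collections.Generic;
using NHibernate;

// Assumed mapped entity, for illustration only.
public class Order
{
    public virtual int Id { get; set; }
    public virtual decimal Total { get; set; }
}

public static class BatchJob
{
    // Option 1: a single ISession for the whole batch, flushing and clearing
    // at periodic logical points so the first-level cache does not grow
    // with the size of the batch.
    public static void Run(ISessionFactory sessionFactory, IEnumerable<Order> orders)
    {
        using (var session = sessionFactory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            int i = 0;
            foreach (var order in orders)
            {
                session.Save(order);
                if (++i % 100 == 0)
                {
                    session.Flush();   // push pending inserts/updates to the database
                    session.Clear();   // evict tracked entities to keep memory flat
                }
            }
            tx.Commit();               // one commit here; committing per chunk is also an option
        }
    }
}
```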
For bulk read operations, an IStatelessSession is a better solution.
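A sketch of a bulk read through IStatelessSession, which bypasses the first-level cache and change tracking; it reuses the illustrative `Order` entity from the previous sketch:

```csharp
using System.Collections.Generic;
using System.Linq;
using NHibernate;
using NHibernate.Linq;

public static class BulkReader
{
    // A stateless session keeps no first-level cache and does no change tracking,
    // so large read-only result sets do not bloat memory.
    public static IList<Order> ReadAll(ISessionFactory sessionFactory)
    {
        using (var session = sessionFactory.OpenStatelessSession())
        {
            return session.Query<Order>().ToList();
        }
    }
}
```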
Answer 2:
The unit of work should be per request or shorter (lifetime scope). Longer-lived contexts/transactions will lead to memory use and performance issues.
If it is a process where, while a page is active, the user selects records to be acted on at a later stage in that session, then I would consider tracking the selected IDs and applicable modifications client-side, to be provided to the "act" step when it is triggered, or recording a simple batch record in server session state or the DB to associate the selected/modified entities. If stored in the DB, the record should carry a date-time and there should be an automatic process to clean up any unprocessed batches that never get finalized (for example, when the user abandons the work by closing the browser).
If it were a case of wanting to batch up records across many requests, such as batching web requests into groups of <=1000, or a process that runs every hour, then I'd use a persistent data structure where the requests commit their data to the batch structure, grouped by a batch-run record that tracks the current state of the batch:
1. Check the current batch status.
2. If open, add/associate the record with the current batch.
3. If closed/processing, create a new batch and associate the record with the new batch.
Interactions with the batch should use pessimistic locking so that the background process doesn't begin processing a batch while requests are still being written.
The background batch process queries for the batch it should begin processing, updates its status, processes it, and finalizes the batch.
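A rough sketch of the intake side of that design; the `BatchRun`/`BatchItem` shapes, the string statuses, and the use of an upgrade (pessimistic) lock are assumptions made to illustrate the flow, not the answer's exact schema:

```csharp
using System;
using NHibernate;

// Illustrative shapes for the persistent batch structure described above.
public class BatchRun
{
    public virtual int Id { get; set; }
    public virtual string Status { get; set; }        // "Open", "Processing", "Finalized"
    public virtual DateTime CreatedUtc { get; set; }
}

public class BatchItem
{
    public virtual int Id { get; set; }
    public virtual BatchRun Batch { get; set; }
    public virtual int RecordId { get; set; }
}

public static class BatchIntake
{
    // Called from a web request: associate a record with the current open batch,
    // creating a new batch if the current one is closed/processing.
    public static void AddToCurrentBatch(ISession session, int recordId)
    {
        using (var tx = session.BeginTransaction())
        {
            // Pessimistic (upgrade) lock so the background process cannot start
            // processing this batch while requests are still writing to it.
            var current = session.QueryOver<BatchRun>()
                .Where(b => b.Status == "Open")
                .Lock().Upgrade
                .SingleOrDefault();

            if (current == null)
            {
                current = new BatchRun { Status = "Open", CreatedUtc = DateTime.UtcNow };
                session.Save(current);
            }

            session.Save(new BatchItem { Batch = current, RecordId = recordId });
            tx.Commit();
        }
    }
}
```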
Source: https://stackoverflow.com/questions/50705354/how-is-the-unit-of-work-used-with-batch-processing