Best practices for multithreaded processing of database records

小蘑菇 2020-12-31 05:11

I have a single process that queries a table for records where PROCESS_IND = 'N', does some processing, and then updates PROCESS_IND to 'Y'.

5 answers
  •  梦毁少年i
    2020-12-31 06:07

    The pattern I'd use is as follows:

    • Create columns "lockedby" and "locktime" which are a thread/process/machine ID and timestamp respectively (you'll need the machine ID when you split the processing between several machines)
    • Each task would do a query such as:

      UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10

    Where 10 is the "batch size".

    • Then each task does a SELECT to find out which rows it has "locked" for processing, and processes those
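    • After each row is complete, mark it as processed (PROCESS_IND = 'Y' in the question's table) and set lockedby and locktime back to NULL. Unless finished rows are deleted, the claiming UPDATE then also needs PROCESS_IND = 'N' in its WHERE clause, otherwise finished rows would be claimed again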
    • All this is done in a loop for as many batches as exist.
    • A cron job or scheduled task periodically resets the "lockedby" of any row whose locktime is older than some timeout, since such rows were presumably claimed by a task that has hung or crashed. Another task will then pick them up
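
The claim/process/release loop from the list above can be sketched as follows. This is only an illustration under stated assumptions, not the answerer's code: it uses Python's built-in sqlite3 module as a stand-in database, and because stock SQLite builds do not accept UPDATE ... ORDER BY ... LIMIT, the claim is written as an UPDATE over a subquery instead. The function names (claim_batch, release, run_worker, make_demo_db) are hypothetical, and the PROCESS_IND filter is borrowed from the question rather than from the answer's query.

```python
import sqlite3

BATCH_SIZE = 10  # the "batch size" from the answer's query


def make_demo_db(n_rows=25):
    """Create an in-memory table with n_rows unprocessed, unlocked rows (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taskstable ("
        " id INTEGER PRIMARY KEY,"
        " PROCESS_IND TEXT NOT NULL DEFAULT 'N',"
        " lockedby TEXT,"
        " locktime TEXT)"
    )
    conn.executemany(
        "INSERT INTO taskstable (id) VALUES (?)",
        [(i,) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


def claim_batch(conn, worker_id, batch_size=BATCH_SIZE):
    """Stamp up to batch_size unclaimed rows with worker_id, then return the
    ids of all rows currently locked by worker_id."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE taskstable"
            "   SET lockedby = ?, locktime = CURRENT_TIMESTAMP"
            " WHERE id IN (SELECT id FROM taskstable"
            "               WHERE lockedby IS NULL AND PROCESS_IND = 'N'"
            "               ORDER BY id"
            "               LIMIT ?)",
            (worker_id, batch_size),
        )
    # Second step from the answer: SELECT to see which rows we hold.
    rows = conn.execute(
        "SELECT id FROM taskstable WHERE lockedby = ? ORDER BY id",
        (worker_id,),
    ).fetchall()
    return [row[0] for row in rows]


def release(conn, row_id):
    """Mark one row as processed and clear its lock columns."""
    with conn:
        conn.execute(
            "UPDATE taskstable"
            "   SET PROCESS_IND = 'Y', lockedby = NULL, locktime = NULL"
            " WHERE id = ?",
            (row_id,),
        )


def run_worker(conn, worker_id, process_row):
    """Claim and process batches until nothing is left; return the processed ids."""
    done = []
    while True:
        ids = claim_batch(conn, worker_id)
        if not ids:
            break
        for row_id in ids:
            process_row(row_id)  # caller-supplied work function
            release(conn, row_id)
            done.append(row_id)
    return done
```

Each worker would call run_worker with its own connection and a unique worker_id (for example a combination of host name and process ID). On a real multi-user database the safety of the scheme rests on the claiming UPDATE being a single atomic statement, so two workers cannot stamp the same row.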

    The LIMIT 10 is MySQL-specific, but other databases have equivalents. The ORDER BY is important: without it, which rows get claimed is nondeterministic.
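
The periodic reset of stale locks mentioned in the answer's last bullet could look roughly like this. Again this is a hedged sketch using Python's sqlite3 module as a stand-in: the function name reset_stale_locks and the 30-minute timeout are hypothetical, and locktime is assumed to be stored as UTC text in 'YYYY-MM-DD HH:MM:SS' form so that plain string comparison orders timestamps correctly.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=30)  # hypothetical timeout; tune to the workload


def reset_stale_locks(conn, now=None, stale_after=STALE_AFTER):
    """Clear lockedby/locktime on rows whose lock is older than stale_after.

    Returns the number of rows that were unlocked.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - stale_after).strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE taskstable"
            "   SET lockedby = NULL, locktime = NULL"
            " WHERE lockedby IS NOT NULL AND locktime < ?",
            (cutoff,),
        )
    return cur.rowcount
```

A cron job or scheduled task would open a connection, call reset_stale_locks(conn) and exit. The timeout should be comfortably longer than the slowest legitimate batch, otherwise a row that is still being worked on may be handed to a second task.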
