Why are lock hints needed on an atomic statement?

China☆狼群 提交于 2019-12-01 23:09:46

John is right in as these are optimizations, but in SQL world these optimizations can mean the difference between 'fast' vs. 'unbearable size-of-data slow' and/or the difference between 'works' vs. 'unusable deadlock mess'.

The readpast hint is clear. For the other two, I feel I need to add a bit more context:

  • ROWLOCK hint is to prevent page lock granularity scans. The lock granularity (row vs. page) is decided upfront when the query starts and is based on an estimate of the number pages that the query will scan (the third granularity, table, will only be used in special cases and does not apply here). Normally dequeue operations should never have to scan so many pages so that page granularity is considered by the engine. But I've seen 'in the wild' cases when the engine decided to use page lock granularity, and this leads to blocking and deadlocks in dequeue
  • UPDLOCK is needed to prevent the upgrade lock deadlock scenario. The UPDATE statement is logically split into a search for the rows that need to be updated and then update the rows. The search needs to lock the rows it evaluates. If the row qualifies (meets the WHERE condition) then the row is updated, and update is always an exclusive lock. So the question is how do you lock the rows during the search? If you use a shared lock then two UPDATE will look at the same row (they can, since the shared lock allows them), both decide the row qualifies and both try to upgrade the lock to exclusive -> deadlock. If you use exclusive locks during the search the deadlock cannot happen, but then UPDATE will conflict on all rows evaluated with any other read, even if the row does not qualifies (not to mention that Exclusive locks cannot be released early w/o breaking two-phase-locking). This is why there is an U mode lock, one that is compatible with Shared (so that UPDATE evaluation of candidate rows does not block reads) but is incompatible with another U (so that two UPDATEs do not deadlock). There are two reasons why the typical CTE based dequeue needs this hint:

    1. because is a CTE the query processing does not understand always that the SELECT inside the CTE is the target of an UPDATE and should use U mode locks and
    2. the dequeue operation will always go after the same rows to update (the rows being 'dequeued') so deadlocks are frequent.

tl;dr

They're for performance optimisation in a high concurrency dedicated queue table scenario.

Verbose

I think I've found the answer by finding a related SO answer by this quoted blog's author.

It seems that this advice is for a very specific scenario; where the table being used as the queue is dedicated as a queue; i.e. the table is not used for any other purpose. In such a scenario the lock hints make sense. They have nothing to do with preventing a race condition; they're to improve performance in high concurrency scenarios by avoiding (very short term) blocking.

  • The ReadPast lock improves the performance in high concurrency scenarios; there's no waiting for the currently read record to be released; the only thing locking it will be another "Queue Worker" process, so we can safely skip knowing that that worker's dealing with this record.
  • The RowLock ensures that we don't lock more than one row at a time, so the next worker to request a message will get the next record rather than skipping several records because they're in a locked record's page.
  • The UpdLock is used to get a lock; i.e. RowLock says what to lock but doesn't say that there must be a lock, and ReadPast determines the behaviour when encountering other locked records, so again doesn't cause a lock on the current record. I suspect this is not explicitly needed as SQL would acquire it in the background anyway (in fact, in the linked SO answer only ReadPast is specified); but was included in the block post for completeness / to explicitly show the lock which SQL would be implicitly causing in the background anyway.

However that post is written for a dedicated queue table. Where the table is used for other things (e.g. in the original question it was a table holding invoice data, which happened to have a column used to track what had been printed), that advice may not be desirable. i.e. By using a ReadPast lock you're jumping over all locked records; and there's no guarantee that those records are locked by another worker processing your queue; they may be locked for some completely unrelated purpose. That will then break the FIFO requirement.

Given this, I think my answer on the linked question stands. i.e. Either create a dedicated table to handle the queue scenario, or consider the other options and their pros and cons in the context or your scenario.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!