What happens during the insertion, deletion and update in sql?

被刻印的时光 ゝ submitted on 2019-12-24 12:29:51

Question


I would like to understand a few things about MySQL's architecture.

  1. How does MySQL process INSERT, DELETE, and UPDATE operations on an indexed table?
  2. It is said that changes are made only in the change buffer when the index page is not in the buffer pool. So if changes are made after the buffer pool has loaded the relevant index page, the same page on disk has to be altered as well, right? Does that mean one operation has to be carried out in three different places?
  3. How are NULL values indexed? Where are they stored in a B+tree?
  4. If we update data that is part of the clustered index, when is it updated on disk?
  5. What happens during bulk loading?


Answer 1:


How to process insert/update/delete...

  1. Fetch (and cache) index block(s) needed for locating the row(s) to be updated/deleted, or the blocks where new row(s) will be inserted.
  2. Fetch the data block(s). Note that every secondary index entry includes the PRIMARY KEY columns, and the PRIMARY KEY is clustered with the data.
  3. Modify the data block(s) to reflect the changes. Also save the old data, in case of an eventual ROLLBACK.
  4. Update unique index blocks (that includes the PK).
  5. Store non-unique index changes in the change buffer. (As you noted.)
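
As a rough illustration of the five steps, here is a toy in-memory model of an INSERT. It is only a sketch: `ToyTable`, its attribute names, and the `email`/`city` columns are hypothetical, and real InnoDB works on 16KB pages, redo/undo logs and locks rather than Python dicts.

```python
class ToyTable:
    """Toy model of the write path described above; not InnoDB's real design."""

    def __init__(self):
        self.clustered = {}      # PRIMARY KEY -> row (data lives with the PK)
        self.unique_email = {}   # unique secondary index: email -> PK
        self.change_buffer = []  # deferred non-unique index entries
        self.undo_log = []       # what to undo on ROLLBACK

    def insert(self, pk, email, city):
        # Steps 1-2: probe the clustered index and the unique index.
        if pk in self.clustered or email in self.unique_email:
            raise ValueError("duplicate key")
        # Step 3: write the row and remember how to undo it.
        self.clustered[pk] = {"email": email, "city": city}
        self.undo_log.append((pk, email))
        # Step 4: unique indexes are updated immediately.
        self.unique_email[email] = pk
        # Step 5: the non-unique index change (on city) is only queued.
        self.change_buffer.append(("insert", city, pk))

    def rollback(self):
        # Undo the recorded inserts in reverse order.
        while self.undo_log:
            pk, email = self.undo_log.pop()
            del self.clustered[pk]
            del self.unique_email[email]
            self.change_buffer = [c for c in self.change_buffer if c[2] != pk]


table = ToyTable()
table.insert(1, "a@example.com", "Oslo")
print(table.change_buffer)
```

The point of the sketch is the asymmetry between steps 4 and 5: the unique index must be consulted and updated right away, while the non-unique entry can wait.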

The change buffer (CB) is designed to be 'transparent' with respect to the actual index blocks.

  • A lookup by an index will always 'do the right thing', whether the entry is in the CB or not.
  • Folding of CB entries back into the actual index blocks is done in the 'background', when the index block is later read into the buffer_pool, and/or when running out of room. (By default the CB may grow to at most 25% of the buffer_pool; see innodb_change_buffer_max_size.)
  • Sufficient information is stored in the transaction log, such that a crash will not cause the loss of pending index updates.
  • Clearly the CB was invented for performance. An index update can be delayed, and meanwhile, takes a lot less space (often only a few dozen bytes) than the index block (16KB) that needs updating. Multiple changes (usually) can be applied to a single index block -- This is the main savings. But note, because of randomness, UUIDs, MD5, etc, cannot make good use of the CB. A non-unique index on the current datetime/timestamp is a case where the CB's buffering really shines.
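
To make the last bullet concrete, here is a small sketch that counts how many distinct index blocks a batch of buffered changes would eventually touch. It assumes a deliberately crude mapping of keys to blocks (`key // capacity`); `PAGE_CAPACITY` and `pages_touched` are hypothetical names, and a real B+Tree does not place keys this way.

```python
import random

PAGE_CAPACITY = 100  # hypothetical number of index entries per block


def pages_touched(keys, capacity=PAGE_CAPACITY):
    """Count distinct blocks hit by the given keys,
    assuming the toy mapping block = key // capacity."""
    return len({key // capacity for key in keys})


# 500 consecutive keys, standing in for an index on the current timestamp.
sequential = range(1_000_000, 1_000_500)

# 500 keys spread over a huge range, standing in for UUID/MD5-like values.
rng = random.Random(0)
scattered = [rng.randrange(10**9) for _ in range(500)]

print(pages_touched(sequential))  # 5
print(pages_touched(scattered))
```

With consecutive keys, 500 buffered changes collapse into a handful of block writes. With scattered keys, nearly every change lands in a different block, so buffering saves very little.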

(Sorry, my knowledge of the CB is a bit vague for the level at which you are asking. I suggest you read the code.)

NULL... In InnoDB, NULL is treated as a separate value that sorts before all non-NULL values in the B+Tree. But to confuse the issue, there is a setting (innodb_stats_method) determining whether NULLs are treated as equal to each other when index statistics are computed. And there are restrictions on keys: a UNIQUE index allows multiple NULLs, whereas PRIMARY KEY columns must be NOT NULL.
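
The ordering and uniqueness behaviour of NULL can be mimicked in a few lines, with Python's `None` standing in for NULL. This is only a sketch of the semantics, not of how the B+Tree stores entries; `index_order` and `violates_unique` are hypothetical helper names.

```python
def index_order(values):
    """Sort so that NULL (None) comes before every non-NULL value."""
    # The first tuple element is False for None, so None sorts first;
    # the second element avoids ever comparing None with a number.
    return sorted(values, key=lambda v: (v is not None, 0 if v is None else v))


def violates_unique(existing, new_value):
    """In a UNIQUE index, NULL never conflicts, not even with another NULL."""
    return new_value is not None and new_value in existing


print(index_order([3, None, 1, None]))   # [None, None, 1, 3]
print(violates_unique([None, 5], None))  # False
print(violates_unique([None, 5], 5))     # True
```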

Related to NULL... When doing PARTITION BY RANGE on some variant/function of DATE or DATETIME, invalid dates turn into NULL, which is explicitly stored in the 'first' partition. Newbies are often puzzled as to why partition pruning does not seem to work. (Recommended partial workaround: have a 'first' partition that is otherwise empty.)

Clustered and UNIQUE indexes... A write that inserts or changes a unique key must check that index for duplicates, hence the CB is not involved with such. Note: In InnoDB, the PRIMARY KEY is always clustered and unique, and its columns cannot be NULL.

Bulk loading... I find that a 100-row INSERT will run 10 times as fast as 100 individual INSERTs. (This is due to parsing, etc.) But at the low level, a batch insert or LOAD DATA is just a bunch of individual inserts. So, the above discussion applies.
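
For the batching side of this, here is a minimal sketch of turning many rows into one multi-row INSERT. `build_batch_insert` is a hypothetical helper; it assumes the `%s` placeholder style used by common Python MySQL drivers and assumes the table and column names come from trusted code, since they are interpolated directly.

```python
def build_batch_insert(table, columns, rows):
    """Return (sql, params) for a single multi-row INSERT.

    Identifiers are assumed to be trusted; values go through placeholders.
    """
    if not rows:
        raise ValueError("no rows to insert")
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([one_row] * len(rows)),
    )
    # Flatten the rows so the values line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_batch_insert("t", ["a", "b"], [(1, 2), (3, 4)])
print(sql)     # INSERT INTO t (a, b) VALUES (%s, %s), (%s, %s)
print(params)  # [1, 2, 3, 4]
```

The statement is parsed once for all rows, which is where the savings described above come from; the per-row index and data block work stays the same.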

Bonus answers...

"IODKU" (INSERT ... ON DUPLICATE KEY UPDATE) pretty much follows steps 1..5 above. In locating the row to update, it discovers whether to update or insert, then proceeds accordingly.

REPLACE is really a shorthand for DELETE, plus INSERT. But note this anomaly... If there are two unique keys on the table, a one-row REPLACE might delete 2 rows before inserting the 1 row.
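
The two-row anomaly is easy to reproduce in a toy model. The sketch below treats a table as a list of dicts and ignores NULLs; `replace_row` and the `id`/`email` columns are hypothetical, chosen only to show the delete-then-insert semantics.

```python
def replace_row(table, new_row, unique_columns):
    """Delete every row that collides with new_row on any unique column,
    then insert new_row. Returns (new_table, rows_deleted)."""
    kept, deleted = [], 0
    for row in table:
        if any(row[col] == new_row[col] for col in unique_columns):
            deleted += 1  # this row conflicts on at least one unique key
        else:
            kept.append(row)
    kept.append(dict(new_row))
    return kept, deleted


table = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
# The new row collides with row 1 on id and with row 2 on email.
table, deleted = replace_row(
    table, {"id": 1, "email": "b@example.com"}, ["id", "email"]
)
print(deleted)  # 2
print(table)    # [{'id': 1, 'email': 'b@example.com'}]
```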



Source: https://stackoverflow.com/questions/42367493/what-happens-during-the-insertion-deletion-and-update-in-sql
