What happens after inserting data to a table with default clustered index in SQL

雨燕双飞 提交于 2020-01-06 05:26:11

问题


I am working on MS SQL server.

I have a table called "User" with three columns and default index which is created with Primary Key of the table, of UserId.

I have a word file that contains user information line by line. There are almost 10000 lines.

I have a program that reads the user information from the word file and inserts it into the database. It is written with C# in visual studio. The program uses repository and unitofwork pattern.

The program workflow is as follows: 1) read a single line of user information from the word file. 2) create an user object based on the information 3) write the object to the repository 4) commit the work, in which the database insert statement executes.

Basically, the program executes "insert statement" each time it reads user information from the word file.

Here is my question.

I logged the time of each "insert statement", and I can see that the "insert statement" takes longer as there are more data inserted. Is this because the database has more data to sort out in B tree since a table has default clustered index on its primary key?

Please enlighten me what happens after and before the insert statement in SQL database.

Thank you Guys.


回答1:


Is this because the database has more data to sort out in B tree since a table has default clustered index on its primary key?

No.In fact USERID autoincreament as clustered index is ideal choice for CI. Since PK candidate is Auto Increament.Data will be always appended in last page.

However incase of Udate statement page split can happen provided address is of large length than earlier.

If possible make Address to varchar and as narrow as possible.

Main problem is very frequent insert, very frequent database hit. If there are 1000 records to be inserted then create UDT and insert create 50/100 at a time.You can so by applying Paging logic in insert method.It is easy and will be helpful.

Optimise your UI layer code like use Connection Pooling, keep relevant data type and pass length of variable in DAL(Sql parameter).

I logged the time of each "insert statement", and I can see that the "insert statement" takes longer as there are more data inserted. Is this because the database has more data to sort out in B tree since a table has default clustered index on its primary key?

No since userid is ever increasing. No sorting work happen. May be there is fault in `insert sql script'. Main culprit is very frequent database hit.

Please enlighten me what happens after and before the insert statement in SQL database.

Please enlighten me what happens after and before the insert statement in SQL database.

Whenever data is inserted,insert take placein two places. At table level in Data pages ,and in index level.

Clustered index stores the actual data rows of the table at the leaf level of the index, in addition to controlling the sorting criteria of the data within the data pages and the order of the pages itself, based on the clustered index key.

Index Page split will happen. How ? Suppose There are 3 intermerdiate level and 4 leaf level. For example now if you insert 1 record,2 records nothing will happen.Insertion process at the stage will be fast.

Suppose you insert few more record (say after 10,20) then Intermediate lebel and leaf level will both increase.Becasue index page has space limit so when it will no

longer be able to accommodate new record then it will split page to accomadate new records.Becasue of this reason column length should be as narrarow as possible.

But in your case clustered index do not have to do Sorting criteriA.So one less work perform by Clusetered index.

Also Index page split cost will be less than non auto increament key or wide key.

since you are very frequently inserting records,it will impact your performance every now and then.

In case of Bulk insert Index page split will be less, so performance will improve.

In HEAP Table,since there is no clustered Index to maintain it has one less task to do.So very frequent Insert might improve.

But you have to decide Insert performance vs Select performance.

If this table is very frequntly use to fetch record then you hv to keep clustered index. If it very seldom use or record is less than 100 HEAP table is ok.

further reading,

index structure and concepts

Index Architecture and Design Guide

Heaps (Tables without Clustered Indexes)




回答2:


If your Word document includes the UserId (PRIMARY KEY) which then gets inserted into the table, I can see why this would be EXTRA slow.

Understanding CLUSTERED vs NON-Clustered Indexing.

In a CLUSTERED INDEX the physical rows of each table are re-arranged to according to the index. To use an every-day analogy, it is like arranging books (records) in a bookshelf in alphabetical order. Every time a new book comes in, you have to physically rearrange the other books so that the alphabetical index is properly maintained. Obviously this is VERY SLOW for inserts but really fast for SELECTS.

A Non-Clustered Index on the other hand does not alter the physical rows in a table when a new record comes in. Using the bookshelf as an analogy, if you wanted to look up books by author, you would keep a piece of paper on the side as an "index card" to look up the location of the book in the shelf that matches a specific author.

My solution to your problem if you're going to insert a large number of records at once is this:

  1. DROP THE INDEX
  2. INSERT THE DATA IN BULK
  3. RE-CREATE THE INDEX


来源:https://stackoverflow.com/questions/51761627/what-happens-after-inserting-data-to-a-table-with-default-clustered-index-in-sql

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!