问题
I am working on MS SQL server.
I have a table called "User" with three columns and default index which is created with Primary Key of the table, of UserId.
I have a word file that contains user information line by line. There are almost 10000 lines.
I have a program that reads the user information from the word file and inserts it into the database. It is written with C# in visual studio. The program uses repository and unitofwork pattern.
The program workflow is as follows: 1) read a single line of user information from the word file. 2) create an user object based on the information 3) write the object to the repository 4) commit the work, in which the database insert statement executes.
Basically, the program executes "insert statement" each time it reads user information from the word file.
Here is my question.
I logged the time of each "insert statement", and I can see that the "insert statement" takes longer as there are more data inserted. Is this because the database has more data to sort out in B tree since a table has default clustered index on its primary key?
Please enlighten me what happens after and before the insert statement in SQL database.
Thank you Guys.
回答1:
Is this because the database has more data to sort out in B tree since a table has default clustered index on its primary key?
No.In fact USERID autoincreament
as clustered index
is ideal choice for CI.
Since PK candidate is Auto Increament
.Data will be always appended in last page.
However incase of Udate statement page split can happen provided address is of large length than earlier.
If possible make Address to varchar
and as narrow as possible.
Main problem is very frequent insert, very frequent database hit.
If there are 1000 records to be inserted then create UDT
and insert create 50/100
at a time.You can so by applying Paging logic in insert method.It is easy and will be helpful.
Optimise your UI layer code like use Connection Pooling
, keep relevant data type and pass length of variable in DAL(Sql parameter)
.
I logged the time of each "insert statement", and I can see that the "insert statement" takes longer as there are more data inserted. Is this because the database has more data to sort out in B tree since a table has default clustered index on its primary key?
No since userid is ever increasing. No sorting work happen. May be there is fault in `insert sql script'. Main culprit is very frequent database hit.
Please enlighten me what happens after and before the insert statement in SQL database.
Please enlighten me what happens after and before the insert statement in SQL database.
Whenever data is inserted,insert take placein two places. At table level in Data pages ,and in index level.
Clustered index stores the actual data rows of the table at the leaf level of the index, in addition to controlling the sorting criteria of the data within the data pages and the order of the pages itself, based on the clustered index key.
Index Page split will happen. How ? Suppose There are 3 intermerdiate level and 4 leaf level. For example now if you insert 1 record,2 records nothing will happen.Insertion process at the stage will be fast.
Suppose you insert few more record (say after 10,20) then Intermediate lebel and leaf level will both increase.Becasue index page has space limit so when it will no
longer be able to accommodate new record then it will split page to accomadate new records.Becasue of this reason column length should be as narrarow as possible.
But in your case clustered index do not have to do Sorting criteriA.So one less work perform by Clusetered index.
Also Index page split cost will be less than non auto increament key or wide key.
since you are very frequently inserting records,it will impact your performance every now and then.
In case of Bulk insert Index page split will be less, so performance will improve.
In HEAP Table,since there is no clustered Index to maintain it has one less task to do.So very frequent Insert might improve.
But you have to decide Insert performance vs Select performance.
If this table is very frequntly use to fetch record then you hv to keep clustered index. If it very seldom use or record is less than 100 HEAP table is ok.
further reading,
index structure and concepts
Index Architecture and Design Guide
Heaps (Tables without Clustered Indexes)
回答2:
If your Word document includes the UserId (PRIMARY KEY) which then gets inserted into the table, I can see why this would be EXTRA slow.
Understanding CLUSTERED vs NON-Clustered Indexing.
In a CLUSTERED INDEX the physical rows of each table are re-arranged to according to the index. To use an every-day analogy, it is like arranging books (records) in a bookshelf in alphabetical order. Every time a new book comes in, you have to physically rearrange the other books so that the alphabetical index is properly maintained. Obviously this is VERY SLOW for inserts but really fast for SELECTS.
A Non-Clustered Index on the other hand does not alter the physical rows in a table when a new record comes in. Using the bookshelf as an analogy, if you wanted to look up books by author, you would keep a piece of paper on the side as an "index card" to look up the location of the book in the shelf that matches a specific author.
My solution to your problem if you're going to insert a large number of records at once is this:
- DROP THE INDEX
- INSERT THE DATA IN BULK
- RE-CREATE THE INDEX
来源:https://stackoverflow.com/questions/51761627/what-happens-after-inserting-data-to-a-table-with-default-clustered-index-in-sql