Slow bulk insert for table with many indexes

野的像风 2020-12-02 15:52

I am trying to insert millions of records into a table that has more than 20 indexes.

In the last run it took more than 4 hours per 100,000 rows, and the query was cancelled.

4 Answers
  • 2020-12-02 16:20

    Disabling and then re-enabling indexes is frequently suggested in cases like this. I have my doubts about that approach, though, because:

    1. The application's DB user needs schema alteration privileges, which it normally should not possess.
    2. The chosen insert approach and/or index schema might be less than optimal in the first place; otherwise, rebuilding complete index trees should not be faster than decent batch inserting (e.g. the client issuing one insert statement at a time, causing thousands of server round-trips; or a poor choice of clustered index, leading to constant index node splits).

    That's why my suggestions look a little bit different:

    • Increase ADO.NET BatchSize
    • Choose the target table's clustered index wisely, so that inserts won't lead to clustered index node splits; usually an identity column is a good choice
    • Let the client insert into a temporary heap table first (heap tables don't have any clustered index); then issue one big insert-into-select statement to push all of that staging table data into the actual target table (see the sketch after this list)
    • Apply SqlBulkCopy
    • Decrease transaction logging by choosing the bulk-logged recovery model

    You might find more detailed information in this article.
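    A minimal T-SQL sketch of the staging-heap approach (the database, table, and column names here are hypothetical; remember to switch the recovery model back once the load is done):

    -- Reduce transaction logging for the duration of the bulk load
    ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

    -- Staging table as a heap: no clustered index, so inserts stay cheap
    CREATE TABLE dbo.StagingRows (Id INT NOT NULL, Payload VARCHAR(100) NOT NULL);

    -- ... the client bulk-loads into dbo.StagingRows here, e.g. via SqlBulkCopy ...

    -- One big set-based statement pushes everything into the indexed target table
    INSERT INTO dbo.TargetTable (Id, Payload)
    SELECT Id, Payload FROM dbo.StagingRows;

    DROP TABLE dbo.StagingRows;
    ALTER DATABASE MyDb SET RECOVERY FULL;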

  • 2020-12-02 16:21

    This sounds like a data warehouse operation. It would be normal to drop the indexes before the insert and rebuild them afterwards.

    When you rebuild the indexes, build the clustered index first and, conversely, drop it last. They should all have a fill factor of 100%.

    The code should look something like this:

    if object_id('IndexList') is not null drop table IndexList
    select name into IndexList from dbo.sysindexes where id = object_id('Fact')

    if exists (select name from IndexList where name = 'id1') drop index Fact.id1
    if exists (select name from IndexList where name = 'id2') drop index Fact.id2
    if exists (select name from IndexList where name = 'id3') drop index Fact.id3
    ...

    -- BIG INSERT

    -- RECREATE THE INDEXES
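    For the rebuild step, a sketch along the same lines (the index names and key columns here are hypothetical; build the clustered index first, all at fillfactor 100):

    create clustered index id1 on Fact (DateKey) with (fillfactor = 100)
    create nonclustered index id2 on Fact (CustomerKey) with (fillfactor = 100)
    create nonclustered index id3 on Fact (ProductKey) with (fillfactor = 100)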
    
  • 2020-12-02 16:24

    As noted in another answer, disabling the indexes will be a very good start.

    4 hours per 100,000 rows [...] The inserts are wrapped in a transaction per 100,000 rows.

    You should look at reducing that number: the server has to maintain a huge amount of state while inside a transaction (so that it can be rolled back), and this, along with the indexes, makes adding data very hard work.

    Why not wrap each insert statement in its own transaction?

    Also look at the nature of the SQL you are using: are you adding one row per statement (and per network round-trip), or many rows at once?
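    A quick illustration of the difference, using hypothetical table and column names:

    -- One row per statement: one network round-trip for each row
    INSERT INTO dbo.Fact (Id, Val) VALUES (1, 'a');
    INSERT INTO dbo.Fact (Id, Val) VALUES (2, 'b');

    -- Many rows per statement: one round-trip for the whole batch
    INSERT INTO dbo.Fact (Id, Val)
    VALUES (1, 'a'), (2, 'b'), (3, 'c');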

  • 2020-12-02 16:27

    You can disable and re-enable the indexes. Note that disabling them can have unwanted side effects (such as duplicate primary keys or violated unique constraints) which will only be discovered when the indexes are re-enabled.

    --Disable Index
    ALTER INDEX [IXYourIndex] ON YourTable DISABLE
    GO
    
    --Enable Index
    ALTER INDEX [IXYourIndex] ON YourTable REBUILD
    GO
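    With more than 20 indexes, you probably don't want to write these statements by hand. A sketch that generates the DISABLE statements from the catalog views (YourTable is a placeholder; the clustered index is deliberately excluded, because disabling it makes the whole table inaccessible):

    -- Generate a DISABLE statement for every nonclustered index on the table
    SELECT 'ALTER INDEX [' + i.name + '] ON YourTable DISABLE;'
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID('YourTable')
      AND i.type_desc = 'NONCLUSTERED';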
    