Libraries for ADO.NET to rapidly bulk insert data into a database from a .csv file?

狂风中的少年 提交于 2019-11-27 20:17:23

The ADO.net SqlBulkCopy class enables quick, mass upload of records into a table:

    DataTable dt = s_EmptyUploadTable.Copy();
    foreach (var itm in yourList) {
        DataRow row = dt.NewRow();
        row["Field1"] = itm.Field1;
        row["Field2"] = itm.Field2;
        dt.Rows.Add(row);
    }
    using (SqlConnection cn = new SqlConnection(yourConnectionString)) {
        cn.Open();
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(cn)) {
            bulkCopy.DestinationTableName = "dbo.YourActualSQLServerTableName";
            bulkCopy.WriteToServer(dt);
        }
        cn.Close();
    }

You can use LINQ Entity Data Reader to write an IEnumerable list to a database using SQL Bulk Copy behind the scenes. You can use this library to bulk upload the results of a LINQ query straight into the database, because the results of a LINQ query are IEnumerable.

As there are LINQ-to-everything adapters, you can do tricks like use the LINQ to CSV library to grab the data out of a .csv file using a LINQ query, then the LINQ Entity Data Reader to bulk write this data directly into the database.

Case study:

Problem: read a .csv file quickly into a database. The connection to the SQL database is via LINQ-to-Entitys from C#.

Solution 1: Use LINQ to CSV library, construct a LINQ query to pull out the data you want, then write it in using the standard LINQ-to-Entity calls (ctx.AddObject(), ctx.SaveChanges(), etc). Time taken: 30 seconds for 20,000 records, as LINQ ends up generating a query for every single record (slooooow!!!!!).

Solution 2: Use LINQ to CSV library, construct a LINQ query to pull out the data you want into an IEnumerable, use LINQ Entity Data Reader to bulk write this data directly into the target data table. Time taken: 3 seconds for 20,000 records.

Solution 3: Use a a stored procedure with SQL "bulk copy". Time taken: 2 seconds for 20,000 records. However, this solution is quite brittle as it relies on a stored procedure, and SQL bulk copy is just not compatible with some .csv file formats. This method also requires that you use a staging table between the actual target table and the .csv file, to deal with file formatting issues and to help with normalization.

And, here is the source code for solution #2:

static void WriteCSVtoSQLtable()
{
  // Step 1: Read .csv file into IEnumerable using LINQ-to-CSV class.

  // This section requires "LINQtoCSV" class as described at http://www.codeproject.com/KB/linq/LINQtoCSV.asp

  string inputFilePath = @"T:\x.csv";

  CsvFileDescription inputFileDescription = new CsvFileDescription
  {
    SeparatorChar = ',',
    FirstLineHasColumnNames = true
  };

  IEnumerable<MyCustomColumnMappingClass> csvChains = cc.Read<MyCustomColumnMappingClass>(inputFilePath, inputFileDescription);

  // Step 2: Now write into the target table on SQL Server.

  // This section requires "EntityDataReader" class described at http://archive.msdn.microsoft.com/LinqEntityDataReader.
  public static string dbSqlConnectionString = @";Data Source=(local);Initial Catalog=PhiEngine;Integrated Security=True;MultipleActiveResultSets=True";

  SqlConnection dbSql(dbSqlConnectionString);

  using (var tran = dbSql.BeginTransaction())
  {

    var csvFile = from p in csvChains
                select p;

    SqlBulkCopy bc = new SqlBulkCopy(dbSql,
      SqlBulkCopyOptions.CheckConstraints |
      SqlBulkCopyOptions.FireTriggers |
      SqlBulkCopyOptions.KeepNulls, tran)
                     {
                       BatchSize = 1000,
                       DestinationTableName = "TStagingTable" // Temporary staging table in database.
                     };

    bc.WriteToServer(csvFile.AsDataReader()); // Extension method .AsDataReader depends on adding the EntityDataReader class to your C# project (see above).

    tran.Commit();
  }
}

// This class is used by LINQ to CSV to query the .csv file, see "LINQtoCSV" website.
public class MyCustomColumnMappingClass
{
  [CsvColumn(Name = "symbol", FieldIndex = 1)]
  public string Symbol { get; set; }

  [CsvColumn(Name = "date", FieldIndex = 3, OutputFormat = @"MM/dd/yyyy")]
  public DateTime Date { get; set; }
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!