import from text file to SQL Server Database, is ADO.NET too slow?

六月ゝ 毕业季﹏ 提交于 2020-01-02 11:02:21

问题


My program is now still running to import data from a log file into a remote SQL Server Database. The log file is about 80MB in size and contains about 470000 lines, with about 25000 lines of data. My program can import only 300 rows/second, which is really bad. :(

public static int ImportData(string strPath)
{
    //NameValueCollection collection = ConfigurationManager.AppSettings;

    using (TextReader sr = new StreamReader(strPath))
    {
        sr.ReadLine(); //ignore three first lines of log file
        sr.ReadLine();
        sr.ReadLine();
        string strLine;
        var cn = new SqlConnection(ConnectionString);
        cn.Open();

        while ((strLine = sr.ReadLine()) != null)
        {
            {
                if (strLine.Trim() != "") //if not a blank line, then import into database
                {
                    InsertData(strLine, cn);
                    _count++;
                }
            }
        }
        cn.Close();
        sr.Close();

        return _count;
    }
}

InsertData is just a normal insert method using ADO.NET. It uses a parsing method:

public Data(string strLine)
{
    string[] list = strLine.Split(new[] {'\t'});
    try
    {
        Senttime = DateTime.Parse(list[0] + " " + list[1]);
    }
    catch (Exception)
    {
    }

    Clientip = list[2];
    Clienthostname = list[3];

    Partnername = list[4];
    Serverhostname = list[5];
    Serverip = list[6];

    Recipientaddress = list[7];
    Eventid = Convert.ToInt16(list[8]);
    Msgid = list[9];
    Priority = Convert.ToInt16(list[10]);
    Recipientreportstatus = Convert.ToByte(list[11]);
    Totalbytes = Convert.ToInt32(list[12]);
    Numberrecipient = Convert.ToInt16(list[13]);
    DateTime temp;
    if (DateTime.TryParse(list[14], out temp))
    {
        OriginationTime = temp;
    }
    else
    {
        OriginationTime = null;
    }
    Encryption = list[15];
    ServiceVersion = list[16];
    LinkedMsgid = list[17];
    MessageSubject = list[18];
    SenderAddress = list[19];
}

InsertData method:

private static void InsertData(string strLine, SqlConnection cn)
{
    var dt = new Data(strLine); //parse the log line into proper fields 
    const string cnnStr =
        "INSERT INTO LOGDATA ([SentTime]," + "[client-ip]," +
        "[Client-hostname]," + "[Partner-Name]," + "[Server-hostname]," +
        "[server-IP]," + "[Recipient-Address]," + "[Event-ID]," + "[MSGID]," +
        "[Priority]," + "[Recipient-Report-Status]," + "[total-bytes]," +
        "[Number-Recipients]," + "[Origination-Time]," + "[Encryption]," +
        "[service-Version]," + "[Linked-MSGID]," + "[Message-Subject]," +
        "[Sender-Address]) " + " VALUES (     " + "@Senttime," + "@Clientip," +
        "@Clienthostname," + "@Partnername," + "@Serverhostname," + "@Serverip," +
        "@Recipientaddress," + "@Eventid," + "@Msgid," + "@Priority," +
        "@Recipientreportstatus," + "@Totalbytes," + "@Numberrecipient," +
        "@OriginationTime," + "@Encryption," + "@ServiceVersion," +
        "@LinkedMsgid," + "@MessageSubject," + "@SenderAddress)";


    var cmd = new SqlCommand(cnnStr, cn) {CommandType = CommandType.Text};

    cmd.Parameters.AddWithValue("@Senttime", dt.Senttime);
    cmd.Parameters.AddWithValue("@Clientip", dt.Clientip);
    cmd.Parameters.AddWithValue("@Clienthostname", dt.Clienthostname);
    cmd.Parameters.AddWithValue("@Partnername", dt.Partnername);
    cmd.Parameters.AddWithValue("@Serverhostname", dt.Serverhostname);
    cmd.Parameters.AddWithValue("@Serverip", dt.Serverip);
    cmd.Parameters.AddWithValue("@Recipientaddress", dt.Recipientaddress);
    cmd.Parameters.AddWithValue("@Eventid", dt.Eventid);
    cmd.Parameters.AddWithValue("@Msgid", dt.Msgid);
    cmd.Parameters.AddWithValue("@Priority", dt.Priority);
    cmd.Parameters.AddWithValue("@Recipientreportstatus", dt.Recipientreportstatus);
    cmd.Parameters.AddWithValue("@Totalbytes", dt.Totalbytes);
    cmd.Parameters.AddWithValue("@Numberrecipient", dt.Numberrecipient);
    if (dt.OriginationTime != null)
        cmd.Parameters.AddWithValue("@OriginationTime", dt.OriginationTime);
    else
        cmd.Parameters.AddWithValue("@OriginationTime", DBNull.Value);
            //if OriginationTime was null, then insert with null value to this column
    cmd.Parameters.AddWithValue("@Encryption", dt.Encryption);
    cmd.Parameters.AddWithValue("@ServiceVersion", dt.ServiceVersion);
    cmd.Parameters.AddWithValue("@LinkedMsgid", dt.LinkedMsgid);
    cmd.Parameters.AddWithValue("@MessageSubject", dt.MessageSubject);
    cmd.Parameters.AddWithValue("@SenderAddress", dt.SenderAddress);
    cmd.ExecuteNonQuery();
}

How can my program run faster? Thank you so much!


回答1:


Use SqlBulkCopy.

Edit: I created a minimal implementation of IDataReader and created a Batch type so that I could insert arbitrary in-memory data using SqlBulkCopy. Here is the important bit:

IDataReader dr = batch.GetDataReader();
using (SqlTransaction tx = _connection.BeginTransaction())
{
    try
    {
        using (SqlBulkCopy sqlBulkCopy =
            new SqlBulkCopy(_connection, SqlBulkCopyOptions.Default, tx))
        {
            sqlBulkCopy.DestinationTableName = TableName;
            SetColumnMappings(sqlBulkCopy.ColumnMappings);
            sqlBulkCopy.WriteToServer(dr);
            tx.Commit();
        }
    }
    catch
    {
        tx.Rollback();
        throw;
    }
}

The rest of the implementation is left as an exercise for the reader :)

Hint: the only bits of IDataReader you need to implement are Read, GetValue and FieldCount.




回答2:


Hmmm, let's break this down a little bit.

In pseudocode what you did is the ff:

  1. Open the file
    • Open a connection
    • For every line that has data:
    • Parse the string
    • Save the data in SQL Server
    • Close the connection
    • Close the file

Now the fundamental problems in doing it this way are:

  • You are keeping a SQL connection open while waiting for your line parsing (pretty susceptible to timeouts and stuff)
  • You might be saving the data line by line, each in its own transaction. We won't know until you show us what the InsertData method is doing
  • Consequently you are keeping the file open while waiting for SQL to finish inserting

The optimal way of doing this is to parse the file as a whole, and then insert them in bulk. You can do this with SqlBulkCopy (as suggested by Matt Howells), or with SQL Server Integration Services.

If you want to stick with ADO.NET, you can pool together your INSERT statements and then pass them off into one large SQLCommand, instead of doing it this way e.g., setting up one SQLCommand object per insert statement.




回答3:


You create the SqlCommand object for every row of data. The simplest improvement would therefore to create a

private static SqlCommand cmdInsert

and declare the parameters with the Parameters.Add() method. Then for each data row, set the parameter values using

cmdInsert.Parameters["@paramXXX"].Value = valueXXX;

A second performance improvement might be to skip creation of Data objects for each row, and assign Parameter values directly from the list[] array.



来源:https://stackoverflow.com/questions/1152414/import-from-text-file-to-sql-server-database-is-ado-net-too-slow

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!