What is your recommended way to import .csv files into Microsoft SQL Server 2008 R2?
I\'d like something fast, as I have a directory with a lot of .csv files (>500MB
If you have to do anything with the data in the files other than insert it, then I would recommend using SSIS. It can not only insert and/or update, it can also clean the data for you.
I should recommend this:
using System;
using System.Data;
using Microsoft.VisualBasic.FileIO;
namespace ReadDataFromCSVFile
{
static class Program
{
static void Main()
{
string csv_file_path=@"C:\Users\Administrator\Desktop\test.csv";
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
Console.WriteLine("Rows count:" + csvData.Rows.Count);
Console.ReadLine();
}
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
try
{
using(TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//Making empty value as null
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);
}
}
}
catch (Exception ex)
{
}
return csvData;
}
}
}
//Copy the DataTable to SQL Server using SqlBulkCopy
function static void InsertDataIntoSQLServerUsingSQLBulkCopy(DataTable csvData)
{
using(SqlConnection dbConnection = new SqlConnection("Data Source=ProductHost;Initial Catalog=yourDB;Integrated Security=SSPI;"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = "Your table name";
foreach (var column in csvFileData.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
s.WriteToServer(csvFileData);
}
}
}
Using BULK INSERT in a T-SQL script seems to be a good solution.
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
You can get the list of files in your directory with xp_cmdshell and the dir command (with a bit of cleanup). In the past, I tried to do something like this with sp_OAMethod and VBScript functions and had to use the dir method because I had trouble getting the list of files with the FSO object.
http://www.sqlusa.com/bestpractices2008/list-files-in-directory/
If the structure of all your CSVs are the same i recomend you to use Integration Services (SSIS) in order to loop between them and insert all of them into the same table.
I understand this is not exactly your question. But, if you get into a situation where you use a straight insert use tablock and insert multiple rows. Depends on the row size but I usually go for 600-800 rows at at time. If it is a load into an empty table then sometimes dropping the indexes and creating them after it is loaded is faster. If you can sort the data on the clustered index before it is loaded. Use IGNORE_CONSTRAINTS and IGNORE_TRIGGERS if you can. Put the database in single user mode if you can.
USE AdventureWorks2008R2; GO INSERT INTO Production.UnitMeasure with (tablock) VALUES (N'FT2', N'Square Feet ', '20080923'), (N'Y', N'Yards', '20080923'), (N'Y3', N'Cubic Yards', '20080923'); GO
First officially supported way of importing large text files is with command line tool called "bcp" (Bulk Copy Utility), very useful for huge amounts of binary data.
Please check out this link: http://msdn.microsoft.com/en-us/library/ms162802.aspx
However, in SQL Server 2008 I presume that BULK INSERT command would be your choice number one, because on the first place it became a part of standard command set. If for any reason you have to maintain vertical compatibility, I'd stick to bcp utility, available for SQL Server 2000 too.
HTH :)
EDITED LATER: Googling around I recalled that SQL Server 2000 had BULK INSERT command too... however, there was obviously some reason I sticked up to bcp.exe, and I cannot recall why... perhaps of some limits, I guess.