Sunday, August 11, 2013

Bulk Insert into MS SQL Compact Edition Database from CSV files using C#

In MS SQL Server, bulk insert into database from CSV files can be easily done by calling 'LOAD' command from SQL query. However, this command is not supported in MS SQL compact edition. By inserting line by line into sqlce database is obviously too slow for most application that consumes large amount of data. Below is a source code written in C# for sqlce which allows users to obtain performance similar to bulk insert in MS SQL Server

public void BulkInsertFromCSV(string file_path, string table_name, string connection_string, Dictionary<string, Type> data_types)
{
 string line;
 List<string> column_names = new List<string>();
 using (StreamReader reader = new StreamReader(file_path))
 {
  line = reader.ReadLine();
  string[] texts = line.Split('\t');
  foreach (string txt in texts)
  {
   column_names.Add(txt);
  }

   using (SqlCeConnection conn = new SqlCeConnection(connection_string))
   {
    SqlCeCommand cmd = new SqlCeCommand();
    SqlCeResultSet rs;
    SqlCeUpdatableRecord rec;
    conn.Open();
    cmd.Connection = conn;
    cmd.CommandText = table_name;
    cmd.CommandType = CommandType.TableDirect;

    rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable);
    

    while ((line = reader.ReadLine()) != null)
    {
     texts = line.Split('\t');

     rec = rs.CreateRecord();

     for (int j = 0; j < column_names.Count; ++j)
     {
      string columnName = column_names[j];
      
      int ordinal = rec.GetOrdinal(columnName);
      string param_value = "";
      if (j < texts.Length)
      {
       param_value = texts[j];
      }

      Type data_type=data_types[columnName];
      if (data_type == typeof(Int32))
      {
       Int32 value=0;
       int.TryParse(param_value, out value);
       rec.SetInt32(ordinal, value);
      }
      else if (data_type == typeof(Int64))
      {
       Int64 value = 0;
       Int64.TryParse(param_value, out value);
       rec.SetInt64(ordinal, value);
      }
      else if (data_type == typeof(Int16))
      {
       Int16 value = 0;
       Int16.TryParse(param_value, out value);
       rec.SetInt16(ordinal, value);
      }
      else if (data_type == typeof(string))
      {
       rec.SetString(ordinal, param_value);
      }
      else if (data_type == typeof(double))
      {
       double value=0;
       double.TryParse(param_value, out value);
       rec.SetDouble(ordinal, value);
      }
      else if (data_type == typeof(float))
      {
       float value = 0;
       float.TryParse(param_value, out value);
       rec.SetFloat(ordinal, value);
      }
      else if (data_type == typeof(DateTime))
      {
       DateTime value;
       if(DateTime.TryParse(param_value, out value))
           rec.SetDateTime(ordinal, value);
      }
      else if (data_type == typeof(decimal))
      {
         decimal value;
         decimal.TryParse(param_value, out value);
         rec.SetDecimal(ordinal, value);
      }
     }

     rs.Insert(rec);
    }

    rs.Close();
    rs.Dispose();
    cmd.Dispose();
   
  }
 }
}

As can seen in the code above, the parameters for the method include:

file_path: full path to the CSV file, which is assumed to be tab-separated and has a header column which has the same names as the database column names
table_name: name of the database table into which the CSV file data will be populated
connection_string: the database connection string
data_types: a data type holder of Dictionary<string, Type>, for which the key specifies the name of the datatable column while the type specifies the type of data held by the column

The usage of the method is very simple, support the datatable schema looks like the following:
CREATE TABLE [product](
 [PRODUCTID] [int] NULL,
 [PRODUCTNAME] [nvarchar](50) NULL,
 [PRODUCTGROUPCODE] [nvarchar](50) NULL,
 [PRODUCTGROUPNAME] [nvarchar](50) NULL,
 [INSTOCKFLAG] [nvarchar](50) NULL,
 [FULLPRICE] [money] NULL
) 

And the CSV file C:\temp.csv looks like the following
PRODUCTID PRODUCTNAME PRODUCTGROUPCODE PRODUCTGROUPNAME INSTOCKFLAG FULLPRICE
10001  CA CALENDAR N 15
10002  CA CALENDAR N 10
10003  CA CALENDAR N 10
10004  BK BOOK Y 90
10005  BK BOOK Y 16
10006  BK BOOK N 10
10007  BK BOOK N 7
10008  BK BOOK N 11

The invoking of the method will be something like:
string connection_string="[YOUR DB CONNECTION STRING]";
Dictionary<string, Type> data_types=new Dictionary<string, Type>();
data_types["PRODUCTID"]=typeof(Int32);
data_types["PRODUCTNAME"]=typeof(string);
data_types["PRODUCTGROUPCODE"]=typeof(string);
data_types["INSTOCKFLAG"]=typeof(string);
data_types["FULLPRICE"]=typeof(float);
BulkInsertFromCSV(@"C:\temp.csv", "product", connection_string, data_types);

One more thing, i found that there is a bug that prevents insertion of float value to the datatable. In my implementation, the .net complains that "cannot insert Single to column (xxx). The column type is Float". Although i already specify using "SetFloat" but the errors simply do not go away, therefore, currently I am using "SetValue(ordinal, value)" in place of "SetFloat(ordinal, value)" which kind of solving my problem.

No comments:

Post a Comment