DataTable.Select and Performance Issue in C#

误落风尘 2020-12-15 01:56

I'm importing data from three tab-delimited files into DataTables, and after that I need to go through every row of the master table and find all the rows in two child tables

6 Answers
  •  盖世英雄少女心
    2020-12-15 02:29

    .NET 4.5, and the issue is still there.

    Here are the results of a simple benchmark comparing the CPU time of DataTable.Select with different dictionary implementations (results are in milliseconds; the decimal separator is a comma):

        #Rows Table.Select  Hashtable[] SortedList[] Dictionary[]
         1000        43,31         0,01         0,06         0,00
         6000       291,73         0,07         0,13         0,01
        11000       604,79         0,04         0,16         0,02
        16000       914,04         0,05         0,19         0,02
        21000      1279,67         0,05         0,19         0,02
        26000      1501,90         0,05         0,17         0,02
        31000      1738,31         0,07         0,20         0,03
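
    A comparison of this kind could be set up roughly as follows. This is only a sketch and not the benchmark that produced the table above: the class name LookupTiming, the single COL1 column, and the "K0", "K1", ... test values are all assumptions made for illustration, and Stopwatch measures elapsed wall-clock time rather than CPU time.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Diagnostics;

    public static class LookupTiming
    {
        // Hypothetical test data: one string column with unique values "K0", "K1", ...
        public static DataTable MakeTable(int rowCount)
        {
            var table = new DataTable();
            table.Columns.Add("COL1", typeof(string));
            for (int i = 0; i < rowCount; i++)
            {
                table.Rows.Add("K" + i);
            }
            return table;
        }

        // Runs `lookups` filters through DataTable.Select and returns the number of rows found.
        public static int ViaSelect(DataTable table, int lookups, out double elapsedMs)
        {
            int found = 0;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < lookups; i++)
            {
                string value = "K" + (i % table.Rows.Count);
                found += table.Select("COL1 = '" + value + "'").Length;
            }
            sw.Stop();
            elapsedMs = sw.Elapsed.TotalMilliseconds;
            return found;
        }

        // Runs the same lookups against a dictionary built once up front
        // (the build time is deliberately not included in the measurement).
        public static int ViaDictionary(DataTable table, int lookups, out double elapsedMs)
        {
            var index = new Dictionary<string, DataRow>();
            foreach (DataRow dr in table.Rows)
            {
                index[(string)dr["COL1"]] = dr;
            }

            int found = 0;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < lookups; i++)
            {
                DataRow row;
                if (index.TryGetValue("K" + (i % table.Rows.Count), out row))
                {
                    found++;
                }
            }
            sw.Stop();
            elapsedMs = sw.Elapsed.TotalMilliseconds;
            return found;
        }
    }
    ```

    Calling ViaSelect and ViaDictionary on tables of increasing size (for example MakeTable(1000), MakeTable(6000), ...) and printing the two elapsedMs values gives a rough picture of how each approach scales. Absolute numbers will depend on the machine and the runtime.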
    

    Problem:

    DataTable.Select creates a "System.Data.Select" instance internally, and this "Select" class builds indexes on the columns named in the filter expression. The Select class reuses the indexes it has built, but the DataTable implementation does not reuse the Select instance, so the indexes are rebuilt every time DataTable.Select is called. (This behaviour can be observed by decompiling System.Data.)

    Solution:

    Assume the following query:

    DataRow[] rows = data.Select("COL1 = 'VAL1' AND (COL2 = 'VAL2' OR COL2 IS NULL)");
    

    Instead, create and fill a Dictionary whose keys are the combinations of values of the columns used in the filter. (This relatively expensive step needs to be done only once; the dictionary instance must then be reused.)

    // Built once and reused for every lookup.
    Dictionary<string, List<DataRow>> di = new Dictionary<string, List<DataRow>>();

    foreach (DataRow dr in data.Rows)
    {
        // Composite key "COL1//COL2"; DBNull is mapped to an empty string.
        string key = (dr["COL1"] == DBNull.Value ? "" : dr["COL1"]) + "//" + (dr["COL2"] == DBNull.Value ? "" : dr["COL2"]);
        if (!di.ContainsKey(key))
        {
            di.Add(key, new List<DataRow>());
        }
        di[key].Add(dr);
    }
    

    Then query the Dictionary (more than one lookup may be required) to filter the rows, and combine the results into a List:

    string key1 = "VAL1//VAL2";
    string key2 = "VAL1//";
    List<DataRow> results = new List<DataRow>();
    if (di.ContainsKey(key1))
    {
        results.AddRange(di[key1]);
    }
    if (di.ContainsKey(key2))
    {
        results.AddRange(di[key2]);
    }
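
    The two steps above can be wrapped into a small reusable helper. The following is a minimal sketch, not part of the original answer: the class name RowIndex and the methods MakeKey, Build and Find are hypothetical, and the column names COL1 and COL2 are taken from the example query. As in the code above, NULL is mapped to an empty string, so a real empty string in COL2 is indistinguishable from NULL, and values that themselves contain "//" could collide.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Data;

    public static class RowIndex
    {
        // Builds the composite key "COL1//COL2"; DBNull (or null) becomes an empty string.
        public static string MakeKey(object col1, object col2)
        {
            string a = col1 == DBNull.Value ? "" : Convert.ToString(col1);
            string b = col2 == DBNull.Value ? "" : Convert.ToString(col2);
            return a + "//" + b;
        }

        // Builds the index once; the caller keeps it and reuses it for every lookup.
        public static Dictionary<string, List<DataRow>> Build(DataTable data)
        {
            var index = new Dictionary<string, List<DataRow>>();
            foreach (DataRow dr in data.Rows)
            {
                string key = MakeKey(dr["COL1"], dr["COL2"]);
                List<DataRow> bucket;
                if (!index.TryGetValue(key, out bucket))
                {
                    bucket = new List<DataRow>();
                    index.Add(key, bucket);
                }
                bucket.Add(dr);
            }
            return index;
        }

        // Rough equivalent of: COL1 = val1 AND (COL2 = val2 OR COL2 IS NULL)
        public static List<DataRow> Find(Dictionary<string, List<DataRow>> index, string val1, string val2)
        {
            var results = new List<DataRow>();
            string key1 = MakeKey(val1, val2);
            string key2 = MakeKey(val1, DBNull.Value);

            List<DataRow> bucket;
            if (index.TryGetValue(key1, out bucket))
            {
                results.AddRange(bucket);
            }
            // Skip the second lookup when both keys are identical (val2 is empty),
            // otherwise the same rows would be added twice.
            if (key2 != key1 && index.TryGetValue(key2, out bucket))
            {
                results.AddRange(bucket);
            }
            return results;
        }
    }
    ```

    With this in place, the index is built once with RowIndex.Build(data) and each subsequent query becomes RowIndex.Find(index, "VAL1", "VAL2"). Note that Dictionary lookups on string keys are case-sensitive by default, whereas DataTable.Select follows the table's CaseSensitive setting, so the two approaches may differ on values that vary only in case.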
    
