DataTable.Select and Performance Issue in C#

误落风尘 2020-12-15 01:56

I'm importing data from three tab-delimited files into DataTables, and after that I need to go through every row of the master table and find all the corresponding rows in the two child tables.
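For illustration, the row-by-row lookup described above might look like the following minimal sketch. The question does not give the schema, so the column names Id and MasterId and the helper CountChildRows are assumptions, not the poster's actual code.

```csharp
using System;
using System.Data;
using System.Globalization;

static class MasterChildLookup
{
    // Hypothetical schema: the master table has an integer "Id" column and
    // the child table has an integer "MasterId" column pointing back to it.
    public static int CountChildRows(DataTable master, DataTable child)
    {
        int total = 0;
        foreach (DataRow masterRow in master.Rows)
        {
            int id = (int)masterRow["Id"];

            // One DataTable.Select call per master row; this is the call
            // whose cost the thread is about.
            string filter = "[MasterId] = " + id.ToString(CultureInfo.InvariantCulture);
            DataRow[] matches = child.Select(filter);
            total += matches.Length;
        }
        return total;
    }
}
```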

6 Answers
  •  情话喂你
    2020-12-15 02:29

    I know this is an old question, and the code underpinning this issue may have changed, but I've recently encountered (and gained some insight into) this very issue.

    For anyone coming along at a later date ... here's what I found.

    The performance of DataTable.Select(condition) is quite sensitive to the nature and structure of the 'condition' you provide. This looks like a bug to me (where would I report it to Microsoft?) but it may merely be a quirk.

    I've written a set of tests to demonstrate the issue that are structured as follows:

    1. Define a datatable with a few simple columns, like this:

      var dataTable = new DataTable();
      var idCol = dataTable.Columns.Add("Id", typeof(Int32));
      dataTable.Columns.Add("Code", typeof(string));
      dataTable.Columns.Add("Name", typeof(string));
      dataTable.Columns.Add("FormationDate", typeof(DateTime));
      dataTable.Columns.Add("Income", typeof(Decimal));
      dataTable.Columns.Add("ChildCount", typeof(Int32));
      dataTable.Columns.Add("Foreign", typeof(Boolean));
      dataTable.PrimaryKey = new DataColumn[1] { idCol };

    2. Populate the table with 40000 records, each with a unique 'Code' field.

    3. Perform a batch of 'selects' (each with different parameters) against the datatable using two similar, but differently formatted, queries, then record and compare the total time taken by each of the two formats.
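    Steps 1 and 2 could be sketched roughly as follows. The schema is the one shown in step 1; the class name TestTableFactory and the generated values (codes such as "C00042", the names, dates and amounts) are assumptions made only so that every row gets a unique 'Code'.

```csharp
using System;
using System.Data;

static class TestTableFactory
{
    // Builds the table from step 1 and fills it with rowCount rows,
    // each carrying a unique Code value (hypothetical format "C" + 5 digits).
    public static DataTable Build(int rowCount)
    {
        var dataTable = new DataTable();
        var idCol = dataTable.Columns.Add("Id", typeof(int));
        dataTable.Columns.Add("Code", typeof(string));
        dataTable.Columns.Add("Name", typeof(string));
        dataTable.Columns.Add("FormationDate", typeof(DateTime));
        dataTable.Columns.Add("Income", typeof(decimal));
        dataTable.Columns.Add("ChildCount", typeof(int));
        dataTable.Columns.Add("Foreign", typeof(bool));
        dataTable.PrimaryKey = new[] { idCol };

        for (int i = 0; i < rowCount; i++)
        {
            dataTable.Rows.Add(
                i,                                       // Id (primary key)
                "C" + i.ToString("D5"),                  // Code, unique per row
                "Name " + i,                             // Name
                new DateTime(2000, 1, 1).AddDays(i % 365), // FormationDate
                1000m + i,                               // Income
                i % 5,                                   // ChildCount
                i % 2 == 0);                             // Foreign
        }
        return dataTable;
    }
}
```

    Calling TestTableFactory.Build(40000) would give a table of the size used in the tests described here.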

    You get remarkable results. Testing, for example, the below two conditions side-by-side:

    Q1: [Code] = 'XX'

    Q2: ([Code] = 'XX')

    (I make multiple Select calls with each of the above two queries; on each iteration I replace XX with a valid code that exists in the datatable.) The result?

    Time comparison for 320 lookups against 40000 records: 180 msec total search time with no brackets, 6871 msec total search time for search WITH brackets
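
    A timing harness along these lines could be used to repeat the comparison. It is a sketch, not the code that produced the numbers above: the class SelectTiming, the method Time and the "{0}" placeholder convention are assumptions, and absolute timings will differ by machine and .NET version.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;

static class SelectTiming
{
    // Runs one DataTable.Select per code, substituting the code into
    // filterTemplate at "{0}", and returns the total elapsed time together
    // with the number of rows found (so both formats can be sanity-checked
    // as returning the same result).
    public static (long ElapsedMs, int RowsFound) Time(
        DataTable table, IEnumerable<string> codes, string filterTemplate)
    {
        int rowsFound = 0;
        var stopwatch = Stopwatch.StartNew();
        foreach (string code in codes)
        {
            // Double any single quotes so the code is a valid string literal
            // in the filter expression.
            string filter = string.Format(filterTemplate, code.Replace("'", "''"));
            rowsFound += table.Select(filter).Length;
        }
        stopwatch.Stop();
        return (stopwatch.ElapsedMilliseconds, rowsFound);
    }
}
```

    Passing "[Code] = '{0}'" and then "([Code] = '{0}')" as filterTemplate, with the same table and the same list of codes, gives the two totals to compare.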

    Yes - 38 times slower if you just have the extra brackets surrounding the condition. There are other scenarios which react differently.

    For example, [Code] = '{searchCode}' OR 1=0 vs ([Code] = '{searchCode}' OR 1=0) take similar (slow) times to execute, but:

    [Code] = '{searchCode}' AND 1=1 vs ([Code] = '{searchCode}' AND 1=1) again shows the non-bracketed version to be close to 40 times faster.

    I've not investigated all scenarios, but it seems that the introduction of brackets - either redundantly around a simple comparison check, or as required to specify sub-expression precedence - or the presence of an 'OR' slows the query down considerably.

    I could speculate that the issue is caused by how the datatable parses the condition you use and how it creates and uses internal indexes ... but I won't.
