LINQ Performance for Large Collections

梦毁少年i 2021-02-02 11:48

I have a large collection of strings (up to 1M) alphabetically sorted. I have experimented with LINQ queries against this collection using HashSet, SortedDictionary, and Dictio

6 Answers
  •  误落风尘
    2021-02-02 12:00

    I bet you have an index on the column, so SQL Server can do the comparison in O(log n) operations rather than O(n). To imitate SQL Server's behavior, keep the strings in a sorted collection, binary-search for the first string s such that s >= query, and then read values in order until you reach one that does not start with query, applying any additional filter to the values along the way. This is what is called a range scan (Oracle) or an index seek (SQL Server).

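    Here is some example code. I haven't tested it, so check it for off-by-one errors before relying on it, but you should get the idea.

    // Note: list must be sorted with ordinal comparison before being passed to this function.
    // Requires: using System; using System.Collections.Generic;
    IEnumerable<string> FindStringsThatStartWith(List<string> list, string query) {
        // Binary search for the first index whose value is >= query (lower bound).
        int low = 0, high = list.Count;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (string.CompareOrdinal(list[mid], query) < 0)
                low = mid + 1;
            else
                high = mid;
        }

        // Scan forward while the values still start with query.
        while (low < list.Count && list[low].StartsWith(query, StringComparison.Ordinal)) {
            // Additional filter: skip a value that is exactly equal to query.
            if (list[low].Length > query.Length)
                yield return list[low];
            low++;
        }
    }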
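
    If only the number of matches (or the index range) is needed, the end of the range can be found with a second binary search instead of a linear scan. The following is a minimal sketch of that variation, not part of the original answer: the names PrefixSearch, LowerBound, and PrefixRange are hypothetical, and it assumes the list was sorted with ordinal comparison (for example via StringComparer.Ordinal).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, introduced only for illustration.
public static class PrefixSearch
{
    // Returns the first index whose value is >= key under ordinal comparison,
    // or list.Count if every value is smaller.
    public static int LowerBound(IReadOnlyList<string> list, string key)
    {
        int low = 0, high = list.Count;
        while (low < high)
        {
            int mid = low + (high - low) / 2;
            if (string.CompareOrdinal(list[mid], key) < 0)
                low = mid + 1;
            else
                high = mid;
        }
        return low;
    }

    // Returns the half-open index range [Start, End) of values that start with prefix.
    // Assumes list is sorted ordinally, so all matches form one contiguous run.
    public static (int Start, int End) PrefixRange(IReadOnlyList<string> list, string prefix)
    {
        int start = LowerBound(list, prefix);

        // From start onward, values that start with prefix come before those that
        // do not, so a second binary search finds the end of the run.
        int low = start, high = list.Count;
        while (low < high)
        {
            int mid = low + (high - low) / 2;
            if (list[mid].StartsWith(prefix, StringComparison.Ordinal))
                low = mid + 1;
            else
                high = mid;
        }
        return (start, low);
    }
}
```

    For example, on the ordinally sorted list apple, apply, banana, band, bandana, cat, calling PrefixSearch.PrefixRange(words, "ban") gives the range [2, 5), so End - Start is 3 matches. Both searches are O(log n), which helps when a prefix matches a large share of the collection and only the count is wanted.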
    
