What's the asymptotic complexity of the GroupBy operation?

长发绾君心 2020-12-19 07:26

I am interested in the asymptotic complexity (big O) of the GroupBy operation on unindexed datasets. What's the complexity of the best known algorithm and what's the compl

3 Answers
  • 2020-12-19 07:43

    Setting aside the cost of the underlying query, the GROUP BY operation itself is O(n): the rows it receives are scanned once and aggregated in a single pass, so the cost scales linearly with n, the size of the dataset.

    When GROUP BY is part of a more complex query, the picture changes: O(n) is the upper bound on what the GROUP BY step adds to the overall cost. It can add less if resolving the inner query already leaves the data sorted.

  • 2020-12-19 07:52

    As for LINQ, I assume you are asking about the complexity of the LINQ to Objects group by (Enumerable.GroupBy).

    Judging from the implementation as decompiled with ILSpy, it appears to be O(n). (.NET Framework 4 series.)

    It enumerates the source collection once. For each element, it computes the grouping key, then checks whether a hash table that maps keys to lists of elements already contains that key, adding the key if it is missing. Finally, it appends the element to the list stored under that key.
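
    The steps described in this answer can be sketched in Python as follows. This is only an illustration of the single-pass, hash-table approach, not the actual .NET implementation; the names group_by and key_selector are hypothetical, and the linear bound assumes hash-table lookups and inserts take expected constant time.

```python
def group_by(source, key_selector):
    """Group elements by key in one pass over the source.

    Hypothetical helper for illustration; this is not Enumerable.GroupBy.
    """
    groups = {}  # hash table: key -> list of elements with that key
    for element in source:              # single enumeration: n iterations
        key = key_selector(element)     # compute the grouping key
        bucket = groups.get(key)        # expected O(1) hash lookup
        if bucket is None:              # first time this key is seen
            bucket = []
            groups[key] = bucket
        bucket.append(element)          # amortized O(1) append
    return groups


words = ["apple", "avocado", "banana", "blueberry", "cherry"]
result = group_by(words, lambda w: w[0])
```

    Each element is touched once and every per-element step is expected constant time, which is where the O(n) estimate comes from. With a poor hash function the lookups degrade, and so does the bound.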

  • 2020-12-19 08:08

    Grouping can be done in one pass, O(n), over sorted rows, and sorting the rows costs O(n log n), so the complexity of GROUP BY is O(n log n), where n is the number of rows. If there is an index on each column used in the GROUP BY clause, sorting is unnecessary and the complexity is O(n).
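
    As a rough sketch of the sort-then-scan approach described in this answer, here is a Python version. The function name sort_group_sum and the sample rows are made up for illustration, and a real database engine will implement this differently.

```python
from itertools import groupby
from operator import itemgetter


def sort_group_sum(rows):
    """Sum the second field of each row, grouped by the first field.

    Hypothetical helper for illustration only.
    """
    key = itemgetter(0)
    ordered = sorted(rows, key=key)          # sorting step: O(n log n)
    totals = []
    # itertools.groupby only merges adjacent equal keys, which is why the
    # rows must be sorted first. The scan itself is a single O(n) pass.
    for k, grp in groupby(ordered, key=key):
        totals.append((k, sum(value for _, value in grp)))
    return totals


rows = [("b", 2), ("a", 1), ("b", 5), ("a", 4), ("c", 3)]
totals = sort_group_sum(rows)
```

    If the rows already arrive in key order (for example, when read through an index), the sorted call can be dropped and only the linear scan remains.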
