What's the asymptotic complexity of the GroupBy operation?

I am interested in the asymptotic complexity (big O) of the GroupBy operation on unindexed datasets. What's the complexity of the best known algorithm and what's the compl

3 Answers

我寻月下人不归

2020-12-19 07:43

Ignoring the cost of the underlying SQL that feeds it, the GROUP BY operation itself is O(n): the rows are scanned once and aggregated in a single pass, so the cost scales linearly with n (the size of the dataset).

When GROUP BY is part of a more complex query, the picture changes: O(n) becomes an upper bound on what the GROUP BY adds to the overall cost. It could add less if resolving the inner query already leaves the data sorted.

暖寄归人

2020-12-19 07:52

As for LINQ, I guess you want to know the complexity of the LINQ to Objects group by (Enumerable.GroupBy).

Checking the implementation with ILSpy, it appears to be O(n) (.NET Framework 4 series).

It enumerates the source collection once. For each element, it computes the grouping key, then checks whether that key is already present in a hashtable that maps keys to lists of elements, adding the key if it is missing. Finally, it appends the element to the list stored under that key.
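The steps described above can be sketched in Python. This is only an illustration of the hash-based approach, not the actual .NET implementation: a Python `dict` stands in for the hashtable, and the names `group_by` and `key_selector` are hypothetical. The O(n) bound assumes hash lookups and insertions take O(1) time on average.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


def group_by(source: Iterable[T], key_selector: Callable[[T], K]) -> Dict[K, List[T]]:
    """Group elements by key in a single pass (hypothetical helper)."""
    groups: Dict[K, List[T]] = {}      # plays the role of the hashtable
    for element in source:             # the source is enumerated exactly once
        key = key_selector(element)    # compute the grouping key
        if key not in groups:          # average O(1) lookup
            groups[key] = []           # add the key if it is missing
        groups[key].append(element)    # append to that key's element list
    return groups


words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_first_letter = group_by(words, lambda w: w[0])
print(by_first_letter)
```

Each element costs one key computation, one lookup, and one append, so the total work grows linearly with the number of elements under the average-case hashing assumption.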

情歌与酒

2020-12-19 08:08

Grouping can be done in one pass, O(n), over rows that are sorted by the grouping columns. Sorting costs O(n log n), so the complexity of GROUP BY is O(n log n), where n is the number of rows. If the columns used in the GROUP BY clause are indexed so that rows can be read in key order, the sort is not necessary and the complexity is O(n).
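A minimal Python sketch of this sort-then-scan strategy follows. The function name `sort_group_by` and the sample rows are made up for illustration, and a real database engine would do this very differently internally. The sort dominates at O(n log n); the scan afterwards only compares each row's key with the key of the current group.

```python
from typing import Any, Callable, Iterable, List, Tuple


def sort_group_by(rows: Iterable[Any], key: Callable[[Any], Any]) -> List[Tuple[Any, List[Any]]]:
    """Group rows by sorting first, then scanning once (hypothetical helper)."""
    ordered = sorted(rows, key=key)    # O(n log n); skipped if input is already in key order
    groups: List[Tuple[Any, List[Any]]] = []
    for row in ordered:                # single O(n) pass over the sorted rows
        k = key(row)
        if groups and groups[-1][0] == k:
            groups[-1][1].append(row)  # same key as the previous row: extend the group
        else:
            groups.append((k, [row]))  # key changed: start a new group
    return groups


rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4), ("c", 5)]
grouped = sort_group_by(rows, key=lambda r: r[0])
print(grouped)
```

If the rows already arrive in key order, as they would from an index scan, the `sorted` call could be dropped and only the linear pass would remain.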
