I am interested in the asymptotic complexity (big O) of the GroupBy operation on unindexed datasets. What's the complexity of the best known algorithm and what's the compl
Setting aside the base SQL that the GROUP BY works on, the complexity of the GROUP BY operation itself is just O(n), since the data is scanned row by row and aggregated in one pass. It scales linearly with n (the size of the dataset).
When GROUP BY is added to a complex query, the picture changes: O(n) becomes the upper bound on what the GROUP BY contributes to the overall cost. It could contribute less if resolving the inner query already leaves the data sorted.
As for LINQ, I guess you want to know the complexity of the LINQ to Objects group by (Enumerable.GroupBy).
Checking the implementation with ILSpy, it appears to me to be O(n) (.NET Framework 4 series).
It enumerates the source collection once. For each element, it computes the grouping key, then checks whether that key is already present in a hash table that maps keys to lists of elements, adding the key if it is missing. Finally, it adds the element to the list stored under that key.
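The steps described above can be sketched in a few lines. This is only an illustration of the hash-based approach in Python, not the actual .NET source; the names `group_by`, `key_func`, and the sample data are made up for the example. Each element costs one expected O(1) hash lookup and one append, which gives O(n) expected time overall.

```python
def group_by(source, key_func):
    """Hash-based grouping sketch (hypothetical helper, not the .NET code)."""
    groups = {}  # hash table: grouping key -> list of elements
    for element in source:          # single pass over the source
        key = key_func(element)     # compute the grouping key
        if key not in groups:       # expected O(1) hash lookup
            groups[key] = []        # first time this key is seen
        groups[key].append(element) # amortized O(1) append
    return groups


# Example data is invented for illustration.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
grouped = group_by(words, lambda w: w[0])
```

The O(n) bound assumes that key computation and hashing take constant time and that the hash function spreads keys reasonably well; with many collisions the lookups degrade.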
Grouping can be done in one pass (O(n)) over sorted rows, and sorting the rows costs O(n log n), so the complexity of GROUP BY is O(n log n), where n is the number of rows. If there are indexes on each column used in the GROUP BY clause, sorting is not necessary and the complexity is O(n).
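A rough sketch of the sort-then-group approach, again in Python and purely for illustration (`sort_then_group` and the sample rows are hypothetical). The sort dominates at O(n log n); the grouping pass afterwards only compares each row's key with the previous one, which is O(n). If the rows arrive already ordered, for example from an index, the sort step can be skipped.

```python
from itertools import groupby
from operator import itemgetter


def sort_then_group(rows, key_func):
    """Sort-based grouping sketch (hypothetical helper)."""
    ordered = sorted(rows, key=key_func)  # O(n log n) comparison sort
    # One linear pass: groupby starts a new group whenever the key changes.
    return [(key, list(members)) for key, members in groupby(ordered, key=key_func)]


# Example rows are invented for illustration: (group key, value).
rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
result = sort_then_group(rows, itemgetter(0))
```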