This is apparently an interview question (I found it in a collection of interview questions), but even if it's not, it's pretty cool.
We are told to do this efficiently.
I think a trie is a good choice of data structure here.
In the trie, each node can store a word count: the number of times the word spelled by the characters on the path from the root to that node occurs in the input.
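A minimal sketch of such a trie in Python, assuming the input has already been split into words. The names `TrieNode`, `insert` and `count_of` are hypothetical, chosen for illustration. A node's `count` stays 0 when its prefix never ends a word.

```python
class TrieNode:
    """One node per prefix; `count` is how many times the word spelled by
    the root-to-node path occurred (0 if that prefix never ended a word)."""

    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0


def insert(root, word):
    """Walk (creating nodes as needed) the path for `word` and bump the
    count at its last node. Costs O(len(word)), i.e. O(L) per word."""
    node = root
    for ch in word:
        child = node.children.get(ch)
        if child is None:
            child = node.children[ch] = TrieNode()
        node = child
    node.count += 1


def count_of(root, word):
    """Return how many times `word` was inserted (0 if never)."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.count


root = TrieNode()
for w in "the cat and the hat and the bat".split():
    insert(root, w)
print(count_of(root, "the"), count_of(root, "and"), count_of(root, "th"))  # 3 2 0
```

Note that "th" returns 0 even though its node exists, because no inserted word ends there.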
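The time complexity to set up the trie is O(Ln) ~ O(n), where n is the number of words and L is the number of characters in the longest word, which we can treat as a constant. To find the top 10 words, we can traverse the trie while keeping the 10 highest counts seen so far; the trie has at most Ln nodes and the bookkeeping per node is constant, so this also costs O(n). So the whole problem can be solved in O(n).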
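A self-contained sketch of the whole approach, assuming the words are already in a list and that ties may be broken arbitrarily. The function names are hypothetical. The answer doesn't say how to keep the running top 10 during the traversal; here I use a min-heap of size k via Python's `heapq`. Each push or replace costs O(log k), which is constant for k = 10. The traversal is an explicit stack rather than recursion, to avoid Python's recursion limit on long words.

```python
import heapq


class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # occurrences of the word ending at this node


def build_trie(words):
    """Insert every word: O(L) per word, O(Ln) overall."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.count += 1
    return root


def top_k_words(root, k=10):
    """Visit every node once, keeping the k highest (count, word) pairs.

    heap[0] is always the weakest entry of the current top k, so a new
    word only enters if it beats that entry."""
    heap = []                 # min-heap of (count, word), size <= k
    stack = [(root, "")]      # (node, word spelled by the path to it)
    while stack:
        node, prefix = stack.pop()
        if node.count > 0:    # this prefix is a complete word
            if len(heap) < k:
                heapq.heappush(heap, (node.count, prefix))
            elif node.count > heap[0][0]:
                heapq.heapreplace(heap, (node.count, prefix))
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    return sorted(heap, reverse=True)  # highest count first


words = "the cat and the hat and the bat the end".split()
print(top_k_words(build_trie(words), k=2))  # [(4, 'the'), (2, 'and')]
```

Building `prefix + ch` copies up to L characters per node, which is still O(n) overall when L is treated as a constant.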