Given a file, find the ten most frequently occurring words as efficiently as possible

予麋鹿 2020-12-12 13:26

This is apparently an interview question (I found it in a collection of interview questions), but even if it's not, it's pretty cool.

We are told to do this efficiently.

15 Answers
  •  时光取名叫无心
    2020-12-12 13:57

    Not the most efficient CPU-wise, and UGLY, but it took only 2 minutes to bang out:

    perl -lane '$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a}} keys %h) {print "$h{$w}\t$w"}}' file | head

    Loop over each line with -n
    Autosplit each line into the @F array of words with -a
    Increment hash %h for each word $_
    Once the END of input is reached,
    sort the hash keys by descending frequency,
    and print the frequency $h{$w} and the word $w
    Pipe through head to stop after 10 lines
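    The same algorithm translates almost line for line into awk. This is a hypothetical equivalent, not from the thread; it assumes words are whitespace-separated, just as Perl's -a autosplit does:

```shell
# Count every whitespace-separated word into the h array, then emit
# "count<TAB>word" and let sort/head pick the ten most frequent.
awk '{ for (i = 1; i <= NF; i++) h[$i]++ }
     END { for (w in h) print h[w] "\t" w }' file |
  sort -rn | head
```

    Like the Perl version, it makes a single O(n) counting pass over the input and only sorts the distinct words, not every word occurrence.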

    Using the text of this web page as input:

    121     the
    77      a
    48      in
    46      to
    44      of
    39      at
    33      is
    30      vote
    29      and
    25      you
    

    I benchmarked this solution against the top-rated shell solution (Ben Jackson's) on a 3.3 GB text file containing 580,000,000 words.
    Perl 5.22 completed in 171 seconds, while the shell solution took 474 seconds.
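    The thread does not reproduce the competing shell solution, but the classic pipeline for this task probably looks something along these lines (a sketch, not the benchmarked code):

```shell
# Split the input into one word per line, group identical words,
# count each group, sort by count, and keep the top ten.
tr -s '[:space:]' '\n' < file |  # one word per line
  sort |                         # bring identical words together
  uniq -c |                      # prefix each word with its count
  sort -rn |                     # highest count first
  head                           # top ten
```

    This would explain the timing gap: the first sort must order all 580 million word occurrences, an O(n log n) external sort, whereas the Perl hash counts them in a single O(n) pass and only sorts the distinct words.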
