Build an ASCII chart of the most commonly used words in a given text [closed]


野的像风 2020-11-30 16:22

30 answers

北海茫月 (original poster)

2020-11-30 16:54
Haskell - ~~366~~ ~~351~~ ~~344~~ ~~337~~ 333 characters

(One line break in main added for readability, and no line break needed at end of last line.)
```
import Data.List
import Data.Char
l=length
t=filter
m=map
f c|isAlpha c=toLower c|0<1=' '
h w=(-l w,head w)
x!(q,w)='|':replicate(minimum$m(q?)x)'_'++"| "++w
q?(g,w)=q*(77-l w)`div`g
b x=m(x!)x
a(l:r)=(' ':t(=='_')l):l:r
main=interact$unlines.a.b.take 22.sort.m h.group.sort
  .t(`notElem`words"the and of to a i it in or is").words.m f
```
How it works is best seen by reading the argument to interact backwards:
- map f lowercases alphabetics, replaces everything else with spaces.
- words produces a list of words, dropping the separating whitespace.
- filter (`notElem` words "the and of to a i it in or is") discards the forbidden words.
- group . sort sorts the words, and groups identical ones into lists.
- map h maps each list of identical words to a tuple of the form (-frequency, word).
- take 22 . sort sorts the tuples by descending frequency (the first tuple entry), and keeps only the first 22 tuples.
- b maps tuples to bars (see below).
- a prepends the first line of underscores, to complete the topmost bar.
- unlines joins all these lines together with newlines.
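
The counting stages listed above can be sketched in ungolfed form. This is only a readable approximation of the same pipeline, not the submitted answer; the names stopWords, normalize and topWords are hypothetical and do not appear in the golfed code.

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (group, sort)

-- Words excluded from the count (same list as in the golfed code).
stopWords :: [String]
stopWords = words "the and of to a i it in or is"

-- Lowercase letters; turn every other character into a space.
normalize :: Char -> Char
normalize c
  | isAlpha c = toLower c
  | otherwise = ' '

-- The n most frequent words as (-frequency, word) pairs, most frequent first.
-- Reading the composition bottom-up follows the order of the list above.
topWords :: Int -> String -> [(Int, String)]
topWords n =
  take n
    . sort
    . map (\ws -> (negate (length ws), head ws)) -- group never yields empty lists
    . group
    . sort
    . filter (`notElem` stopWords)
    . words
    . map normalize
```

For example, topWords 2 applied to "The cat and the hat; a cat, a CAT!" keeps only cat and hat after filtering and returns the pairs (-3,"cat") and (-1,"hat").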
The tricky bit is getting the bar length right. I assumed that only underscores count towards the length of the bar, so || would be a bar of zero length. The function b maps (x!) over x, where x is the list of (-frequency, word) tuples. The entire list is passed to ! so that each invocation can compute the scale factor for itself by calling ? against every entry and taking the minimum. In this way, I avoid floating-point math and rationals, whose conversion functions and imports would eat many characters.
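
A minimal sketch of that integer-only scaling, written out with descriptive names. barLength and renderBar are hypothetical names for what the golfed code calls ? and !, and the budget parameter stands in for the constant 77, which is presumably an 80-column limit minus the three fixed characters in each line (an assumption, since the answer does not spell this out).

```haskell
-- Number of underscores for a bar with negated frequency freq.
-- For each entry (g, w), the candidate length is the one this bar would
-- get if that entry's bar exactly filled its own line; taking the
-- minimum guarantees that every line fits.
barLength :: Int -> [(Int, String)] -> Int -> Int
barLength budget entries freq =
  minimum [ (freq * (budget - length w)) `div` g | (g, w) <- entries ]

-- One chart line: '|', the underscores, "| ", then the word.
renderBar :: Int -> [(Int, String)] -> (Int, String) -> String
renderBar budget entries (freq, w) =
  '|' : replicate (barLength budget entries freq) '_' ++ "| " ++ w
```

With the entries (-3,"cat") and (-1,"hat") and a budget of 77, the bar for cat gets 74 underscores and the bar for hat gets 24, with no floating-point conversion anywhere.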

Note the trick of using -frequency. This removes the need to reverse the sort, since sorting -frequency in ascending order places the words with the largest frequency first. Later, in the function ?, one -frequency value is divided by another, which cancels the negation out.
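
The two halves of this trick can be illustrated in isolation. The sketch below uses hypothetical names (byDescendingFrequency, signsCancel) that are not part of the golfed answer.

```haskell
import Data.List (sort)

-- Negate, sort ascending, negate back: the result is ordered by
-- descending frequency, with ties broken alphabetically by word.
byDescendingFrequency :: [(Int, String)] -> [(Int, String)]
byDescendingFrequency counts =
  [ (negate n, w) | (n, w) <- sort [ (negate f, v) | (f, v) <- counts ] ]

-- Dividing one negated frequency by another gives the same quotient
-- as dividing the positive values, so the signs never need undoing.
-- Assumes g is non-zero.
signsCancel :: Int -> Int -> Int -> Bool
signsCancel q g k = (negate q * k) `div` negate g == (q * k) `div` g
```

For instance, byDescendingFrequency applied to (1,"hat"), (3,"cat"), (1,"bat") yields cat first, then bat and hat.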