Is there a way to make collections.Counter (Python 2.7) aware that its input list is sorted?


礼貌的吻别 2020-12-16 21:25

The Problem

I've been playing around with different ways (in Python 2.7) to extract a list of (word, frequency) tuples from a corpus, or list of strings, and comp

3 answers

星月不相逢 (thread starter)

2020-12-16 21:59
One source of inefficiency in the OP's code (which several answers fixed without commenting on) is the over-reliance on intermediate lists. There is no reason to create a temporary list of millions of words just to iterate over them, when a generator will do.

So instead of
```
cnt = Counter()
for word in [token.lower().strip(drop) for token in corpus]:
    cnt[word] += 1
```
it should be just
```
cnt = Counter(token.lower().strip(drop) for token in corpus)
```
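To check that the two forms give the same counts, here's a minimal self-contained sketch. The `corpus` and `drop` values are made up for illustration, since the question's actual data isn't shown here; `drop` is assumed to be a string of characters to strip from each token.

```python
from collections import Counter
import string

# Hypothetical sample data; the real corpus and `drop` are not shown in the thread.
corpus = ["The", "cat,", "the", "DOG.", "cat", "The!"]
drop = string.punctuation  # assumed: characters to strip from both ends of a token

# List version: builds the full list of normalized tokens before counting.
cnt_list = Counter()
for word in [token.lower().strip(drop) for token in corpus]:
    cnt_list[word] += 1

# Generator version: Counter pulls normalized tokens one at a time,
# so no intermediate list is ever created.
cnt_gen = Counter(token.lower().strip(drop) for token in corpus)

print(cnt_list == cnt_gen)
```

The only difference is memory: with millions of tokens the list version holds them all at once, while the generator version holds one.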
And if you really want to sort the word counts alphabetically (what on earth for?), replace this
```
wordfreqs = sorted([(word, cnt[word]) for word in cnt])
```
with this:
```
wordfreqs = sorted(cnt.items())   # In Python 2, cnt.iteritems() avoids building a temporary list
```
This should remove much of the inefficiency around the use of Counter (or any dictionary class used in a similar way).
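For comparison, a small sketch of the alphabetical sort next to a frequency-ordered listing, which is usually what people want from word counts. The sample words are hypothetical; `Counter.most_common()` is the standard-library method for the frequency ordering.

```python
from collections import Counter

# Hypothetical sample; any iterable of hashable tokens works the same way.
cnt = Counter(["pear", "apple", "pear", "fig", "apple", "pear"])

# Alphabetical: tuples sort by word first, so no key function is needed.
alphabetical = sorted(cnt.items())

# By frequency, highest first: Counter provides this directly.
by_frequency = cnt.most_common()

print(alphabetical)
print(by_frequency)
```

Both calls work directly off the Counter, so neither needs a hand-built list of `(word, cnt[word])` pairs.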