Is there a way to make collections.Counter (Python 2.7) aware that its input list is sorted?

礼貌的吻别 2020-12-16 21:25

The Problem

I've been playing around with different ways (in Python 2.7) to extract a list of (word, frequency) tuples from a corpus, or list of strings, and comp

3 Answers
  •  星月不相逢
    2020-12-16 21:59

    One source of inefficiency in the OP's code (which several answers fixed without commenting on) is the over-reliance on intermediate lists. There is no reason to create a temporary list of millions of words just to iterate over them, when a generator will do.

    So instead of

    cnt = Counter()
    for word in [token.lower().strip(drop) for token in corpus]:
        cnt[word] += 1
    

    it should be just

    cnt = Counter(token.lower().strip(drop) for token in corpus)
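
    To check that the two forms really count the same thing, here is a minimal sketch. The original corpus and `drop` are not shown in this answer, so the sample `corpus` and the definition of `drop` as punctuation plus whitespace are assumptions made purely for illustration.

```python
from collections import Counter
import string

# Hypothetical sample data: the real corpus and `drop` are not shown above.
corpus = ["The", "cat,", "the", "hat.", "Cat"]
drop = string.punctuation + string.whitespace

# Original approach: builds a full temporary list before counting starts.
cnt_list = Counter()
for word in [token.lower().strip(drop) for token in corpus]:
    cnt_list[word] += 1

# Generator approach: each token is normalised only as Counter consumes it,
# so no intermediate list of all words is ever held in memory.
cnt_gen = Counter(token.lower().strip(drop) for token in corpus)

print(cnt_list == cnt_gen)
```

    The counts are identical; the difference is that the generator version never materialises the full list of normalised words.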
    

    And if you really want to sort the word counts alphabetically (what on earth for?), replace this

    wordfreqs = sorted([(word, cnt[word]) for word in cnt])
    

    with this:

    wordfreqs = sorted(cnt.items())   # In Python 2: cnt.iteritems()
    

    This should remove much of the inefficiency around the use of Counter (or any dictionary class used in a similar way).
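
    As a quick sanity check on the sorting step, the sketch below compares the two ways of building the sorted list on a small, made-up Counter (the words are hypothetical). It also shows `Counter.most_common()`, which is not part of the original code but is the standard way to get the pairs ordered by frequency rather than alphabetically, in case that is what you actually need.

```python
from collections import Counter

# Hypothetical sample counts, for illustration only.
cnt = Counter(["pear", "apple", "pear", "fig", "pear", "apple"])

# Original form: builds a list of tuples by hand, with one lookup per word.
old_style = sorted([(word, cnt[word]) for word in cnt])

# Simpler form: the (word, count) pairs come straight from the Counter.
new_style = sorted(cnt.items())

# Alternative ordering: most frequent words first.
by_frequency = cnt.most_common()

print(new_style)
print(by_frequency)
```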
