Counting unique words in Python

深忆病人 2020-12-19 10:15

My code so far is this:

from glob import glob
pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)
def countwords(fp):
    w         


        
3 answers
  • 2020-12-19 10:28
    print(len(set(w.lower() for w in open('filename.dat').read().split())))
    

    This reads the entire file into memory, splits it into words on whitespace, converts each word to lowercase, builds a set from the lowercase words (a set keeps only unique values), and prints the size of that set.
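
    The same steps can be wrapped in a small function that also closes the file when it is done. This is only a sketch for Python 3: the name count_unique_words, the UTF-8 encoding, and the temporary demo file are assumptions for illustration, not part of the question.

```python
import os
import tempfile


def count_unique_words(path):
    """Return the number of distinct whitespace-separated words, ignoring case."""
    # The with-block closes the file even if reading fails.
    # UTF-8 is an assumption; use whatever encoding the files really have.
    with open(path, encoding="utf-8") as fp:
        return len(set(word.lower() for word in fp.read().split()))


# Demo on a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Hello world\nhello WORLD again\n")

unique_total = count_unique_words(tmp.name)
os.remove(tmp.name)
print(unique_total)  # 3 (hello, world, again)
```

    Note that splitting on whitespace leaves punctuation attached, so "world" and "world," would count as two different words.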

  • 2020-12-19 10:28

    If you want the count of each unique word, use a dict:

    words = ['Hello', 'world', 'world']
    count = {}
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1
    

    This gives you the dict:

    {'Hello': 1, 'world': 2}
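
    As a variation on the loop above, dict.get can replace the if/else, and the number of unique words is then just the length of the dict. This is a sketch for Python 3; the function name count_words and the choice to lowercase the text first are assumptions added for illustration.

```python
def count_words(text):
    """Map each lowercase word in text to the number of times it appears."""
    counts = {}
    for word in text.lower().split():
        # dict.get returns 0 for a word not seen yet, so no if/else is needed.
        counts[word] = counts.get(word, 0) + 1
    return counts


word_counts = count_words("Hello world World")
print(word_counts)       # {'hello': 1, 'world': 2}
print(len(word_counts))  # 2 unique words
```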
    
  • 2020-12-19 10:37

    The best way to count objects in Python is the collections.Counter class, which was created for that purpose. It behaves like a Python dict (it is a dict subclass) but is easier to use for counting: pass it a list of objects and it counts them for you.

    >>> from collections import Counter
    >>> c = Counter(['hello', 'hello', 1])
    >>> print(c)
    Counter({'hello': 2, 1: 1})
    

    Counter also has useful methods such as most_common; see the documentation to learn more.
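
    A short illustration of most_common, written for Python 3. The sample sentence is made up for the example; for the original question, the number of unique words is simply the length of the Counter.

```python
from collections import Counter

# Hypothetical sample text, split into words on whitespace.
words = "the cat and the dog and the bird".split()
counts = Counter(words)

# most_common(n) returns the n highest counts as (item, count) pairs.
top_two = counts.most_common(2)
print(top_two)      # [('the', 3), ('and', 2)]
print(len(counts))  # 5 distinct words
```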

    Another useful Counter method is update. After you've created a Counter from a list of objects, you can pass more objects to update and it will keep counting, adding to the existing counts instead of discarding them:

    >>> from collections import Counter
    >>> c = Counter(['hello', 'hello', 1])
    >>> print(c)
    Counter({'hello': 2, 1: 1})
    >>> c.update(['hello'])
    >>> print(c)
    Counter({'hello': 3, 1: 1})
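
    Because update keeps the running totals, it fits the situation in the question, where the words are spread over several files matched by a glob pattern. The following is a sketch for Python 3, not the asker's intended code: the function name count_words_in_files, the UTF-8 encoding, and the two temporary demo files are assumptions.

```python
import os
import tempfile
from collections import Counter
from glob import glob


def count_words_in_files(pattern):
    """Accumulate lowercase word counts over every file matching pattern."""
    totals = Counter()
    for path in glob(pattern):
        # UTF-8 is an assumption; adjust to the real encoding of the files.
        with open(path, encoding="utf-8") as fp:
            # update() adds this file's counts to the running totals.
            totals.update(fp.read().lower().split())
    return totals


# Demo with two throwaway files standing in for the real *.txt files.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.txt", "To be or not to be"), ("b.txt", "To sleep")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as fp:
            fp.write(text)
    totals = count_words_in_files(os.path.join(folder, "*.txt"))

print(totals["to"])  # 3
print(len(totals))   # 5 unique words across both files
```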
    