Count verbs, nouns, and other parts of speech with Python's NLTK


I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many tokens fall into each part-of-speech category.

1 Answer

    The nltk.pos_tag function returns a list of (token, tag) pairs:

    tagged = [('the', 'DT'), ('dog', 'NN'), ('sees', 'VB'), ('the', 'DT'), ('cat', 'NN')] 
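
    For reference, here is how such a list can be produced from raw text with NLTK (this assumes the punkt and averaged_perceptron_tagger data packages have been fetched via nltk.download beforehand). Note that the real tagger uses finer-grained Penn Treebank tags, so for instance 'sees' comes back as VBZ rather than VB:

    >>> import nltk
    >>> tokens = nltk.word_tokenize("the dog sees the cat")
    >>> tagged = nltk.pos_tag(tokens)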
    

    If you are using Python 2.7 or later, you can count the tags simply with collections.Counter:

    >>> from collections import Counter
    >>> counts = Counter(tag for word,tag in tagged)
    >>> counts
    Counter({'DT': 2, 'NN': 2, 'VB': 1})
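
    Since the question asks about broad classes like nouns and verbs, while the Penn Treebank tagset distinguishes finer categories (NN, NNS, VBZ, VBD, ...), one possible refinement, not part of the original answer, is to collapse each tag to its first two characters before counting:

    >>> coarse = Counter(tag[:2] for word, tag in tagged)
    >>> coarse
    Counter({'DT': 2, 'NN': 2, 'VB': 1})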
    

    To normalize the counts (giving the proportion of each tag), do:

    >>> total = sum(counts.values())
    >>> dict((tag, float(count)/total) for tag, count in counts.items())
    {'DT': 0.4, 'VB': 0.2, 'NN': 0.4}
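
    In Python 3 the same normalization reads more naturally as a dict comprehension (plain / already performs true division there, so float() is unnecessary; the output ordering below follows insertion order, as dicts do in Python 3.7+):

    >>> {tag: count / total for tag, count in counts.items()}
    {'DT': 0.4, 'NN': 0.4, 'VB': 0.2}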
    

    Note that in versions of Python older than 2.7, you'll have to emulate Counter yourself, e.g. with a defaultdict:

    >>> from collections import defaultdict
    >>> counts = defaultdict(int)
    >>> for word, tag in tagged:
    ...  counts[tag] += 1
    
    >>> counts
    defaultdict(<type 'int'>, {'DT': 2, 'VB': 1, 'NN': 2})
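
    Putting the pieces together for the original goal of profiling multiple texts, a minimal sketch could look like the following (the pos_profile function and the documents dict are illustrative names, not part of the original answer, and the NLTK data packages mentioned above must be installed):

    import nltk
    from collections import Counter

    def pos_profile(text):
        # Tokenize, tag, and return the proportion of each POS tag in text.
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        counts = Counter(tag for word, tag in tagged)
        total = sum(counts.values())
        if total == 0:  # guard against empty input
            return {}
        return {tag: count / total for tag, count in counts.items()}

    # One profile per document, keyed by document name.
    documents = {"a": "the dog sees the cat", "b": "dogs bark loudly"}
    profiles = {name: pos_profile(text) for name, text in documents.items()}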
    