Python NLTK: How to tag sentences with the simplified set of part-of-speech tags?

醉酒成梦 2020-12-13 18:57

Chapter 5 of the Python NLTK book gives this example of tagging words in a sentence:

>>> text = nltk.word_tokenize("And now for something completel


        
3 Answers
  •  情深已故
    2020-12-13 19:57

    Updated, in case anyone runs into the same problem: NLTK has since moved to a "universal" tagset (source here). Once you've tagged your text, use map_tag to convert each tag to its simplified, universal equivalent.

    import nltk
    from nltk.tag import pos_tag, map_tag
    
    # Tokenize the sentence, then tag it with the default tagger.
    text = nltk.word_tokenize("And now for something completely different")
    posTagged = pos_tag(text)
    # Map each Penn Treebank tag ('en-ptb') to its 'universal' equivalent.
    simplifiedTags = [(word, map_tag('en-ptb', 'universal', tag)) for word, tag in posTagged]
    print(simplifiedTags)
    # [('And', u'CONJ'), ('now', u'ADV'), ('for', u'ADP'), ('something', u'NOUN'), ('completely', u'ADV'), ('different', u'ADJ')]
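    To make the simplification step concrete, here is a rough sketch of what mapping tags amounts to: a table lookup from a fine-grained tag to a coarse one. It does not use NLTK. The table `PTB_TO_UNIVERSAL`, the helper `simplify_tags`, and the `"X"` fallback are hypothetical names and choices made for illustration, and the Penn Treebank input tags are assumed rather than taken from a real tagger run. NLTK's own mapping covers the full tagset.

```python
# Hypothetical, hand-written subset of a Penn Treebank -> universal mapping.
# It only covers the tags needed for the example sentence.
PTB_TO_UNIVERSAL = {
    "CC": "CONJ",  # coordinating conjunction
    "RB": "ADV",   # adverb
    "IN": "ADP",   # preposition (adposition)
    "NN": "NOUN",  # singular noun
    "JJ": "ADJ",   # adjective
}


def simplify_tags(tagged, mapping=PTB_TO_UNIVERSAL, fallback="X"):
    """Replace each fine-grained tag with its coarse tag.

    Tags missing from `mapping` become `fallback` (an illustrative choice,
    not necessarily what NLTK does for unknown tags).
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged]


# Assumed Penn Treebank tags for the example sentence.
ptb_tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

simplified = simplify_tags(ptb_tagged)
print(simplified)
```

    The list comprehension has the same shape as the one in the answer, with a dictionary lookup standing in for map_tag.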
    
