Python NLTK: How to tag sentences with the simplified set of part-of-speech tags?

醉酒成梦 2020-12-13 18:57

Chapter 5 of the Python NLTK book gives this example of tagging words in a sentence:

>>> text = nltk.word_tokenize("And now for something completel

3 Answers
  • 2020-12-13 19:37

    You can simply pass tagset='universal' as a keyword argument to the pos_tag function.

    In [39]: from nltk import word_tokenize, pos_tag
    ...: 
    ...: text = word_tokenize("Here is a simple way of doing this")
    ...: tags = pos_tag(text, tagset='universal')
    ...: print(tags)
    ...: 
    [('Here', 'ADV'), ('is', 'VERB'), ('a', 'DET'), ('simple', 'ADJ'), ('way', 'NOUN'), ('of', 'ADP'), ('doing', 'VERB'), ('this', 'DET')]
    
  • 2020-12-13 19:57

    Update, in case anyone runs into the same problem: NLTK has since switched to a "universal" tagset (source here). Once you've tagged your text, use map_tag to simplify the tags.

    import nltk
    from nltk.tag import pos_tag, map_tag
    
    text = nltk.word_tokenize("And now for something completely different")
    posTagged = pos_tag(text)
    simplifiedTags = [(word, map_tag('en-ptb', 'universal', tag)) for word, tag in posTagged]
    print(simplifiedTags)
    # [('And', u'CONJ'), ('now', u'ADV'), ('for', u'ADP'), ('something', u'NOUN'), ('completely', u'ADV'), ('different', u'ADJ')]
    
  • 2020-12-13 20:00

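    To simplify tags from the default tagger in NLTK 2.x, you can use nltk.tag.simplify.simplify_wsj_tag, like so (the nltk.tag.simplify module was removed in NLTK 3, where the universal tagset replaces it):

    >>> import nltk
    >>> from nltk.tag.simplify import simplify_wsj_tag
    >>> tokens = nltk.word_tokenize("And now for something completely different")
    >>> tagged_sent = nltk.pos_tag(tokens)
    >>> simplified = [(word, simplify_wsj_tag(tag)) for word, tag in tagged_sent]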
    
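For anyone wondering what the tag simplification in the answers above amounts to, here is a minimal sketch that needs no NLTK install. It assumes a small hand-written subset of the Penn Treebank to universal mapping; the names PTB_TO_UNIVERSAL and simplify_tags are made up for illustration and are not part of NLTK, and the real mapping table covers many more tags.

```python
# Illustrative subset of a Penn Treebank -> universal tag mapping.
# Hand-written for this sketch; NLTK ships its own, much larger table.
PTB_TO_UNIVERSAL = {
    "CC": "CONJ",
    "RB": "ADV",
    "IN": "ADP",
    "NN": "NOUN",
    "NNS": "NOUN",
    "JJ": "ADJ",
    "DT": "DET",
    "VBZ": "VERB",
    "VBG": "VERB",
}


def simplify_tags(tagged_sent, mapping=PTB_TO_UNIVERSAL, unknown="X"):
    """Return (word, coarse_tag) pairs; tags missing from the mapping become `unknown`."""
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged_sent]


# Fine-grained tags written out by hand here, in the style a
# Penn Treebank tagger would produce for this sentence.
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]
simplified = simplify_tags(tagged)
print(simplified)
```

The list comprehension has the same shape as the map_tag one-liner in the second answer: the tagger's output stays untouched and only the tag half of each pair is rewritten. Falling back to "X" for unmapped tags is a choice made for this sketch.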