How to lemmatize a list of sentences

孤独总比滥情好 2020-12-17 07:22

How can I lemmatize a list of sentences in Python?

from nltk.stem.wordnet import WordNetLemmatizer
a = ['i like cars', 'cats are the best']
lmtzr = WordNetLemmatizer()
2 Answers
  • 2020-12-17 07:32

    You must lemmatize each word separately; your code tries to lemmatize whole sentences. Corrected code fragment:

    from nltk.stem.wordnet import WordNetLemmatizer
    from nltk import word_tokenize
    sents = ['i like cars', 'cats are the best']
    lmtzr = WordNetLemmatizer()
    lemmatized = [[lmtzr.lemmatize(word) for word in word_tokenize(s)]
                  for s in sents]
    print(lemmatized)
    #[['i', 'like', 'car'], ['cat', 'are', 'the', 'best']]
    

    You can also get better results if you first do POS tagging and then provide the POS information to the lemmatizer.
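    A minimal sketch of that POS-aware approach, assuming NLTK's default `pos_tag` (Penn Treebank tags); the `penn_to_wn` helper is my own naming, not an NLTK function:

    ```python
    import nltk
    from nltk import pos_tag, word_tokenize
    from nltk.stem.wordnet import WordNetLemmatizer

    # Fetch the models/corpora this sketch needs (no-ops if already present;
    # package ids vary slightly across NLTK versions, so both are listed).
    for pkg in ('punkt', 'punkt_tab', 'averaged_perceptron_tagger',
                'averaged_perceptron_tagger_eng', 'wordnet', 'omw-1.4'):
        nltk.download(pkg, quiet=True)

    def penn_to_wn(tag):
        """Map a Penn Treebank tag to a WordNet POS letter; default to noun."""
        return {'J': 'a', 'V': 'v', 'R': 'r'}.get(tag[:1], 'n')

    lmtzr = WordNetLemmatizer()
    sents = ['i like cars', 'cats are the best']
    lemmatized = [[lmtzr.lemmatize(word, penn_to_wn(tag))
                   for word, tag in pos_tag(word_tokenize(s))]
                  for s in sents]
    print(lemmatized)  # 'are' now lemmatizes to 'be' because it is tagged as a verb
    ```

    Without the POS argument, `lemmatize` assumes everything is a noun, which is why `'are'` was left unchanged in the output above.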

  • 2020-12-17 07:45

    TL;DR:

    pip3 install -U pywsd
    

    Then:

    >>> from pywsd.utils import lemmatize_sentence
    
    >>> text = 'i like cars'
    >>> lemmatize_sentence(text)
    ['i', 'like', 'car']
    >>> lemmatize_sentence(text, keepWordPOS=True)
    (['i', 'like', 'cars'], ['i', 'like', 'car'], ['n', 'v', 'n'])
    
    >>> text = 'The cat likes cars'
    >>> lemmatize_sentence(text, keepWordPOS=True)
    (['The', 'cat', 'likes', 'cars'], ['the', 'cat', 'like', 'car'], [None, 'n', 'v', 'n'])
    
    >>> text = 'The lazy brown fox jumps, and the cat likes cars.'
    >>> lemmatize_sentence(text)
    ['the', 'lazy', 'brown', 'fox', 'jump', ',', 'and', 'the', 'cat', 'like', 'car', '.']
    

    Otherwise, take a look at what the `lemmatize_sentence` function in pywsd does:

    • Tokenizes the string
    • Runs the POS tagger and maps the tags to the WordNet POS tagset
    • Attempts to stem the word
    • Finally calls the lemmatizer with the POS and/or the stem
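    Those steps can be sketched roughly as follows. This is an illustration of the idea, not pywsd's exact implementation (see the linked utils.py for that); the function names `penn_to_wn`, `lemmatize_word`, and `my_lemmatize_sentence` are my own:

    ```python
    import nltk
    from nltk import pos_tag, word_tokenize
    from nltk.corpus import wordnet as wn
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    # Fetch required models/corpora (no-ops if already present; ids vary
    # slightly across NLTK versions, so both variants are listed).
    for pkg in ('punkt', 'punkt_tab', 'averaged_perceptron_tagger',
                'averaged_perceptron_tagger_eng', 'wordnet', 'omw-1.4'):
        nltk.download(pkg, quiet=True)

    porter = PorterStemmer()
    wnl = WordNetLemmatizer()

    def penn_to_wn(tag):
        # Penn Treebank tag -> WordNet POS letter; noun is the default.
        return {'J': 'a', 'V': 'v', 'R': 'r'}.get(tag[:1], 'n')

    def lemmatize_word(word, pos):
        lemma = wnl.lemmatize(word, pos)
        # Fall back to the Porter stem if the lemma is unknown to WordNet
        # but the stem is known (an approximation of the stemming step).
        if not wn.synsets(lemma) and wn.synsets(porter.stem(word)):
            return porter.stem(word)
        return lemma

    def my_lemmatize_sentence(sentence):
        # Tokenize, POS-tag, map tags, then lemmatize each token.
        return [lemmatize_word(w.lower(), penn_to_wn(t))
                for w, t in pos_tag(word_tokenize(sentence))]

    print(my_lemmatize_sentence('The cat likes cars'))
    ```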

    See https://github.com/alvations/pywsd/blob/master/pywsd/utils.py#L129
