Get rid of stopwords and punctuation

粉色の甜心 2020-12-29 00:12

I'm struggling with NLTK stopwords.

Here's my bit of code. Could someone tell me what's wrong?

from nltk.corpus import stopwords

def removeStopwo         


        
3 Answers
  •  余生分开走
    2020-12-29 01:02

    If you tokenize first, you compare a list of tokens (words and symbols) against the stopword list, so you don't need the re module. I added an extra argument so you can switch between languages.

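    import nltk
    from nltk.corpus import stopwords  # needs nltk.download('stopwords') plus the tokenizer models word_tokenize uses
    def remove_stopwords(sentence, language):  # language: an NLTK stopword list name, e.g. 'english', 'spanish'
        return [token for token in nltk.word_tokenize(sentence) if token.lower() not in stopwords.words(language)]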
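Two things about the function above may be worth tweaking. First, the `if` clause calls `stopwords.words(language)` again for every token and then scans the returned list. Second, despite the question's title, punctuation tokens such as `,` and `!` are kept. A minimal stdlib-only sketch of both fixes, assuming you already have a token list (for example from `nltk.word_tokenize`); the name `filter_tokens` and the demo stop list are hypothetical:

```python
import string
from typing import Iterable, List


def filter_tokens(tokens: Iterable[str], stop_words: Iterable[str]) -> List[str]:
    """Drop stopwords (case-insensitively) and tokens made only of punctuation."""
    # Build the lookup set once; set membership is O(1) on average,
    # instead of fetching and scanning a list for every token.
    stops = {word.lower() for word in stop_words}
    punct = set(string.punctuation)
    kept = []
    for token in tokens:
        if token.lower() in stops:
            continue  # stopword
        if all(ch in punct for ch in token):
            continue  # pure punctuation such as ',', '!' or '...'
        kept.append(token)
    return kept


if __name__ == "__main__":
    # Hypothetical stand-ins for nltk.word_tokenize(...) output and stopwords.words('english').
    tokens = ["This", "is", "a", "test", ",", "with", "punctuation", "!"]
    demo_stops = ["this", "is", "a", "with"]
    print(filter_tokens(tokens, demo_stops))  # prints ['test', 'punctuation']
```

With NLTK the call would look like `filter_tokens(nltk.word_tokenize(sentence), stopwords.words('english'))`. Tokens that mix letters and punctuation, such as `n't`, are kept because only all-punctuation tokens are dropped.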
    

    Let me know if it was useful ;)
