stop-words

How to remove stop words using NLTK or Python


So I have a dataset that I would like to remove stop words from using stopwords.words('english'). I'm struggling with how to use this within my code to simply take out these words. I already have a list of the words from this dataset; the part I'm struggling with is comparing against this list and removing the stop words. Any help is appreciated.

Daren Thomas:

    from nltk.corpus import stopwords
    # ...
    filtered_words = [word for word in word_list if word not in stopwords.words('english')]

You could also do a set diff, for example:

    list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk...
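The list comprehension in the answer calls stopwords.words('english') once per word and checks membership in a list, which gets slow on large datasets. Below is a minimal sketch of the same filtering idea with the stop words loaded once into a set. The function names load_english_stop_words and remove_stop_words are hypothetical, the lowercase comparison is an assumption (NLTK's English list is lowercase), and the small fallback set is illustrative only, used when NLTK or its stopwords corpus is not installed.

```python
from typing import Iterable, List, Set


def load_english_stop_words() -> Set[str]:
    """Return NLTK's English stop words, or a tiny sample set if unavailable."""
    try:
        from nltk.corpus import stopwords
        # Raises LookupError if the corpus has not been downloaded yet
        # (it can be fetched with nltk.download('stopwords')).
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # Hypothetical fallback for illustration only, NOT the real NLTK list.
        return {"a", "an", "and", "is", "of", "the", "to"}


def remove_stop_words(words: Iterable[str], stop_words: Iterable[str]) -> List[str]:
    """Return the words that are not stop words, keeping order and duplicates."""
    # Build the lookup set once so each membership test is O(1).
    stop_set = {w.lower() for w in stop_words}
    # Compare in lowercase so "The" is treated like "the".
    return [w for w in words if w.lower() not in stop_set]


if __name__ == "__main__":
    word_list = ["The", "quick", "fox", "is", "a", "legend"]
    print(remove_stop_words(word_list, load_english_stop_words()))
```

Unlike the set-difference approach, this keeps the original word order and any repeated words, which usually matters if the filtered list feeds into counting or further text processing.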
