stop-words

How to remove stop words using nltk or python

北慕城南 posted on 2019-11-25 20:48:24
I have a dataset that I would like to remove stop words from using stopwords.words('english'). I'm struggling with how to use this in my code to simply take out these words. I already have a list of the words from this dataset; the part I'm struggling with is comparing against this list and removing the stop words. Any help is appreciated.

Daren Thomas

    from nltk.corpus import stopwords
    # ...
    filtered_words = [word for word in word_list if word not in stopwords.words('english')]

You could also do a set diff, for example:

    list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk
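One thing worth noting about the list comprehension in the answer: the condition is evaluated for every word, so stopwords.words('english') is called, and its list scanned, once per word. A possible variation, sketched below, builds a set once and filters against it. Unlike the set-diff approach, it keeps the original word order and any repeated words. The helper name remove_stop_words and the short SAMPLE_STOP_WORDS list are hypothetical stand-ins so the sketch runs without NLTK; the lowercase comparison is also an assumption, made because NLTK's English stop words are all lowercase.

```python
def remove_stop_words(words, stop_words):
    """Return the words that are not stop words, keeping order and duplicates."""
    # Build the set once so each membership test is a hash lookup
    # instead of a scan through the whole stop-word list.
    stop_set = {w.lower() for w in stop_words}
    # Assumption: compare case-insensitively, so "The" matches "the".
    return [w for w in words if w.lower() not in stop_set]


# Hypothetical stand-in for stopwords.words('english'). With NLTK installed
# and the stopwords corpus downloaded, pass stopwords.words('english') instead.
SAMPLE_STOP_WORDS = ["the", "is", "a", "of", "and", "to", "in", "on"]

word_list = ["The", "cat", "is", "on", "the", "mat", "and", "the", "cat", "sleeps"]
filtered_words = remove_stop_words(word_list, SAMPLE_STOP_WORDS)
print(filtered_words)  # ['cat', 'mat', 'cat', 'sleeps']
```

With the real corpus, the call would look like remove_stop_words(word_list, stopwords.words('english')), assuming word_list is the list of words already extracted from the dataset.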