How to remove stop words using NLTK or Python
I have a dataset that I would like to remove stop words from using stopwords.words('english'). I'm struggling with how to use this within my code to simply take out these words. I already have a list of the words from this dataset; the part I'm struggling with is comparing this list against the stop words and removing them. Any help is appreciated.

Daren Thomas

    from nltk.corpus import stopwords
    # ...
    filtered_words = [word for word in word_list if word not in stopwords.words('english')]

You could also do a set diff, for example:

    list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk
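
For comparison, here is a minimal sketch of the list-comprehension approach with the stop words converted to a set once, rather than calling stopwords.words('english') again for every word. The helper name `remove_stop_words` and the tiny `SAMPLE_STOP_WORDS` list are made up for illustration so that the snippet runs without the NLTK corpus installed. The lowercasing step is also an assumption on my part: NLTK's English stop words are lowercase, so "The" would otherwise slip through. Unlike the set diff, this keeps the original word order and any repeated words.

```python
def remove_stop_words(words, stop_words):
    """Return the words not found in stop_words, keeping order and duplicates."""
    # Build the set once; membership tests on a set are fast.
    stop_set = {w.lower() for w in stop_words}
    # Compare case-insensitively (an assumption, not part of the original answer).
    return [w for w in words if w.lower() not in stop_set]


# Hypothetical stand-in list so the sketch is self-contained.
# With NLTK and its stopwords corpus available, you would pass
# stopwords.words('english') here instead.
SAMPLE_STOP_WORDS = ["the", "is", "a", "on", "and"]

word_list = ["The", "cat", "is", "on", "the", "mat", "and", "the", "cat", "purrs"]
filtered_words = remove_stop_words(word_list, SAMPLE_STOP_WORDS)
print(filtered_words)  # ['cat', 'mat', 'cat', 'purrs']
```

If order and duplicates do not matter for your dataset, the set diff is shorter; if you need the filtered text to still read like the original, the comprehension is probably the safer choice.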