Add/remove stop words with spacy

Anonymous (unverified), submitted 2019-12-03 02:49:01

Question:

What is the best way to add/remove stop words with spacy? I am using token.is_stop and would like to make some custom changes to the set. I looked at the documentation but could not find anything regarding stop words. Thanks!

Answer 1:

You can edit them before processing your text like this (see this post):

>>> import spacy
>>> nlp = spacy.load("en")
>>> nlp.vocab["the"].is_stop = False
>>> nlp.vocab["definitelynotastopword"].is_stop = True
>>> sentence = nlp("the word is definitelynotastopword")
>>> sentence[0].is_stop
False
>>> sentence[3].is_stop
True

Note: This seems to work for versions up to and including v1.8. For newer versions, see the other answers.



Answer 2:

For version 2.0 I used this:

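import spacy
from spacy.lang.en.stop_words import STOP_WORDS

nlp = spacy.load("en")  # needed so that nlp.vocab exists below

print(STOP_WORDS)  # <- set of spaCy's default stop words

STOP_WORDS.add("your_additional_stop_word_here")

# Flag every word in the set as a stop word in the vocabulary
for word in STOP_WORDS:
    lexeme = nlp.vocab[word]
    lexeme.is_stop = True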

This loads all stop words into a set.

You can add your own stop words to STOP_WORDS, or use your own list in the first place.



回答3:

For 2.0 use the following:

for word in nlp.Defaults.stop_words:
    lex = nlp.vocab[word]
    lex.is_stop = True
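
The answers for 2.0 only show how to add stop words, so here is a rough sketch that covers removal as well. It is not taken from the answers above: set_stop_word is a hypothetical helper name, and I am assuming spaCy 2.x or later, where nlp.Defaults.stop_words is a plain Python set. It uses spacy.blank("en") instead of spacy.load("en") only so that no model download is needed.

```python
import spacy

# Tokenizer-only English pipeline; no trained model is required.
nlp = spacy.blank("en")


def set_stop_word(nlp, word, is_stop):
    """Hypothetical helper: flag or unflag `word` as a stop word."""
    # Keep the default set in sync so lexemes created later agree with it.
    # Assumption: nlp.Defaults.stop_words is a mutable set (spaCy 2.x+).
    if is_stop:
        nlp.Defaults.stop_words.add(word)
    else:
        nlp.Defaults.stop_words.discard(word)
    # token.is_stop reads this flag from the lexeme in the vocabulary.
    nlp.vocab[word].is_stop = is_stop


set_stop_word(nlp, "the", False)                    # remove a default stop word
set_stop_word(nlp, "definitelynotastopword", True)  # add a custom one

doc = nlp("the word is definitelynotastopword")
flags = [(token.text, token.is_stop) for token in doc]
print(flags)
```

Keep in mind that the default set is shared at class level, so changing it affects every English pipeline created in the same process. Lexemes are also stored per exact string, so a capitalised form such as "The" that is already in the vocabulary may need to be updated separately.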


