Add/remove custom stop words with spaCy

悲哀的现实 2020-12-07 13:04

What is the best way to add or remove stop words with spaCy? I am using the token.is_stop attribute and would like to make some custom changes to the set. I was looking at the docum

6 answers
  •  不思量自难忘°
    2020-12-07 13:23

    Using spaCy 2.0.11, you can update its stop word set in one of the following ways:

    To add a single stopword:

    import spacy
    nlp = spacy.load("en")  # "en" is a spaCy 2.x shortcut link
    nlp.Defaults.stop_words.add("my_new_stopword")
    

    To add several stopwords at once:

    import spacy
    nlp = spacy.load("en")
    # In-place set union adds every word in the right-hand set
    nlp.Defaults.stop_words |= {"my_new_stopword1", "my_new_stopword2"}
    

    To remove a single stopword:

    import spacy
    nlp = spacy.load("en")
    # set.remove() raises KeyError if the word is not in the set
    nlp.Defaults.stop_words.remove("whatever")
    

    To remove several stopwords at once:

    import spacy
    nlp = spacy.load("en")
    # In-place set difference; words not in the set are ignored
    nlp.Defaults.stop_words -= {"whatever", "whenever"}
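
    The four operations above are ordinary Python set operations, so they follow standard set semantics. The sketch below uses a plain stand-in set instead of nlp.Defaults.stop_words (the name stop_words and its contents are illustrative, not spaCy's actual list), so it runs without spaCy. It shows in particular that remove() raises KeyError for a missing word, while discard() and -= do not:

```python
# Stand-in for nlp.Defaults.stop_words (illustrative contents only).
stop_words = {"whatever", "whenever", "the"}

# Add one word, then several at once (in-place union).
stop_words.add("my_new_stopword")
stop_words |= {"my_new_stopword1", "my_new_stopword2"}

# Remove several at once (in-place difference); absent words are ignored.
stop_words -= {"whatever", "not_in_the_set"}

# remove() raises KeyError when the word is absent...
try:
    stop_words.remove("whatever")  # already removed above
    removed_twice = True
except KeyError:
    removed_twice = False

# ...whereas discard() is a silent no-op in that case.
stop_words.discard("whatever")

print(sorted(stop_words))
```

    If you are not sure whether a word is currently a stop word, discard() or -= is the safer choice for removal.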
    

    Note: To see the current set of stopwords, use:

    print(nlp.Defaults.stop_words)
    

    Update: It was noted in the comments that this change only affects the current execution. To update the model, you can use the methods nlp.to_disk("/path") and nlp.from_disk("/path") (further described at https://spacy.io/usage/saving-loading).
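
    A lighter-weight alternative, which is not part of the answer above, is to keep your customizations in a small file and re-apply them each time the pipeline is loaded. The sketch below is only an illustration under stated assumptions: the JSON layout, the helper names save_customizations and apply_customizations, and the stand-in set are all hypothetical. In real use you would presumably pass nlp.Defaults.stop_words as the target set.

```python
import json
import os
import tempfile


def save_customizations(path, to_add, to_remove):
    """Write custom additions/removals to a JSON file (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"add": sorted(to_add), "remove": sorted(to_remove)}, f)


def apply_customizations(path, stop_words):
    """Re-apply saved customizations in place to a set of stop words."""
    with open(path, encoding="utf-8") as f:
        custom = json.load(f)
    stop_words |= set(custom["add"])      # in-place union mutates the caller's set
    stop_words -= set(custom["remove"])   # in-place difference, absent words ignored
    return stop_words


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stop_words.json")
    save_customizations(path, {"my_new_stopword"}, {"whatever"})

    # Stand-in for nlp.Defaults.stop_words in a fresh session.
    fresh = {"whatever", "whenever"}
    apply_customizations(path, fresh)

print(sorted(fresh))
```

    Because the helpers mutate the set in place, the same call works on any set object, including one owned by a loaded pipeline.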
