add stemming support to CountVectorizer (sklearn)

北荒 2021-01-31 18:51

I'm trying to add stemming to my NLP pipeline with sklearn.

from nltk.corpus import stopwords
from nltk.stem.snowball import FrenchStemmer

stop = stopwords.words('french')
stemmer = French         


        
3 Answers
  •  灰色年华
    2021-01-31 19:41

    You can try:

    def build_analyzer(self):
        # Get the default analyzer, then stem each token it produces.
        analyzer = super().build_analyzer()
        return lambda doc: (stemmer.stem(w) for w in analyzer(doc))
    

    and remove the __init__ method.
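
To make the suggestion above concrete, here is a minimal sketch of how the override might sit inside a full subclass. The class name `StemmedCountVectorizer`, the two toy documents, and the tiny stop-word list are made up for illustration and are not from the question; the stop words are hard-coded only so the sketch runs without downloading the NLTK stopwords corpus.

```python
from nltk.stem.snowball import FrenchStemmer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = FrenchStemmer()


class StemmedCountVectorizer(CountVectorizer):  # hypothetical class name
    # No __init__ here: the constructor is inherited unchanged from CountVectorizer.
    def build_analyzer(self):
        # Reuse the stock analyzer (preprocessing, tokenizing, stop-word removal).
        analyzer = super().build_analyzer()
        # Wrap it so that every token it returns is stemmed.
        return lambda doc: [stemmer.stem(w) for w in analyzer(doc)]


# Toy corpus and stop-word list, assumed purely for illustration.
docs = ["les chats mangent", "le chat mange"]
vectorizer = StemmedCountVectorizer(stop_words=["le", "les"])
counts = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))
print(counts.toarray())
```

Stop words are filtered by the stock analyzer before stemming, so the stop list should contain unstemmed forms. Also note that if `ngram_range` goes beyond unigrams, the analyzer returns n-gram strings and the stemmer would be applied to those whole strings, which is probably not what you want.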
