sklearn TfidfVectorizer : Generate Custom NGrams by not removing stopword in them
问题 Following is my code: sklearn_tfidf = TfidfVectorizer(ngram_range= (3,3),stop_words=stopwordslist, norm='l2',min_df=0, use_idf=True, smooth_idf=False, sublinear_tf=True) sklearn_representation = sklearn_tfidf.fit_transform(documents) It generates tri gram by removing all the stopwords. What I want it to allow those TRIGRAM what have stopword in their middle ( not in start and end) Is there processor needs to be written for this. Need suggestions. 回答1: Yes, you need to supply your own analyzer