tf-idf feature weights using sklearn.feature_extraction.text.TfidfVectorizer

粉色の甜心 2020-12-07 15:37

This page: http://scikit-learn.org/stable/modules/feature_extraction.html mentions:

As tf–idf is very often used for text features, there is also an

2 answers

青春惊慌失措 (OP)

2020-12-07 16:18

See also the following on how to get the TF-IDF values of the terms in each document:

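# Assumes tf is a fitted TfidfVectorizer and X = tf.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; use it only on versions before 1.0
feature_names = tf.get_feature_names_out()
doc = 0
# Column indices of the non-zero entries in row `doc`
feature_index = X[doc, :].nonzero()[1]
tfidf_scores = zip(feature_index, [X[doc, x] for x in feature_index])
for w, s in [(feature_names[i], s) for (i, s) in tfidf_scores]:
    print(w, s)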

this 0.448320873199
is 0.448320873199
very 0.448320873199
strange 0.630099344518

# and for doc = 1
this 0.448320873199
is 0.448320873199
very 0.448320873199
nice 0.630099344518

I think the results are L2-normalized per document, since the squared scores of a document sum to 1:

>>> 0.448320873199**2 + 0.448320873199**2 + 0.448320873199**2 + 0.630099344518**2
0.9999999999997548
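To see where these numbers could come from, here is a rough pure-Python sketch that recomputes the weights by hand. It assumes scikit-learn's documented defaults (raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, then L2 normalization per row). The two-sentence corpus is an assumption inferred from the terms printed above, and the whitespace tokenizer and the helper name `tfidf_row` are simplifications made up for this illustration, not the library's own code.

```python
import math
from collections import Counter

# Assumed corpus, inferred from the terms in the printed output above.
corpus = ["This is very strange", "This is very nice"]

# Simplified tokenizer (an assumption): lowercase and split on whitespace.
docs = [Counter(text.lower().split()) for text in corpus]
n_docs = len(docs)

# Document frequency: number of documents in which each term occurs.
df = Counter(term for doc in docs for term in doc)


def tfidf_row(doc):
    """Return {term: weight} for one document, L2-normalized (hypothetical helper)."""
    # Raw count times smoothed idf: ln((1 + n) / (1 + df)) + 1
    raw = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in doc.items()
    }
    # Divide by the Euclidean length so the squared weights sum to 1.
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}


rows = [tfidf_row(doc) for doc in docs]
for row in rows:
    for term, weight in row.items():
        print(term, weight)
    print("sum of squares:", sum(w * w for w in row.values()))
```

Under these assumptions the shared terms come out at about 0.448 and the term unique to each document at about 0.630, and each row's sum of squares is 1 up to floating-point error, which is consistent with the results being normalized per document.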
