The scikit-learn documentation page http://scikit-learn.org/stable/modules/feature_extraction.html says:
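"As tf–idf is very often used for text features, there is also another class called TfidfVectorizer that combines all the options of CountVectorizer and TfidfTransformer in a single model."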
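That equivalence is easy to check directly. The sketch below assumes a small toy corpus (the two sentences are only illustrative) and default parameters for all three classes. It builds the matrix in two steps (counts, then tf-idf weighting) and in one step, then compares them:

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Toy corpus, chosen only for illustration
corpus = ["this is very strange", "this is very nice"]

# Two-step pipeline: raw term counts, then tf-idf weighting
counts = CountVectorizer().fit_transform(corpus)
two_step = TfidfTransformer().fit_transform(counts)

# One-step equivalent
one_step = TfidfVectorizer().fit_transform(corpus)

# Compare dense copies of the two sparse matrices
print(np.allclose(two_step.toarray(), one_step.toarray()))
```

With matching parameters the two approaches should give the same matrix; the one-step class is simply more convenient.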
Here is how to get the tf-idf value of every term in a given document:
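```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["this is very strange", "this is very nice"]
tf = TfidfVectorizer()
X = tf.fit_transform(corpus)  # sparse matrix, one row per document

feature_names = tf.get_feature_names_out()  # get_feature_names() in scikit-learn < 1.0
doc = 0
feature_index = X[doc, :].nonzero()[1]  # columns of the terms present in this document
tfidf_scores = zip(feature_index, [X[doc, x] for x in feature_index])
for w, s in [(feature_names[i], s) for (i, s) in tfidf_scores]:
    print(w, s)
```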
```
this 0.448320873199
is 0.448320873199
very 0.448320873199
strange 0.630099344518
# and for doc = 1
this 0.448320873199
is 0.448320873199
very 0.448320873199
nice 0.630099344518
```
The results are normalized per document: TfidfVectorizer uses `norm='l2'` by default, so the squared scores of each document sum to 1:
```python
>>> 0.448320873199**2 + 0.448320873199**2 + 0.448320873199**2 + 0.630099344518**2
0.9999999999997548
```