tf-idf feature weights using sklearn.feature_extraction.text.TfidfVectorizer

粉色の甜心 2020-12-07 15:37

This page: http://scikit-learn.org/stable/modules/feature_extraction.html mentions:

As tf–idf is very often used for text features, there is also an

2 answers
  •  青春惊慌失措
    2020-12-07 16:18

    Here is how to get the TF-IDF values of the terms in a document (shown for doc = 0; change doc to inspect the others):

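    # tf is the fitted TfidfVectorizer, X is the matrix returned by tf.fit_transform(...)
    feature_names = tf.get_feature_names_out()  # get_feature_names() before scikit-learn 1.0
    doc = 0
    # column indices of the terms that actually occur in this document
    feature_index = X[doc,:].nonzero()[1]
    tfidf_scores = zip(feature_index, [X[doc, x] for x in feature_index])
    for w, s in [(feature_names[i], s) for (i, s) in tfidf_scores]:
        print(w, s)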
    
    this 0.448320873199
    is 0.448320873199
    very 0.448320873199
    strange 0.630099344518
    
    #and for doc=1
    this 0.448320873199
    is 0.448320873199
    very 0.448320873199
    nice 0.630099344518
    

    I think the results are normalized per document (TfidfVectorizer uses norm='l2' by default), so the squared scores of each document sum to 1:

    >>> 0.448320873199**2 + 0.448320873199**2 + 0.448320873199**2 + 0.630099344518**2
    0.9999999999997548
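
To see where these particular numbers could come from, here is a rough hand calculation without scikit-learn. It assumes a two-document corpus in which "this", "is" and "very" occur in both documents and "strange" in only one (the answer does not show the corpus, so this is a guess), and it uses the smoothed idf that the scikit-learn documentation gives for the default smooth_idf=True, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization.

```python
import math

# Assumed corpus shape (not shown in the answer): 2 documents, and doc 0
# contains each of its four terms exactly once.
n_docs = 2
doc_freq = {"this": 2, "is": 2, "very": 2, "strange": 1}  # documents containing each term

# Smoothed idf, as documented for smooth_idf=True.
idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

# Every term frequency in doc 0 is 1, so the unnormalized tf-idf equals the idf.
raw = dict(idf)

# L2 normalization: divide by the Euclidean length of the document vector.
length = math.sqrt(sum(v * v for v in raw.values()))
scores = {term: v / length for term, v in raw.items()}

for term, score in scores.items():
    print(term, score)
print("sum of squares:", sum(s * s for s in scores.values()))
```

Under these assumptions the shared terms all get the same weight and the term unique to one document gets a higher one, which agrees with the pattern in the output above.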
