How to get tf-idf with a pandas DataFrame?

天涯浪人 2020-12-05 04:22

I want to calculate tf-idf from the documents below. I'm using Python and pandas.

import pandas as pd
df = pd.DataFrame({'docId': [1,2,3], 
                       


        
3 answers
  •  庸人自扰
    2020-12-05 05:04

    The scikit-learn implementation makes this straightforward:

    from sklearn.feature_extraction.text import TfidfVectorizer
    v = TfidfVectorizer()
    # Learn the vocabulary and idf weights from the text column, then transform it
    x = v.fit_transform(df['sent'])
    

    There are plenty of parameters you can specify; see the TfidfVectorizer documentation.
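To make the effect of those parameters more concrete, here is a rough pure-Python sketch of what the vectorizer computes under what I understand to be its defaults (`smooth_idf=True`, `norm='l2'`, raw term counts as tf). The tokenized documents and the helper name `tfidf_row` are hypothetical, chosen only for illustration; they are not the question's data, and the real implementation also handles tokenization, lowercasing and sparse storage.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized documents (not the question's data).
docs = [
    ["red", "apple", "pie"],
    ["green", "apple", "pie"],
    ["blue", "apple", "pie"],
]

n_docs = len(docs)
vocab = sorted({term for doc in docs for term in doc})

# Document frequency: how many documents contain each term at least once.
doc_freq = Counter(term for doc in docs for term in set(doc))

# Smoothed idf (assumed default): ln((1 + n) / (1 + df)) + 1.
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}


def tfidf_row(doc):
    """Return the L2-normalized tf-idf vector for one non-empty tokenized document."""
    counts = Counter(doc)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]


matrix = [tfidf_row(doc) for doc in docs]
```

Terms that occur in every document get an idf of exactly 1, so after normalization they share one smaller weight, while the term unique to a document gets the largest weight in its row.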

    The output of fit_transform is a sparse matrix. To inspect it as a dense array, call x.toarray():

    In [44]: x.toarray()
    Out[44]: 
    array([[ 0.64612892,  0.38161415,  0.        ,  0.38161415,  0.38161415,
             0.        ,  0.38161415],
           [ 0.        ,  0.38161415,  0.64612892,  0.38161415,  0.38161415,
             0.        ,  0.38161415],
           [ 0.        ,  0.38161415,  0.        ,  0.38161415,  0.38161415,
             0.64612892,  0.38161415]])
    
