Keep TF-IDF result for predicting new content using scikit-learn in Python

有刺的猬 2020-12-07 21:22

I am using sklearn in Python to do some clustering. I've trained it on 200,000 samples, and the code below works well.

corpus = open("token_from_xml.txt")
vectorizer =
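A minimal sketch of the train-then-predict flow the title asks about, assuming the clustering step is scikit-learn's `KMeans` (the estimator, `n_clusters=2`, and the toy documents are illustrative assumptions, not the asker's setup). The point it shows: fit the vectorizer, the TF-IDF transformer, and the clusterer once on the training data, then send new content through `transform` and `predict` only, so it lands in the same vocabulary and IDF space.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy training documents (illustrative stand-in for the real corpus)
train_docs = [
    "apple banana fruit",
    "banana fruit smoothie",
    "python code compiler",
    "compiler code bug",
]

# Fit vocabulary, IDF weights and clusters once, on training data only
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
train_tfidf = transformer.fit_transform(vectorizer.fit_transform(train_docs))

# Assumption: KMeans is the clustering step; parameters are illustrative
km = KMeans(n_clusters=2, n_init=10, random_state=0)
train_labels = km.fit_predict(train_tfidf)

# New content: transform only (no refitting), then predict cluster labels
new_docs = ["fruit salad with banana", "a bug in the compiler"]
new_tfidf = transformer.transform(vectorizer.transform(new_docs))
new_labels = km.predict(new_tfidf)
print(new_tfidf.shape, new_labels)
```

Words in the new documents that were never seen during training (here "salad", "with", "in", "the") are simply ignored by `vectorizer.transform`, so the new matrix keeps the training column layout.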

  •  南笙 (OP)
    2020-12-07 21:54

    A simpler solution: just use the joblib library, as the documentation suggests:

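    import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    
    # texts: the training documents, here one document per line of the corpus file
    with open("token_from_xml.txt") as f:
        texts = f.read().splitlines()
    
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)
    feature_name = vectorizer.get_feature_names_out()  # get_feature_names() on scikit-learn < 1.0
    tfidf = TfidfTransformer()
    tfidf.fit(X)
    
    # save both fitted objects to disk: the vectorizer holds the vocabulary,
    # the transformer holds the learned IDF weights
    joblib.dump(vectorizer, 'vectorizer.pkl')
    joblib.dump(tfidf, 'tfidf.pkl')
    
    # load them back later (e.g. in another process)
    vectorizer = joblib.load('vectorizer.pkl')
    tfidf = joblib.load('tfidf.pkl')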
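A self-contained sketch of the full round trip: fit, save with `joblib`, reload, and vectorize new content with `transform` only. The toy documents, the temporary directory, and the file names are illustrative assumptions; the final check only confirms that the reloaded objects reproduce what the in-memory objects compute.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy training corpus (illustrative only)
texts = ["the cat sat", "the dog sat", "a cat and a dog"]

vectorizer = CountVectorizer()
tfidf = TfidfTransformer()
tfidf.fit(vectorizer.fit_transform(texts))

with tempfile.TemporaryDirectory() as tmp:
    vec_path = os.path.join(tmp, "vectorizer.pkl")
    tfidf_path = os.path.join(tmp, "tfidf.pkl")
    # Persist both fitted objects: vocabulary + IDF weights
    joblib.dump(vectorizer, vec_path)
    joblib.dump(tfidf, tfidf_path)

    # Later / elsewhere: reload both objects before the directory is deleted
    loaded_vectorizer = joblib.load(vec_path)
    loaded_tfidf = joblib.load(tfidf_path)

# Score new content with the reloaded objects, without refitting
new_docs = ["the cat and the bird"]
X_new = loaded_tfidf.transform(loaded_vectorizer.transform(new_docs))
X_ref = tfidf.transform(vectorizer.transform(new_docs))

# The reloaded objects give the same vectors as the originals
print(np.allclose(X_new.toarray(), X_ref.toarray()))  # True
```

Calling `fit_transform` on the new documents instead would rebuild the vocabulary and IDF weights, so the resulting columns would no longer line up with the matrix the model was trained on.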
    
