Keep TFIDF result for predicting new content using Scikit for Python

有刺的猬 2020-12-07 21:22

I am using sklearn in Python to do some clustering. I've trained on 200,000 data samples, and the code below works well.

corpus = open("token_from_xml.txt")
vectorizer =         


        
5 answers
  •  盖世英雄少女心
    2020-12-07 21:54

    If you want to store the feature list computed for your data so you can reuse it later, you can do this:

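    import pickle

    # vectorizer and transformer are assumed to be defined earlier, as in the question
    tfidf = transformer.fit_transform(vectorizer.fit_transform(corpus))

    # store the content
    with open("x_result.pkl", "wb") as handle:
        pickle.dump(tfidf, handle)

    # load the content
    with open("x_result.pkl", "rb") as handle:
        tfidf = pickle.load(handle)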
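
Pickling only the TF-IDF matrix keeps the training result, but transforming new content into the same feature space also needs the fitted objects themselves. A minimal sketch of that idea follows, assuming the question's `vectorizer` is a `CountVectorizer` and `transformer` is a `TfidfTransformer` (the original code is truncated, so this is a guess). The toy corpus, the file name `tfidf_model.pkl`, and the dictionary keys are hypothetical and chosen only for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus standing in for the real training data.
train_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Assumed setup: counts first, then TF-IDF weighting.
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
train_tfidf = transformer.fit_transform(vectorizer.fit_transform(train_docs))

# Store the fitted objects (vocabulary and IDF weights), not just the matrix.
model_path = os.path.join(tempfile.mkdtemp(), "tfidf_model.pkl")
with open(model_path, "wb") as handle:
    pickle.dump({"vectorizer": vectorizer, "transformer": transformer}, handle)

# Later, possibly in another process: load them back.
with open(model_path, "rb") as handle:
    saved = pickle.load(handle)

# Use transform (not fit_transform) so new content is mapped onto the
# vocabulary and IDF weights learned from the training data.
new_docs = ["a cat and a dog"]
new_counts = saved["vectorizer"].transform(new_docs)
new_tfidf = saved["transformer"].transform(new_counts)

print(new_tfidf.shape)
```

The resulting row has the same number of columns as the training matrix, so it can be passed to a clustering model fitted on that matrix. Pickles written by one scikit-learn version are not guaranteed to load in another, so it is safer to load them with the same version that saved them.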
    
