Keep TFIDF result for predicting new content using Scikit for Python

有刺的猬 2020-12-07 21:22

I am using sklearn in Python to do some clustering. I've trained on 200,000 samples, and the code below works well.

corpus = open("token_from_xml.txt")
vectorizer =         


        
5 answers
  •  执笔经年
    2020-12-07 21:49

    You can do the vectorization and the TF-IDF transformation in one stage:

    from sklearn.feature_extraction.text import TfidfVectorizer
    vec = TfidfVectorizer()
    

    Then fit and transform on the training data:

    tfidf = vec.fit_transform(training_data)
    

    Then use the same fitted TF-IDF model to transform the unseen data, so that it is mapped onto the vocabulary learned during training:

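    from sklearn.cluster import KMeans
    # transform() only; calling fit again here would rebuild the vocabulary
    unseen_tfidf = vec.transform(unseen_data)
    km = KMeans(n_clusters=30)
    kmresult = km.fit(tfidf).predict(unseen_tfidf)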
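
    The steps above can be put together into a small end-to-end sketch. The toy documents, the choice of 2 clusters, and the use of pickle to keep the fitted objects around are assumptions made for illustration only; they are not from the original question. The idea is that the fitted vectorizer and the fitted KMeans model are saved together, so new content can be transformed and assigned to a cluster later without refitting.

```python
import pickle

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the real corpus.
training_data = [
    "the cat sat on the mat",
    "cats and kittens purr",
    "stock markets fell sharply today",
    "investors sold stock as markets dropped",
]
unseen_data = ["a kitten sat on the mat", "markets and investors"]

# Fit once on the training data: this learns the vocabulary and IDF weights.
vec = TfidfVectorizer()
tfidf = vec.fit_transform(training_data)

# Cluster the training vectors (2 clusters only because the toy data is tiny).
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(tfidf)

# Keep the fitted objects so a later run can reuse them.
blob = pickle.dumps((vec, km))

# Later: restore both objects and apply them to new documents.
vec_loaded, km_loaded = pickle.loads(blob)
unseen_tfidf = vec_loaded.transform(unseen_data)  # same columns as tfidf
labels = km_loaded.predict(unseen_tfidf)          # one cluster id per document
```

    To persist to disk instead of keeping the bytes in memory, pickle.dump and pickle.load with a file opened in binary mode should work the same way. Unpickling is generally only safe with the same scikit-learn version that produced the file.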
    
