I am using sklearn in Python to do some clustering. I've trained on 200,000 samples, and the code below works well.
corpus = open("token_from_xml.txt")
vectorizer =
If you want to store the computed features so you can reuse them later, for example when testing, you can pickle them like this:
import pickle

# vectorizer and transformer are the objects defined in the question
tfidf = transformer.fit_transform(vectorizer.fit_transform(corpus))

# store the content
with open("x_result.pkl", "wb") as handle:
    pickle.dump(tfidf, handle)

# load the content
with open("x_result.pkl", "rb") as handle:
    tfidf = pickle.load(handle)
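
The pickled matrix only holds the features for the documents you already processed. If the goal is to map new (test) documents into the same feature space later, it may help to pickle the fitted objects as well. Below is a minimal sketch, assuming vectorizer is a CountVectorizer and transformer is a TfidfTransformer (the question's definition is cut off, so that is a guess); the toy documents, the dictionary keys, and the file name tfidf_models.pkl are made up for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus standing in for the real training file
train_docs = ["red apple pie", "green apple tart", "blue sky today"]

# Assumed types; the original definitions are not shown in full
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(vectorizer.fit_transform(train_docs))

# Store the fitted objects (hypothetical file name, written to a temp dir)
path = os.path.join(tempfile.mkdtemp(), "tfidf_models.pkl")
with open(path, "wb") as handle:
    pickle.dump({"vectorizer": vectorizer, "transformer": transformer}, handle)

# Load them back later
with open(path, "rb") as handle:
    models = pickle.load(handle)

# Use transform (not fit_transform) so new documents reuse the
# vocabulary and IDF weights learned from the training data
new_docs = ["apple sky"]
new_counts = models["vectorizer"].transform(new_docs)
new_tfidf = models["transformer"].transform(new_counts)
```

Calling transform instead of fit_transform on the new documents keeps the column layout identical to the training matrix, so a model trained on tfidf can be applied to new_tfidf directly. Pickles of sklearn objects are generally only safe to load with the same library version that wrote them.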