Python TextBlob and text classification

Submitted by ☆樱花仙子☆ on 2019-12-21 02:59:07

Question


I'm trying to build a text classification model with Python and TextBlob. The script runs on my server, and the idea is that in the future users will be able to submit their text and have it classified. I'm loading the training set from a CSV file:

# -*- coding: utf-8 -*-
from textblob.classifiers import NaiveBayesClassifier

# Train on the (text, label) rows read from the CSV file.
with open('file.csv', 'r', encoding='latin-1') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

# Write the predicted label to a file instead of the console.
with open('yyyyyyyyy.txt', 'w') as out:
    print(cl.classify("some text"), file=out)

The CSV is about 500 lines long (with strings between 10 and 100 characters), and NaiveBayesClassifier needs about 2 minutes to train before it can classify my text. I'm not sure whether it is normal for it to take so long; maybe my server, with only 512 MB of RAM, is slow.

Example of a CSV line:

"Oggi alla Camera con la Fondazione Italia-Usa abbiamo consegnato a 140 studenti laureati con 110 e 110 lode i diplomi del Master in Marketing Comunicazione e Made in Italy.",FI-PDL

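For reference, a row like the one above can be read as a (text, label) pair with Python's standard csv module. This is only a small sketch to show how the two columns split; it assumes the training file keeps the text in the first column and the label in the second, as in the sample row, and it does not use TextBlob itself.

```python
import csv
import io

# The sample row from the question: quoted text first, label second.
line = (
    '"Oggi alla Camera con la Fondazione Italia-Usa abbiamo consegnato a '
    '140 studenti laureati con 110 e 110 lode i diplomi del Master in '
    'Marketing Comunicazione e Made in Italy.",FI-PDL\n'
)

# csv.reader strips the surrounding quotes and splits on the comma outside them.
rows = list(csv.reader(io.StringIO(line)))
text, label = rows[0]

print(label)  # FI-PDL
```
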
What is not clear to me, and I can't find an answer in the TextBlob documentation, is whether there is a way to 'save' my trained classifier (and so save a lot of time), because right now the classifier is trained again every time I run the script. I'm new to text classification and machine learning, so my apologies if this is a dumb question.

Thanks in advance.


Answer 1:


OK, I found that the pickle module is what I need :)

Training:

# -*- coding: utf-8 -*-
import pickle
from textblob.classifiers import NaiveBayesClassifier

with open('file.csv', 'r', encoding='latin-1') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

# Serialize the trained classifier so it can be reloaded without retraining.
with open('classifier.pickle', 'wb') as f:
    pickle.dump(cl, f)

Loading:

import pickle

# textblob must be installed so pickle can rebuild the classifier,
# but it does not need to be imported explicitly here.
with open("classifier.pickle", "rb") as f:
    cl = pickle.load(f)

# Write the predicted label to demo.txt instead of the console.
with open('demo.txt', 'w') as out:
    print(cl.classify("text to classify"), file=out)
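
The two scripts can also be folded into one "load if cached, otherwise train" step. Below is a minimal sketch of that idea using only the standard library. The helper name load_or_train and the stand-in fake_train function are made up for illustration; in the real script the train callable would be whatever builds the NaiveBayesClassifier from file.csv.

```python
import os
import pickle
import tempfile


def load_or_train(path, train):
    """Return the object pickled at `path`, or call `train()` and cache its result."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model


# Stand-in trainer (hypothetical): it only records that it was called.
calls = []


def fake_train():
    calls.append(1)
    return {"labels": ["FI-PDL"]}


cache_path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
first = load_or_train(cache_path, fake_train)   # trains and writes the pickle
second = load_or_train(cache_path, fake_train)  # loads from disk, no retraining
print(len(calls))  # 1
```

Note that pickle.load can execute arbitrary code from the file it reads, so only load pickle files you created yourself. If the CSV changes, delete the pickle so the classifier is trained again.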


Source: https://stackoverflow.com/questions/33883976/python-textblob-and-text-classification
