ValueError: could not convert string to float in panda

混江龙づ霸主 posted on 2020-01-11 11:35:31

Question


My code is:

import pandas as pd
data = pd.read_table('train.tsv')

X=data.Phrase
Y=data.Sentiment
from sklearn import cross_validation
X_train,X_test,Y_train,Y_test=cross_validation.train_test_split(X,Y,test_size=0.2,random_state=0)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X,Y)

I get the error: ValueError: could not convert string to float:

What changes can I make so that my code works?


Answer 1:


You can't pass raw text data to scikit-learn's MultinomialNB, as stated in its documentation.
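
The error message suggests that the text column was handed to a step that expects numbers. As a minimal sketch of the same failure in plain Python (the phrase below is a made-up example, not a row from train.tsv), converting a sentence to a float fails in the same way:

```python
# Hypothetical phrase standing in for one entry of data.Phrase.
phrase = "A series of escapades"

try:
    # A sentence has no numeric interpretation, so this raises ValueError.
    float(phrase)
except ValueError as error:
    message = str(error)

print(message)
```

The printed message starts with "could not convert string to float", which matches the error in the question.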

None of the algorithms in scikit-learn work directly with text data. You need to preprocess it first, extracting numeric features from the text with techniques such as bag-of-words or tokenizing. Have a look at this link for a better understanding.
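
One possible way to do this preprocessing is sketched below, using scikit-learn's CountVectorizer to turn each phrase into word counts before fitting MultinomialNB. The small in-memory lists are hypothetical stand-ins for data.Phrase and data.Sentiment, and the choice of CountVectorizer is an assumption; other vectorizers may suit the data better. The sketch also imports train_test_split from sklearn.model_selection, since the sklearn.cross_validation module used in the question was removed in later scikit-learn releases.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for data.Phrase and data.Sentiment.
phrases = [
    "a wonderful and moving film",
    "great acting and a great story",
    "truly wonderful performances",
    "a great and moving story",
    "a dull and boring film",
    "terrible acting and a boring story",
    "truly terrible and dull",
    "a boring and terrible mess",
]
sentiments = [1, 1, 1, 1, 0, 0, 0, 0]

X_train, X_test, Y_train, Y_test = train_test_split(
    phrases, sentiments, test_size=0.25, random_state=0
)

# CountVectorizer converts each phrase into a sparse row of word counts,
# which is numeric input that MultinomialNB can work with.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, Y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, Y_test)
print(predictions, accuracy)
```

With the real data, passing data.Phrase and data.Sentiment to train_test_split in place of the two lists should work the same way. Note that the sketch fits on the training split only, so the held-out phrases stay available for evaluation.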

You might also want to look at NLTK for use cases like yours.




Answer 2:


ValueError when using Multinomial Naive Bayes classifier

You probably should preprocess your data as shown in the answer above.



Source: https://stackoverflow.com/questions/44176978/valueerror-could-not-convert-string-to-float-in-panda
