SVC (support vector classification) with categorical (string) data as labels

Submitted by 烈酒焚心 on 2020-01-14 13:15:52

Question


I use scikit-learn to implement a simple supervised learning algorithm. In essence, I follow the tutorial here (but with my own data).

I try to fit the model:

from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, labels_training)

But at the second line, I get an error: ValueError: could not convert string to float: 'A'

The error is expected because labels_training contains string values that represent three different categories, such as A, B, C.

So the question is: how do I use SVC (support vector classification) if the labelled data represents categories in the form of strings? One intuitive solution seems to be to simply convert each string to a number, for instance A = 0, B = 1, etc. But is this really the best solution?
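
For illustration, the string-to-number mapping described above could be sketched with scikit-learn's LabelEncoder. This is a minimal sketch, not part of the original question; the contents of labels_training here are a hypothetical stand-in for the real data.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the real labels (assumption for illustration).
labels_training = ["A", "B", "C", "A"]

encoder = LabelEncoder()
# Each distinct string is mapped to an integer, in sorted order of the labels.
labels_encoded = encoder.fit_transform(labels_training)
# The mapping is reversible, so predictions can be turned back into strings.
labels_decoded = encoder.inverse_transform(labels_encoded)

print(labels_encoded)
print(encoder.classes_)
```

The encoder keeps the mapping in encoder.classes_, so integer predictions can be translated back with inverse_transform.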


Answer 1:


Take a look at section 4.3.4, "Encoding categorical features", at http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features.

In particular, look at using the OneHotEncoder. This will convert categorical values into a format that can be used by SVMs.
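
A minimal sketch of how OneHotEncoder might be combined with SVC, assuming the strings sit in the feature columns and a scikit-learn version whose OneHotEncoder accepts string input directly (0.20 or later). The colour feature and the sample data are hypothetical.

```python
from sklearn import svm
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one categorical string feature per sample.
features_training = [["red"], ["green"], ["blue"], ["red"], ["green"], ["blue"]]
labels_training = ["A", "B", "C", "A", "B", "C"]

# One binary column is created per distinct category; categories unseen
# during fit are encoded as all zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
features_encoded = encoder.fit_transform(features_training)

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_encoded, labels_training)

# New samples must go through the same fitted encoder before prediction.
prediction = clf.predict(encoder.transform([["red"]]))
print(prediction)
```

Note that the encoder is applied to the features only; the string labels are passed to fit unchanged.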




Answer 2:


You can try this code:

from sklearn import svm
X = [[0, 0], [1, 1], [2, 3]]
y = ['A', 'B', 'C']
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X, y)
clf.predict([[2, 3]])

Output: array(['C'], dtype='|S1')

You should pass the dependent variable (y) as a list.
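
As a rough check of this, the sketch below fits the same data once with y as a list and once with y as a NumPy array of strings (the array variant is an addition for comparison, not part of the original answer). In both cases the fitted classifier stores the string labels in its classes_ attribute and predicts strings.

```python
import numpy as np
from sklearn import svm

X = [[0, 0], [1, 1], [2, 3]]
y_list = ['A', 'B', 'C']
y_array = np.array(y_list)  # assumption: same labels as a NumPy string array

clf_list = svm.SVC(gamma=0.001, C=100.).fit(X, y_list)
clf_array = svm.SVC(gamma=0.001, C=100.).fit(X, y_array)

# SVC keeps the original string labels and maps them internally.
print(clf_list.classes_)
print(clf_array.classes_)

pred_list = clf_list.predict([[2, 3]])
pred_array = clf_array.predict([[2, 3]])
print(pred_list, pred_array)
```

If fit still raises "could not convert string to float", it may be worth checking whether the strings are in the feature matrix rather than in y.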



Source: https://stackoverflow.com/questions/38584829/svc-support-vector-classification-with-categorical-string-data-as-labels
