RandomForestClassifier.fit(): ValueError: could not convert string to float

礼貌的吻别 2020-12-23 09:16

Consider this simple CSV file:

A,B,C
Hello,Hi,0
Hola,Bueno,1

Obviously the real dataset is far more complex than this, but this one reproduces

8 answers

粉色の甜心 (original poster)

2020-12-23 09:47
You cannot pass strings to the fit() method of this kind of classifier.

For example, suppose you have a feature column named 'grade' with three distinct values: A, B, and C.

You have to convert the strings "A", "B", and "C" into a matrix with an encoder, like this:
```
A = [1,0,0]

B = [0,1,0]

C = [0,0,1]
```
This is needed because the strings themselves carry no numerical meaning for the classifier.
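To make the mapping above concrete, here is a minimal pure-Python sketch of one-hot encoding. The helper name `one_hot_encode` and the sample values are made up for illustration; in practice you would normally let a library encoder do this.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    # Map each category to the column that should hold the 1.
    column_of = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical 'grade' column with the three grades from the example.
categories, encoded = one_hot_encode(["A", "B", "C", "B"])
print(categories)
print(encoded)
```

Each row has exactly one 1, so no grade is treated as "larger" than another.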

In scikit-learn, OneHotEncoder and LabelEncoder are available in the preprocessing module. However, in older releases (before 0.20) OneHotEncoder does not support fit_transform() on strings, so "ValueError: could not convert string to float" may be raised during the transform.

You can use LabelEncoder to convert the strings to integer codes first, and then apply OneHotEncoder to those codes if you wish.
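A rough sketch of that two-step route, assuming scikit-learn is installed. The `grades` list is invented sample data, not taken from the question.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Invented sample data for a single categorical feature.
grades = ["A", "B", "C", "A"]

# Step 1: strings -> integer codes (classes are sorted, so A=0, B=1, C=2).
label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(grades)

# Step 2: integer codes -> one-hot rows. OneHotEncoder expects a 2-D input,
# hence the reshape; its default output is sparse, so convert to dense.
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(codes.reshape(-1, 1)).toarray()

print(codes)
print(one_hot)
```

On scikit-learn 0.20 or later, OneHotEncoder should also accept the string column directly, which makes the first step optional.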

In my pandas DataFrame, I had to encode every column whose dtype is object, i.e. the categorical ones. The following code works for me, and I hope it helps you.
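```
from sklearn import preprocessing

# Encode every string-valued (dtype object) column as integer codes.
le = preprocessing.LabelEncoder()
for column_name in train_data.columns:
    if train_data[column_name].dtype == object:
        # fit_transform refits the encoder, so each column gets its own mapping
        train_data[column_name] = le.fit_transform(train_data[column_name])
```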
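For completeness, here is a small end-to-end sketch using the CSV from the question. It assumes that columns A and B are the features and C is the target, which the question does not state explicitly. It also swaps in OrdinalEncoder, scikit-learn's encoder for feature columns, in place of the per-column LabelEncoder loop. The `n_estimators` and `random_state` values are arbitrary.

```python
import csv
import io

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

# The sample CSV from the question, inlined so the snippet is self-contained.
csv_text = "A,B,C\nHello,Hi,0\nHola,Bueno,1\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Assumption: A and B are features, C is the class label.
X_raw = [[row["A"], row["B"]] for row in rows]
y = [int(row["C"]) for row in rows]

# Passing X_raw straight to fit() raises the ValueError from the title,
# so encode the string features as numbers first.
encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X_raw)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_encoded, y)
predictions = clf.predict(X_encoded)
print(X_encoded)
print(predictions)
```

Integer codes imply an ordering between categories. Tree-based models usually tolerate that, but one-hot encoding is the safer choice when the order is meaningless.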