RandomForestClassifier.fit(): ValueError: could not convert string to float

礼貌的吻别 2020-12-23 09:16

Given is a simple CSV file:

A,B,C
Hello,Hi,0
Hola,Bueno,1

Obviously the real dataset is far more complex than this, but this one reproduces the error.

8 Answers
  •  粉色の甜心
    2020-12-23 09:47

    You cannot pass str values to fit() for this kind of classifier.

    For example, suppose you have a feature column named 'grade' with three distinct values: A, B and C.

    You have to convert the strings "A", "B" and "C" into numeric vectors with an encoder, like this:

    A = [1,0,0]
    
    B = [0,1,0]
    
    C = [0,0,1]
    

    This is necessary because a str has no numerical meaning for the classifier.
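
    To make the mapping above concrete, here is a minimal dependency-free sketch of one-hot encoding. The helper name one_hot and the sample grades list are hypothetical and only serve as illustration; in practice a library encoder would normally do this job.

```python
def one_hot(values):
    """Return (categories, rows); each row is a one-hot list for one value."""
    categories = sorted(set(values))  # fixed column order, e.g. A, B, C
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows


grades = ["A", "B", "C", "A"]  # hypothetical 'grade' column
categories, encoded = one_hot(grades)
print(categories)  # ['A', 'B', 'C']
print(encoded)     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```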

    In scikit-learn, OneHotEncoder and LabelEncoder are available in the preprocessing module. However, in older releases (before 0.20) OneHotEncoder cannot fit_transform() strings, and "ValueError: could not convert string to float" may be raised during the transform.

    You can use LabelEncoder to convert the str values to integer codes first. Then you can apply OneHotEncoder to those codes if you wish.
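
    The two-step approach can be sketched as follows. The grades array is a hypothetical stand-in for a real feature column, and the snippet assumes a scikit-learn version in which OneHotEncoder returns a sparse matrix by default (hence the toarray() call).

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

grades = np.array(["A", "B", "C", "A"])  # hypothetical 'grade' column

# Step 1: map each string to an integer code.
# LabelEncoder sorts the classes, so A=0, B=1, C=2.
le = LabelEncoder()
codes = le.fit_transform(grades)

# Step 2: expand the integer codes into one-hot columns.
# OneHotEncoder expects a 2-D array, hence the reshape.
ohe = OneHotEncoder()
one_hot = ohe.fit_transform(codes.reshape(-1, 1)).toarray()

print(codes)    # integer codes, one per input value
print(one_hot)  # one row per input value, one column per class
```

    Keeping le around lets you call le.inverse_transform() later to recover the original strings.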

    In my pandas DataFrame, I had to encode every column whose dtype is object. The following code works for me, and I hope it helps you.

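        from sklearn import preprocessing

        # train_data is the pandas DataFrame holding the training set.
        le = preprocessing.LabelEncoder()
        for column_name in train_data.columns:
            # Only string (object) columns need encoding; fit_transform refits le each time.
            if train_data[column_name].dtype == object:
                train_data[column_name] = le.fit_transform(train_data[column_name])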
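
    Applied to the CSV from the question, the same idea might look like the sketch below. It assumes that column C is the target and that A and B are the features, which the question does not state. It also keeps one encoder per column (the encoders dict is a hypothetical addition) and checks for non-numeric columns with is_numeric_dtype rather than comparing against object.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# The CSV from the question, inlined so the sketch is self-contained.
csv_text = "A,B,C\nHello,Hi,0\nHola,Bueno,1\n"
train_data = pd.read_csv(io.StringIO(csv_text))

# Assumption: C is the target, A and B are the features.
X = train_data[["A", "B"]].copy()
y = train_data["C"]

# Encode every non-numeric column, keeping one encoder per column so the
# mapping can be reused on new data or inverted later.
encoders = {}
for column_name in X.columns:
    if not pd.api.types.is_numeric_dtype(X[column_name]):
        encoders[column_name] = LabelEncoder()
        X[column_name] = encoders[column_name].fit_transform(X[column_name])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)  # no ValueError: every feature is numeric now
predictions = clf.predict(X)
print(predictions)
```

    Note that label encoding imposes an arbitrary order on the categories. Tree-based models often tolerate this, but one-hot encoding may be the safer choice when that order could mislead the model.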
    
