Python RandomForest - Unknown label Error

Anonymous (unverified), submitted 2019-12-03 01:47:02

Question:

I'm having trouble using the RandomForest fit function.

This is my training set:

         P1      Tp1     IrrPOA    Gz      Drz2
0        0.0     7.7     0.0       -1.4    -0.3
1        0.0     7.7     0.0       -1.4    -0.3
2        ...     ...     ...       ...     ...
3        49.4    7.5     0.0       -1.4    -0.3
4        47.4    7.5     0.0       -1.4    -0.3
... (10k rows)

I want to predict P1 from all the other variables using sklearn.ensemble's RandomForestClassifier:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

colsRes = ['P1']
X_train = train.drop(colsRes, axis=1)
Y_train = pd.DataFrame(train[colsRes])
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, Y_train)

Here is the error I get:

ValueError: Unknown label type: array([[  0. ],
       [  0. ],
       [  0. ],
       ...,
       [ 49.4],
       [ 47.4],

I couldn't find anything about this label error. I'm using Python 3.5. Any advice would be a great help!

Answer 1:

When you pass label (y) data to rf.fit(X, y), it expects y to be a 1D array. Selecting columns from a pandas DataFrame with a list of names, as in train[colsRes], returns a 2D DataFrame rather than a 1D Series, which is the conflict in your case. You need to convert that 2D structure into the 1D array the fit function expects.

Try using 1D list first:

Y_train = list(train.P1.values) 
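A small sketch of the distinction, assuming a hypothetical four-row frame that reuses the question's column names: a list selector yields a 2D DataFrame, a single-name selector yields a 1D Series, and `.to_numpy().ravel()` flattens a column vector. As far as we know, scikit-learn only emits a `DataConversionWarning` for a column-vector y; a 1D float target still fails the classifier's label check, which is the point Answers 2 and 3 address.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as the question's data.
train = pd.DataFrame({
    "P1": [0.0, 0.0, 49.4, 47.4],
    "Tp1": [7.7, 7.7, 7.5, 7.5],
    "IrrPOA": [0.0, 0.0, 0.0, 0.0],
    "Gz": [-1.4, -1.4, -1.4, -1.4],
    "Drz2": [-0.3, -0.3, -0.3, -0.3],
})

# Selecting with a *list* of column names returns a 2D DataFrame.
y_2d = train[["P1"]]
# Selecting with a single column name returns a 1D Series.
y_1d = train["P1"]
# A 2D column vector can also be flattened explicitly.
y_flat = train[["P1"]].to_numpy().ravel()
# The plain-list version from the answer is 1D as well.
y_list = list(train.P1.values)

print(type(y_2d).__name__, y_2d.shape)   # DataFrame (4, 1)
print(type(y_1d).__name__, y_1d.shape)   # Series (4,)
print(y_flat.shape, len(y_list))         # (4,) 4
```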

If this does not solve the problem, you can try the solution mentioned in the question "MultinomialNB error: 'Unknown Label Type'":

Y_train = np.asarray(train['P1'], dtype="|S6") 

So your code becomes:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

colsRes = ['P1']
X_train = train.drop(colsRes, axis=1)
Y_train = np.asarray(train['P1'], dtype="|S6")
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, Y_train)
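A minimal sketch of what the byte-string cast does, using hypothetical toy data in place of the real training set: each distinct P1 value becomes its own class, so the classifier treats 47.4 and 49.4 as unrelated categories rather than nearby numbers. `n_estimators=10` and `random_state=0` are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for the question's training set.
train = pd.DataFrame({
    "P1": [0.0, 0.0, 49.4, 47.4, 49.4, 0.0],
    "Tp1": [7.7, 7.7, 7.5, 7.5, 7.5, 7.7],
    "Gz": [-1.4, -1.4, -1.4, -1.4, -1.4, -1.4],
})

X_train = train.drop(["P1"], axis=1)
# Casting to fixed-width byte strings turns each distinct value into a label.
Y_train = np.asarray(train["P1"], dtype="|S6")

rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X_train, Y_train)

# Each distinct P1 value became its own class (three here).
print(rf.classes_)
```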


Answer 2:

According to this Stack Overflow post, classifiers need integer or string labels.
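As far as we know, scikit-learn classifiers validate targets during `fit` through `check_classification_targets`, which relies on the public helper `sklearn.utils.multiclass.type_of_target`. A quick check with values from the question (the string labels are made up) illustrates the rule: non-integer floats are `continuous` whether they arrive as a 1D list or a column vector, while integers and strings are `multiclass`.

```python
from sklearn.utils.multiclass import type_of_target

# Floats with fractional parts are treated as a regression target.
print(type_of_target([0.0, 0.0, 49.4, 47.4]))            # continuous
# The same values as a column vector are still continuous.
print(type_of_target([[0.0], [0.0], [49.4], [47.4]]))    # continuous
# Integer labels are accepted by classifiers.
print(type_of_target([0, 0, 49, 47]))                    # multiclass
# String labels are accepted too (hypothetical labels).
print(type_of_target(["low", "low", "high", "mid"]))     # multiclass
```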

You could consider switching to a regression model instead, which might suit your data better since every target value appears to be a float:

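from sklearn.ensemble import RandomForestRegressor

X_train = train.drop('P1', axis=1)
Y_train = train['P1']
rf = RandomForestRegressor(n_estimators=100)
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
rf.fit(X_train.to_numpy(), Y_train.to_numpy())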
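A minimal end-to-end sketch of the regression route, assuming hypothetical toy rows with the question's column names. scikit-learn also accepts a DataFrame and a Series directly, so the conversion to NumPy arrays is optional; `random_state=0` is only there to make the toy run repeatable.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy frame with the question's column names.
train = pd.DataFrame({
    "P1": [0.0, 0.0, 49.4, 47.4, 48.1, 0.0],
    "Tp1": [7.7, 7.7, 7.5, 7.5, 7.6, 7.8],
    "IrrPOA": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "Gz": [-1.4, -1.4, -1.4, -1.4, -1.4, -1.4],
    "Drz2": [-0.3, -0.3, -0.3, -0.3, -0.3, -0.3],
})

X_train = train.drop("P1", axis=1)
Y_train = train["P1"]  # 1D float Series: a valid regression target

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, Y_train)  # DataFrame/Series passed directly

# Predictions are continuous values, not class labels.
predictions = rf.predict(X_train)
print(predictions)
```

Because each tree predicts an average of training targets, the predictions stay within the range of the observed P1 values.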


Answer 3:

Maybe a tad late to the party, but I just got this error and solved it by making sure my y variable was of type int, using

 y = df['y_variable'].astype(int)  

before doing the train/test split. Also, as others have said, your problem seems better suited to a RandomForestRegressor than to a RandomForestClassifier.
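One thing to keep in mind with this workaround: `astype(int)` truncates toward zero, so values like 47.4 and 47.9 collapse into the same class. A small sketch with a hypothetical `df`; rounding before the cast is shown as an alternative if nearest-integer labels are preferred.

```python
import pandas as pd

# Hypothetical stand-in for the question's target column.
df = pd.DataFrame({"y_variable": [0.0, 0.0, 49.4, 47.4, 47.9]})

# astype(int) truncates toward zero; it does not round.
y = df["y_variable"].astype(int)
print(y.tolist())          # [0, 0, 49, 47, 47]

# Rounding first keeps the nearest integer instead.
y_rounded = df["y_variable"].round().astype(int)
print(y_rounded.tolist())  # [0, 0, 49, 47, 48]
```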


