I have the following code:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation was removed in scikit-learn 0.20
#split the dataset for tr
This seems to be fixable by passing the target labels as a single pandas column (a Series). If the target has multiple columns, I get a similar error. For example, try:
labels = train['Y']
Try this as the target:
y=df['Survived']
Instead, I had used
y=df[['Survived']]
which made the target y a DataFrame; it seems a Series works fine.
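To make the difference between the two selections concrete, here is a minimal sketch. The toy DataFrame and its 'Age' column are made up for illustration; only the 'Survived' column name comes from the answer above.

```python
import pandas as pd

# Hypothetical toy data; only the 'Survived' column name comes from the thread.
df = pd.DataFrame({"Age": [22, 38, 26, 35], "Survived": [0, 1, 1, 1]})

y_series = df["Survived"]    # single brackets: a Series, shape (4,)
y_frame = df[["Survived"]]   # double brackets: a one-column DataFrame, shape (4, 1)

print(type(y_series).__name__, y_series.shape)
print(type(y_frame).__name__, y_frame.shape)

# If you are already holding the one-column DataFrame, flatten it to 1-D.
y_flat = y_frame.values.ravel()
print(y_flat.shape)
```

Either y_series or y_flat has the 1-D shape that the estimator expects for a single target.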
Adding .ravel() to the Y/labels variable passed into the model solved this problem with KNN as well.
You might need to play with the dimensions a bit, e.g.
et_score = cross_val_score(et, features, labels, n_jobs=-1)[:,n]
or
et_score = cross_val_score(et, features, labels, n_jobs=-1)[n,:]
where n is the index of the dimension you want.
When we do cross-validation in scikit-learn, the process requires labels of shape (R,) instead of (R,1). Although they are the same thing to some extent, their indexing mechanisms are different. So in your case, just add:
c, r = labels.shape
labels = labels.reshape(c,)
before passing it to the cross-validation function.
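As a rough illustration of the shape difference described above, the sketch below assumes labels is a NumPy array (a pandas DataFrame has no reshape method, so you would convert it first, e.g. with .values). The sample values are made up.

```python
import numpy as np

# Hypothetical column vector of labels, shape (4, 1).
labels = np.array([[0], [1], [1], [0]])

c, r = labels.shape
reshaped = labels.reshape(c,)   # shape (4,), as in the snippet above
raveled = labels.ravel()        # same result without unpacking the shape

print(labels.shape, reshaped.shape, raveled.shape)

# The indexing difference: one index into (R, 1) gives a length-1 array,
# while one index into (R,) gives a scalar.
print(labels[0], reshaped[0])
```

Note that unpacking labels.shape into two names only works when the array is exactly 2-D; ravel() avoids that assumption.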