ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1.0

心已入冬 submitted on 2019-11-30 14:11:35

I found this explanation in an issue on a similar linear module: https://github.com/lensacom/sparkit-learn/issues/49

"Sadly this is a bug indeed. Sparkit trains sklearn's linear models in parallel, then averages them in a reduce step. There is at least one block, which contains only one of the labels. To check try the following:

train_Z[:, 'y']._rdd.map(lambda x: np.unique(x).size).filter(lambda x: x < 2).count()

To resolve this, you could randomize the train data to avoid blocks with one label, but this is still waiting for a clever solution."
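The same check can be sketched without Spark. This is only an illustration, under the assumption that the data is cut into contiguous blocks; `np.array_split` stands in for the real partitioning, and `count_single_class_blocks` is a hypothetical helper, not part of sparkit-learn or scikit-learn:

```python
import numpy as np

def count_single_class_blocks(y, n_blocks):
    """Count contiguous blocks of y that contain fewer than 2 distinct labels."""
    # Assumption: blocks are contiguous slices, approximated with np.array_split
    blocks = np.array_split(np.asarray(y), n_blocks)
    return sum(1 for block in blocks if np.unique(block).size < 2)

# Labels sorted by class: every block ends up with a single label
y_sorted = np.array([0.0] * 50 + [1.0] * 50)
# Labels interleaved: every block sees both classes
y_mixed = np.tile([0.0, 1.0], 50)

print(count_single_class_blocks(y_sorted, 4))
print(count_single_class_blocks(y_mixed, 4))
```

A non-zero count means at least one block would be handed to the solver with only one class, which is the situation that raises the ValueError in the title.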

EDIT: I found a solution. The analysis of the error above was correct: at least one block of the training data contained only one class.

To shuffle both arrays in the same order, I used the shuffle function from scikit-learn's utils module:

from sklearn.utils import shuffle
X_shuf, Y_shuf = shuffle(X_transformed, Y)

Then train your model again on the shuffled arrays and it should work.
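As a quick sanity check, here is a minimal sketch on made-up data (the arrays are hypothetical stand-ins for X_transformed and Y, and the 4 contiguous blocks are an assumption about how the data gets split). It confirms that shuffle keeps each row paired with its label and that the shuffled labels no longer sit in single-class blocks:

```python
import numpy as np
from sklearn.utils import shuffle

# Hypothetical data: labels sorted by class; the first feature column
# copies the label so that row/label pairing can be verified afterwards
Y = np.array([0.0] * 50 + [1.0] * 50)
X_transformed = np.column_stack([Y, np.arange(100)])

# shuffle applies the same permutation to every array it is given;
# random_state is fixed here only to make the run repeatable
X_shuf, Y_shuf = shuffle(X_transformed, Y, random_state=0)

# Each row should still carry its own label
rows_still_paired = bool(np.array_equal(X_shuf[:, 0], Y_shuf))

# Count contiguous blocks (assumed: 4) that contain only one class
single_class_blocks = sum(
    1 for block in np.array_split(Y_shuf, 4) if np.unique(block).size < 2
)
print(rows_still_paired, single_class_blocks)
```

Shuffling makes a single-class block very unlikely but does not rule it out, so with a heavily imbalanced dataset or very small blocks it is worth re-running the check before training.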
