Question
I am trying to fit a decision tree to matrices of features and labels. Here is my code:
print FEATURES_DATA[0]
print ""
print TARGET[0]
print ""
print np.unique(list(map(len, FEATURES_DATA[0])))
which gives the following output:
[ array([[3, 3, 3, ..., 7, 7, 7],
[3, 3, 3, ..., 7, 7, 7],
[3, 3, 3, ..., 7, 7, 7],
...,
[2, 2, 2, ..., 6, 6, 6],
[2, 2, 2, ..., 6, 6, 6],
[2, 2, 2, ..., 6, 6, 6]], dtype=uint8)]
[ array([[31],
[31],
[31],
...,
[22],
[22],
[22]], dtype=uint8)]
[463511]
The matrix actually contains 463511 samples.
Thereafter, I run the following block:
from sklearn.tree import DecisionTreeClassifier
for i in xrange(5):
    Xtrain = FEATURES_DATA[i]
    Ytrain = TARGET[i]
    clf = DecisionTreeClassifier()
    clf.fit(Xtrain, Ytrain)
which gives me the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-3d8b2a7a3e5f> in <module>()
4 Ytrain=TARGET[i]
5 clf=DecisionTreeClassifier()
----> 6 clf.fit(Xtrain,Ytrain)
C:\Users\singhg2\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\tree\tree.pyc in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
152 random_state = check_random_state(self.random_state)
153 if check_input:
--> 154 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
155 if issparse(X):
156 X.sort_indices()
C:\Users\singhg2\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\utils\validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
371 force_all_finite)
372 else:
--> 373 array = np.array(array, dtype=dtype, order=order, copy=copy)
374
375 if ensure_2d:
ValueError: setting an array element with a sequence.
I searched other posts on SO and found that most answers attribute this error either to matrices that contain non-numeric values or to rows whose length differs across samples. Neither seems to be the case here.
Any help?
Answer 1:
If print FEATURES_DATA[0] actually prints
[ array([[3, 3, 3, ..., 7, 7, 7],
[3, 3, 3, ..., 7, 7, 7],
[3, 3, 3, ..., 7, 7, 7],
...,
[2, 2, 2, ..., 6, 6, 6],
[2, 2, 2, ..., 6, 6, 6],
[2, 2, 2, ..., 6, 6, 6]], dtype=uint8)]
then the problem is that FEATURES_DATA[0] is not the feature matrix itself but a one-element container (a Python list, judging by the outer [ and ]) with the NumPy array inside it.
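The diagnosis can be checked by inspecting the type, shape, and dtype of the outer object and of its first element. The following is only a minimal sketch in Python 3 with made-up data: the names inner and wrapped are hypothetical, and it assumes the wrapper is a one-element object array, which is one way to reproduce the failing conversion (a plain Python list shows the same outer brackets). The float32 dtype is likewise an assumption about what check_array requests.

```python
import numpy as np

# Hypothetical stand-in for FEATURES_DATA[0]: a one-element object array
# whose only item is the real 2-D feature matrix (tiny here).
inner = np.arange(12, dtype=np.uint8).reshape(4, 3)
wrapped = np.empty(1, dtype=object)
wrapped[0] = inner

# Outer container: one element, dtype object, so not a numeric matrix.
print(type(wrapped), wrapped.shape, wrapped.dtype)
# Inner element: the 2-D uint8 matrix that the estimator actually needs.
print(type(wrapped[0]), wrapped[0].shape, wrapped[0].dtype)

# Same kind of conversion as the np.array(...) call in the traceback.
# Each outer element would have to become a single float, which a whole
# matrix cannot do, so the conversion is expected to fail.
try:
    np.array(wrapped, dtype=np.float32)
    conversion_failed = False
except (TypeError, ValueError) as exc:
    conversion_failed = True
    print("conversion failed:", exc)

# Converting the inner matrix works without complaint.
X = np.array(wrapped[0], dtype=np.float32)
```

If the outer shape has length 1 while the first element has the expected (samples, features) shape, the data is wrapped one level too deep.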
You can select the first (and only) element of the list to fix it:
from sklearn.tree import DecisionTreeClassifier
for i in xrange(5):
    Xtrain = FEATURES_DATA[i][0]
    Ytrain = TARGET[i][0]
    clf = DecisionTreeClassifier()
    clf.fit(Xtrain, Ytrain)
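If it is not certain that every entry is wrapped in exactly this way, the indexing can be guarded by a small helper that fails loudly instead of silently picking the wrong element. This is a rough sketch in Python 3, not part of the original code: unwrap_single, features_data, and target are hypothetical names, and the data is synthetic.

```python
import numpy as np

def unwrap_single(container):
    """Return the only element of a length-1 container as a 2-D array."""
    if len(container) != 1:
        raise ValueError("expected exactly one element, got %d" % len(container))
    arr = np.asarray(container[0])
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array, got %d dimension(s)" % arr.ndim)
    return arr

# Synthetic stand-ins for FEATURES_DATA and TARGET: each entry wraps
# one matrix in a one-element list, as in the printed output.
features_data = [[np.zeros((5, 3), dtype=np.uint8)] for _ in range(2)]
target = [[np.ones((5, 1), dtype=np.uint8)] for _ in range(2)]

shapes = []
for X_wrapped, y_wrapped in zip(features_data, target):
    X = unwrap_single(X_wrapped)          # (samples, features)
    y = unwrap_single(y_wrapped).ravel()  # column vector flattened to 1-D
    shapes.append((X.shape, y.shape))

print(shapes)
```

The resulting X and y can then be passed to clf.fit(X, y) in place of Xtrain and Ytrain.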
Source: https://stackoverflow.com/questions/37548189/valueerror-setting-an-array-element-with-a-sequence-with-decision-tree-where-al