Python error: too many indices for array

Anonymous (unverified), submitted 2019-12-03 01:06:02

Question:

My input was a CSV file that was imported into a PostgreSQL database. I am now building a CNN using Keras. The code below raises "IndexError: too many indices for array". I am quite new to machine learning, so I have no idea how to solve this. Any suggestions?

    X = dataframe1[['Feature1', 'Feature2', 'Feature3', 'Feature4', 'Feature5',
                    'Feature6', 'Feature7', 'Feature8', 'Feature9', 'Feature10',
                    'Feature11\1', 'Feature12', 'Feature13', 'Feature14']]
    Y = result[['label']]

    # evaluate model with standardized dataset
    results = cross_val_score(estimator, X, Y, cv=kfold)
    print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Error

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-50-0e5d0345015f> in <module>()
          2 estimator = KerasClassifier(build_fn=create_baseline, nb_epoch=100, batch_size=5, verbose=0)
          3 kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    ----> 4 results = cross_val_score(estimator, X, Y, cv=kfold)
          5 print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

    C:\Anacondav3\lib\site-packages\sklearn\model_selection\_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
        129
        130     cv = check_cv(cv, y, classifier=is_classifier(estimator))
    --> 131     cv_iter = list(cv.split(X, y, groups))
        132     scorer = check_scoring(estimator, scoring=scoring)
        133     # We clone the estimator to make sure that all the folds are

    C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
        320                                                              n_samples))
        321
    --> 322         for train, test in super(_BaseKFold, self).split(X, y, groups):
        323             yield train, test
        324

    C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
         89         X, y, groups = indexable(X, y, groups)
         90         indices = np.arange(_num_samples(X))
    ---> 91         for test_index in self._iter_test_masks(X, y, groups):
         92             train_index = indices[np.logical_not(test_index)]
         93             test_index = indices[test_index]

    C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in _iter_test_masks(self, X, y, groups)
        608
        609     def _iter_test_masks(self, X, y=None, groups=None):
    --> 610         test_folds = self._make_test_folds(X, y)
        611         for i in range(self.n_splits):
        612             yield test_folds == i

    C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in _make_test_folds(self, X, y, groups)
        595         for test_fold_indices, per_cls_splits in enumerate(zip(*per_cls_cvs)):
        596             for cls, (_, test_split) in zip(unique_y, per_cls_splits):
    --> 597                 cls_test_folds = test_folds[y == cls]
        598                 # the test split can be too big because we used
        599                 # KFold(...).split(X[:max(c, n_splits)]) when data is not 100%

    IndexError: too many indices for array

Is there a different way that I should be declaring the array or dataframe?

Answer 1:

Notice that the example in the User Guide shows that X is 2-dimensional while y is 1-dimensional:

    >>> X_train.shape, y_train.shape
    ((90, 4), (90,))

Some programmers use capitalized variables for 2-dimensional arrays and lower-case for 1-dimensional arrays.
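Those shapes can be reproduced in a few lines. This is only a sketch: it assumes the User Guide example is the iris dataset split with test_size=0.4, which is my reading of the shapes and not something stated in the question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris (150 samples, 4 features) stands in for the User Guide data.
X, y = load_iris(return_X_y=True)

# Hold out 40% of the rows, leaving 90 rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# X_train is 2-dimensional (samples x features); y_train is 1-dimensional.
print(X_train.shape, y_train.shape)
```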

Therefore use

Y = result['label'] 

instead of

Y = result[['label']] 

I am assuming that result is a pandas DataFrame. When you index a DataFrame with a list of columns such as ['label'], a sub-DataFrame, which is 2-dimensional, is returned. If you index the DataFrame with a single string, a 1-dimensional Series is returned.
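The difference is easy to check on a small frame. The following is a minimal sketch; the `result` frame here is made-up stand-in data, not the asker's table.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `result` DataFrame.
result = pd.DataFrame({"label": [0, 1, 0, 1, 1]})

y_frame = result[["label"]]   # list of columns -> DataFrame, 2-dimensional
y_series = result["label"]    # single string   -> Series, 1-dimensional

print(type(y_frame).__name__, y_frame.shape)    # DataFrame (5, 1)
print(type(y_series).__name__, y_series.shape)  # Series (5,)
```

Checking `.shape` or `.ndim` on Y before passing it to cross_val_score is a quick way to catch this kind of mismatch.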


Finally, note that the IndexError

IndexError: too many indices for array 

is raised on this line

cls_test_folds = test_folds[y == cls] 

because y is 2-dimensional, so y == cls is a 2-dimensional boolean array, while test_folds is 1-dimensional. The situation is similar to the following:

    In [71]: import numpy as np
    In [72]: test_folds = np.zeros(5, dtype=int)  # np.int was removed in recent NumPy releases
    In [73]: y_eq_cls = np.array([(True, ), (False,)])
    In [74]: test_folds[y_eq_cls]
    IndexError: too many indices for array
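The same failure, and the effect of flattening y, can be seen in a standalone script. This is a sketch with made-up labels; the names `y_2d`, `mask_2d` and `mask_1d` are illustrative and do not come from scikit-learn.

```python
import numpy as np

test_folds = np.zeros(5, dtype=int)          # 1-dimensional, like test_folds in the traceback
y_2d = np.array([[0], [1], [0], [1], [1]])   # shape (5, 1), like the values of result[['label']]
cls = 1

# Comparing a 2-dimensional y gives a 2-dimensional boolean mask,
# which cannot index a 1-dimensional array.
mask_2d = (y_2d == cls)
try:
    test_folds[mask_2d]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# Flattening y first gives a 1-dimensional mask that indexes cleanly.
mask_1d = (y_2d.ravel() == cls)
selected = test_folds[mask_1d]
print(selected.shape)  # (3,)
```

Passing a 1-dimensional y in the first place, as with Y = result['label'], avoids the need to flatten anything.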

