I was training an ANN machine learning model using GridSearchCV and got stuck with an IndexError in GridSearchCV

為{幸葍}努か submitted on 2020-01-17 15:31:53

Question


My model starts training, but after running for some time it fails with: IndexError: index 37 is out of bounds for axis 0 with size 37

The same model trains without errors when I use fixed parameters instead of GridSearchCV.

Here is my code:

    from keras.layers import Dense
    from keras.models import Sequential
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    def build_classifier(optimizer, nb_layers, unit):
        classifier = Sequential()
        # First hidden layer; the input has 14 features.
        classifier.add(Dense(units=unit, kernel_initializer='uniform', activation='relu', input_dim=14))
        # nb_layers additional hidden layers of the same width.
        for _ in range(nb_layers):
            classifier.add(Dense(units=unit, kernel_initializer='uniform', activation='relu'))
        # Output layer: one unit per class (38 classes).
        classifier.add(Dense(units=38, kernel_initializer='uniform', activation='softmax'))
        classifier.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
        return classifier

    classifier = KerasClassifier(build_fn=build_classifier)
    parameters = {'batch_size': [10, 25],
                  'epochs': [100, 200],
                  'optimizer': ['adam'],
                  'nb_layers': [5, 6, 7],
                  'unit': [48, 57, 76]}
    grid_search = GridSearchCV(estimator=classifier,
                               param_grid=parameters,
                               scoring='accuracy',
                               cv=5, n_jobs=-1)
    grid_search = grid_search.fit(X_train, y_train)
    best_parameters = grid_search.best_params_
    best_accuracy = grid_search.best_score_

Answer 1:


The error IndexError: index 37 is out of bounds for axis 0 with size 37 means that there is no element with index 37 in your object.

In Python, an object such as an array or list whose elements are indexed numerically and which has n elements uses indexes from 0 to n-1 (this is the general case, with the exception of reindexing in dataframes).

So, if you have 37 elements, you can only retrieve elements 0 through 36.
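The indexing rule above can be seen with a plain Python list. This is only an illustrative sketch: the name `items` is made up, and a list raises a differently worded IndexError than the NumPy message in the question.

```python
# A list with 37 elements has valid indexes 0..36.
items = list(range(37))

# The last valid index is len(items) - 1, i.e. 36.
last = items[len(items) - 1]

# Index 37 is one past the end, so Python raises IndexError.
try:
    items[37]
    error = None
except IndexError as exc:
    error = exc

print(last)
print(type(error).__name__)
```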




Answer 2:


This is a multi-class classifier with a large number of classes (38). It seems that GridSearchCV isn't splitting your dataset with stratified sampling, maybe because you don't have enough data and/or your dataset isn't class-balanced.

According to the documentation:

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

By using categorical_crossentropy, KerasClassifier will convert the targets (a class vector of integers) to a binary class matrix using keras.utils.to_categorical. Since there are 38 classes, each target will be converted to a binary vector of dimension 38 (indexes 0 to 37).

I guess that in some splits the validation set doesn't have samples from all 38 classes, so the targets are converted to vectors of dimension < 38. Since GridSearchCV is fitted with samples from all 38 classes, it expects vectors of dimension 38, which causes this error.
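One way to test this guess is to count which classes are missing from each split before running the grid search. The following is a minimal sketch in plain Python, not what GridSearchCV does internally: it uses simple contiguous, unshuffled folds, and the helper names `classes_per_fold` and `rare_classes` are hypothetical.

```python
from collections import Counter

def classes_per_fold(labels, n_splits=5):
    """For each contiguous fold, list the classes missing from train and validation."""
    labels = list(labels)
    n = len(labels)
    all_classes = set(labels)
    # Fold sizes: the first n % n_splits folds get one extra sample.
    sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    report = []
    start = 0
    for size in sizes:
        stop = start + size
        val = labels[start:stop]
        train = labels[:start] + labels[stop:]
        report.append({
            "missing_in_train": sorted(all_classes - set(train)),
            "missing_in_val": sorted(all_classes - set(val)),
        })
        start = stop
    return report

def rare_classes(labels, n_splits=5):
    """Classes with fewer samples than there are folds."""
    counts = Counter(labels)
    return sorted(c for c, k in counts.items() if k < n_splits)

# Toy data (made up): classes 0 and 1 are common, class 2 appears once.
toy_labels = [0, 1] * 10 + [2]
toy_report = classes_per_fold(toy_labels, n_splits=3)
toy_rare = rare_classes(toy_labels, n_splits=3)
```

With the question's data this could be called as `classes_per_fold(list(y_train), n_splits=5)`, assuming `y_train` is a 1-D sequence of integer labels. Any class returned by `rare_classes` has fewer samples than there are folds, so it cannot appear in every validation fold however the data is split.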




Answer 3:


Take a look at the shape of your y_train. It needs to be some sort of one-hot encoding with shape (,37).
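Whether one-hot targets are needed depends on the loss: `sparse_categorical_crossentropy`, as used in the question, takes integer class labels, while `categorical_crossentropy` takes one-hot rows. As a rough illustration of what a one-hot encoding and its shape look like, here is a hand-written sketch; the `one_hot` helper and its `num_classes` parameter are hypothetical, not a Keras API.

```python
def one_hot(labels, num_classes):
    """Turn integer labels into one-hot rows of length num_classes."""
    rows = []
    for label in labels:
        # A label outside 0..num_classes-1 has no slot in the row.
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows

# Toy example with 3 classes; the question's model would use num_classes=38.
encoded = one_hot([0, 2, 1], num_classes=3)
shape = (len(encoded), len(encoded[0]))  # (number of samples, number of classes)
```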



Source: https://stackoverflow.com/questions/58131814/i-was-training-an-ann-machine-learning-model-using-gridsearchcv-and-got-stuck-wi
