Using explicit (predefined) validation set for grid search with sklearn

隐瞒了意图╮ 2020-12-07 17:38

I have a dataset, which has previously been split into 3 sets: train, validation and test. These sets have to be used as given in order to compare the performance across dif

3 Answers
  •  死守一世寂寞
    2020-12-07 18:13

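    # Import libraries
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.model_selection import PredefinedSplit
    
    # X (a pandas DataFrame with unique index labels), y, estimator and
    # param_grid are assumed to be defined already.
    # Split the data into train and validation sets. If the split is already
    # given, skip this step and build split_index from the existing sets.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, train_size=0.8, stratify=y, random_state=2020)
    
    # Create a list where train rows are -1 (never validated on)
    # and validation rows are 0 (validation fold 0)
    split_index = [-1 if x in X_train.index else 0 for x in X.index]
    
    # Use the list to create a PredefinedSplit (a single train/validation split)
    pds = PredefinedSplit(test_fold=split_index)
    
    # Use the PredefinedSplit as the cv argument of GridSearchCV
    clf = GridSearchCV(estimator=estimator,
                       cv=pds,
                       param_grid=param_grid)
    
    # Fit with all data: each candidate is trained on the -1 rows and scored
    # on the 0 rows. With the default refit=True, the best candidate is then
    # refit on all of X and y (train + validation).
    clf.fit(X, y)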
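
Since the question starts from sets that were split in advance, the same idea can probably be applied without calling train_test_split at all. The following is a minimal sketch under stated assumptions: the data are NumPy arrays, and the synthetic arrays, LogisticRegression, and the C grid are placeholders that do not come from the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Synthetic stand-ins (assumption) for train/validation sets given in advance
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 4))
y_train = (X_train[:, 0] > 0).astype(int)
X_val = rng.normal(size=(20, 4))
y_val = (X_val[:, 0] > 0).astype(int)

# Stack train and validation so that GridSearchCV receives a single array
X_all = np.vstack([X_train, X_val])
y_all = np.concatenate([y_train, y_val])

# -1 = row is never used for validation (train), 0 = row is in validation fold 0
test_fold = np.concatenate([
    np.full(len(X_train), -1),
    np.zeros(len(X_val), dtype=int),
])
pds = PredefinedSplit(test_fold=test_fold)

# refit=False stops GridSearchCV from retraining on train + validation afterwards
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),  # placeholder estimator
    param_grid={"C": [0.1, 1.0, 10.0]},           # placeholder grid
    cv=pds,
    refit=False,
)
search.fit(X_all, y_all)

print(pds.get_n_splits())    # number of train/validation splits
print(search.best_params_)   # parameters that scored best on the validation rows
```

To report a test score afterwards, one option is to train a fresh estimator with search.best_params_ on the train set (or on train + validation, depending on the comparison protocol) and score it on the untouched test set.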
    
