Getting names and number of selected features before giving to a classifier in sklearn pipeline

Submitted by 自闭症网瘾萝莉.ら on 2019-12-11 06:35:08

Question


I am using sel = SelectFromModel(ExtraTreesClassifier(10), threshold='mean') to select the most important features in my data set.
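For reference, here is a minimal sketch of how the names and the number of selected features can be read off a fitted SelectFromModel through its get_support() mask. The synthetic data from make_classification and the column names f0, f1, ... are assumptions made up for illustration; they are not part of my real data set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data; the column names are invented for this sketch.
X, y = make_classification(n_samples=200, n_features=12, n_informative=4, random_state=0)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])

sel = SelectFromModel(ExtraTreesClassifier(n_estimators=10, random_state=0), threshold="mean")
X_new = sel.fit_transform(X, y)

mask = sel.get_support()              # boolean mask over the original columns
selected_names = feature_names[mask]  # names of the columns that were kept
n_selected = int(mask.sum())          # how many columns were kept

print(n_selected, list(selected_names))
```

The same mask works on a pandas column index, so with a DataFrame the names could be taken from its columns instead of a hand-built array.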

Then I want to feed these selected features to my Keras classifier. But my Keras-based neural network classifier needs to know the number of important features selected in the first step, because that number is its input dimension. Below is the code for my Keras classifier; the variable X_new is the numpy array of the selected features.

The code for the Keras classifier is as follows.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

def create_model(dropout=0.2):
    # X_new is the array of selected features; its width sets the input size
    n_x_new = X_new.shape[1]
    np.random.seed(6000)
    model_new = Sequential()
    model_new.add(Dense(n_x_new, input_dim=n_x_new, kernel_initializer='glorot_uniform', activation='sigmoid'))
    model_new.add(Dense(10, kernel_initializer='glorot_uniform', activation='sigmoid'))
    # use the argument so that the grid search over clf__dropout has an effect
    model_new.add(Dropout(dropout))
    model_new.add(Dense(1, kernel_initializer='glorot_uniform', activation='sigmoid'))
    model_new.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_crossentropy'])
    return model_new

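from keras.wrappers.scikit_learn import KerasClassifier  # wrapper location in older Keras releases
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

seed = 7
np.random.seed(seed)

clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=1000, verbose=0)

param_grid = {'clf__dropout': [0.1, 0.2]}
model = Pipeline([('sel', sel), ('clf', clf)])

grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='roc_auc', n_jobs=1)
# fit on the training and cross-validation splits stacked together
X_all = np.concatenate((train_x_upsampled, cross_val_x_upsampled), axis=0)
y_all = np.concatenate((train_y_upsampled, cross_val_y_upsampled), axis=0)
grid_result = grid.fit(X_all, y_all)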

As I am using Pipeline with grid search, I don't understand how my neural network will receive the important features selected in the first step. I want to get those selected features into the array X_new.
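As far as I understand, a Pipeline calls fit_transform on each intermediate step and hands the transformed array to the next step, so the final estimator should only ever see the selected columns. Here is a small sketch to check that, assuming synthetic data and using LogisticRegression purely as a stand-in for the Keras classifier (both are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data.
X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

pipe = Pipeline([
    ("sel", SelectFromModel(ExtraTreesClassifier(n_estimators=10, random_state=0), threshold="mean")),
    ("clf", LogisticRegression(max_iter=1000)),  # stand-in for the Keras classifier
])
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0]}, scoring="roc_auc", cv=3)
grid.fit(X, y)

# The refitted best pipeline exposes its fitted steps by name.
best = grid.best_estimator_
n_selected = int(best.named_steps["sel"].get_support().sum())
n_seen_by_clf = best.named_steps["clf"].coef_.shape[1]  # columns the classifier was trained on
X_new = best.named_steps["sel"].transform(X)            # the selected features as an array

print(n_selected, n_seen_by_clf, X_new.shape)
```

Note that inside the grid search the selector is refitted on every training fold, so the number of selected features can differ from fold to fold; a fixed global X_new would not reflect that.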

Do I need to implement a custom estimator in between sel and keras model?

If yes, how would I implement one? I know the generic code for a custom estimator, but I am unable to adapt it to my requirement. The generic code is as follows.

class new_features(TransformerMixin):
    def transform(self, X):
        X_new = sel.transform(X)
        return X_new

But this is not working. Is there any way I can solve this problem without putting a custom estimator in between?
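One direction that might avoid the in-between step is to let the final classifier build its model inside fit(), where the width of the incoming array is known. Below is a rough sketch of that idea, not a tested solution for Keras: LazyBuildClassifier, build_model, hidden and n_inputs_ are hypothetical names, and MLPClassifier stands in for the Keras network so that the example runs with scikit-learn alone. A Keras build function could take n_features as its input_dim in the same way.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline


def build_model(n_features, hidden=10):
    # Hypothetical build function; a Keras version would use n_features as input_dim.
    return MLPClassifier(hidden_layer_sizes=(n_features, hidden), max_iter=300, random_state=0)


class LazyBuildClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper that creates its model only once fit() sees the data."""

    def __init__(self, hidden=10):
        self.hidden = hidden

    def fit(self, X, y):
        X = np.asarray(X)
        self.n_inputs_ = X.shape[1]  # number of columns the previous step kept
        self.classes_ = np.unique(y)
        self.model_ = build_model(self.n_inputs_, hidden=self.hidden)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)

    def predict_proba(self, X):
        return self.model_.predict_proba(X)


# Synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=12, n_informative=4, random_state=0)

pipe = Pipeline([
    ("sel", SelectFromModel(ExtraTreesClassifier(n_estimators=10, random_state=0), threshold="mean")),
    ("clf", LazyBuildClassifier(hidden=10)),
])
pipe.fit(X, y)

n_selected = int(pipe.named_steps["sel"].get_support().sum())
print(n_selected, pipe.named_steps["clf"].n_inputs_)
```

Because hidden is a regular constructor parameter, it could still be tuned through a grid such as {'clf__hidden': [5, 10]}.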

Source: https://stackoverflow.com/questions/48762580/getting-names-and-number-of-selected-features-before-giving-to-a-classifier-in-s
