How to do cross-validation with multiple input data in CNN model with Keras

Question

My dataset consists of a time series (10080 values) and other descriptive-statistics features (85), joined into one row per sample. The DataFrame is 921 x 10166.

The data looks something like this, with the last 2 columns as Y (labels):

id    x0  x1    x2   x3   x4   x5  ... x10079   mean var ... Y0     Y1
1    40  31.05 25.5 25.5 25.5 25   ...  33       24   1       1      0
2    35  35.75 36.5 26.5 36.5 36.5 ...  29       31   2       0      1 
3    35  35.70 36.5 36.5 36.5 36.5 ...  29       25   1       1      0 
4    40  31.50 23.5 24.5 26.5 25   ...  33       29   3       0      1
 ... 
921  40  31.05 25.5 25.5 25.5 25   ...  23       33   2       0      1

I checked a few blogs and tutorials, which were helpful, but I am not sure how to handle my input data, which I have divided into inputs_1 and inputs_2 as shown in the model below:

inputs_1 = keras.Input(shape=(10081,1))

layer1 = Conv1D(64,14)(inputs_1)
layer2 = layers.MaxPool1D(5)(layer1)
layer3 = Conv1D(64, 14)(layer2)
layer4 = layers.GlobalMaxPooling1D()(layer3)

inputs_2 = keras.Input(shape=(85,))            
layer5 = layers.concatenate([layer4, inputs_2])
layer6 = Dense(128, activation='relu')(layer5)
layer7 = Dense(2, activation='softmax')(layer6)

model_2 = keras.models.Model(inputs = [inputs_1, inputs_2], outputs = [layer7])

X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:10166], merge[['Result_cat','Result_cat1']].values, test_size=0.2) 
X_train = X_train.to_numpy()
X_train = X_train.reshape([X_train.shape[0], X_train.shape[1], 1]) 
X_train_1 = X_train[:,0:10081,:]
X_train_2 = X_train[:,10081:10166,:].reshape(736,85)  

X_test = X_test.to_numpy()
X_test = X_test.reshape([X_test.shape[0], X_test.shape[1], 1]) 
X_test_1 = X_test[:,0:10081,:]
X_test_2 = X_test[:,10081:10166,:].reshape(185,85)    

adam = keras.optimizers.Adam(lr = 0.0005)
model_2.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['acc'])
history = model_2.fit([X_train_1,X_train_2], y_train, epochs = 120, batch_size = 256, validation_split = 0.2, callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)])

The reason for dividing the features into 2 parts is that inputs_1 holds the time series data, while inputs_2 holds the descriptive statistics. I thought it would be better to separate them given the different nature of the data. Please correct me if I'm wrong.
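
The split described above can be sketched with plain NumPy slicing. This is a minimal sketch, not the code from the question: the helper name `split_inputs` and the small column counts are made up for illustration, and it assumes the time-series columns come first, followed by the statistics columns. Taking the row count from the array itself avoids hardcoding sizes such as 736 or 185 in the reshape calls.

```python
import numpy as np

def split_inputs(features, n_series):
    """Split a 2-D feature matrix into a Conv1D input and a dense input.

    features: array of shape (n_samples, n_series + n_stats)
    n_series: number of leading time-series columns (assumed layout)
    """
    features = np.asarray(features, dtype="float32")
    # Time-series block gets a trailing channel axis: (n_samples, n_series, 1)
    x_series = features[:, :n_series, np.newaxis]
    # Statistics block stays 2-D: (n_samples, n_stats)
    x_stats = features[:, n_series:]
    return x_series, x_stats

# Toy data: 6 samples, 10 time steps, 3 statistics (hypothetical sizes)
demo = np.arange(6 * 13).reshape(6, 13)
x_series, x_stats = split_inputs(demo, n_series=10)
print(x_series.shape, x_stats.shape)
```

The same helper could be applied to the training and test portions alike, since neither shape depends on how many rows each portion has.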

My question is: since my feature data is divided and treated separately in the original model, should I do the same in cross-validation (treat inputs_1 and inputs_2 separately)? For example, in Jason's model:

# MLP for Pima Indians Dataset with 10-fold cross validation
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Fit the model
    model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
    # evaluate the model
    scores = model.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))

Evaluation was done with scores = model.evaluate(X[test], Y[test], verbose=0), where X[test] and Y[test] are used. In my case, since I have inputs_1 and inputs_2 instead of X (as in the example model), should I use something like [inputs_1,inputs_2][test]?
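
One way to approach this: indexing the list `[inputs_1, inputs_2]` itself would not select rows from the arrays inside it, so the fold indices would have to be applied to each data array separately and the results passed as a list. The following is a minimal sketch under stated assumptions, not a definitive implementation. `fold_inputs` is a hypothetical helper, the arrays are small random stand-ins for the real data, and the Keras calls are left as comments so the example runs without a model. It also assumes the labels are one-hot rows, which are converted to class ids because `StratifiedKFold` stratifies on class labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def fold_inputs(x_series, x_stats, y_onehot, n_splits=5, seed=7):
    """Yield per-fold train/test data for a two-input model (hypothetical helper)."""
    # Turn one-hot rows into class ids for stratification
    labels = np.argmax(y_onehot, axis=1)
    kfold = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # The folds are derived from the labels; the 2-D statistics array is
    # passed as X only so that the sample count is known
    for train, test in kfold.split(x_stats, labels):
        # Apply the same row indices to both input arrays
        train_x = [x_series[train], x_stats[train]]
        test_x = [x_series[test], x_stats[test]]
        yield train_x, y_onehot[train], test_x, y_onehot[test]

# Toy stand-ins: 20 samples, 30 time steps, 4 statistics, 2 classes
rng = np.random.default_rng(0)
n = 20
x_series = rng.normal(size=(n, 30, 1))
x_stats = rng.normal(size=(n, 4))
y_onehot = np.eye(2)[np.arange(n) % 2]

fold_sizes = []
for train_x, train_y, test_x, test_y in fold_inputs(x_series, x_stats, y_onehot):
    # A fresh model would be built, compiled and fitted here, e.g.
    #   model.fit(train_x, train_y, ...)
    #   scores = model.evaluate(test_x, test_y, verbose=0)
    fold_sizes.append((len(train_y), len(test_y)))
print(fold_sizes)
```

Building a new model inside the loop, as in the Pima Indians example, keeps weights from one fold from carrying over into the next.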

Any advice is appreciated. Thanks

Update:

I tried to concatenate inputs_1 and inputs_2 with

con_x = np.concatenate((X_train_1,X_train_2), axis = 1)

and changed the first line of the loop to

for train, test in kfold.split(con_x, Y):

but it returned

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-d53a7058d157> in <module>()
     55 cvscores = []
---> 56 for train, test in kfold.split(con_x, Y):
     57 
     58     inputs_1 = keras.Input(shape=(10080,1))

1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    537         if not allow_nd and array.ndim >= 3:
    538             raise ValueError("Found array with dim %d. %s expected <= 2."
--> 539                              % (array.ndim, estimator_name))
    540         if force_all_finite:
    541             _assert_all_finite(array,

ValueError: Found array with dim 3. Estimator expected <= 2.

But still, I am not sure if it is valid to concatenate inputs_1 and inputs_2 like this.
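
On the validity of the concatenation itself: if both blocks are 2-D, joining them along the feature axis, selecting rows, and slicing them apart again should give the same arrays as selecting rows from each block directly. A small check of that claim, using made-up toy shapes (all names and sizes here are illustrative, not taken from the original code):

```python
import numpy as np

rng = np.random.default_rng(0)
x_series = rng.normal(size=(8, 5, 1))   # (samples, time steps, channels)
x_stats = rng.normal(size=(8, 3))       # (samples, statistics)
rows = np.array([0, 2, 5])              # stand-in for one fold's indices

# Drop the channel axis so both blocks are 2-D before concatenating
combined = np.concatenate((x_series[:, :, 0], x_stats), axis=1)
picked = combined[rows]

# Slice the selected rows apart again and restore the channel axis
series_back = picked[:, :5, np.newaxis]
stats_back = picked[:, 5:]

same = (np.array_equal(series_back, x_series[rows])
        and np.array_equal(stats_back, x_stats[rows]))
print(same)
```

Under these assumptions the concatenation is only a round trip: it does not change which rows end up in a fold, so the indices can equally be applied to the two arrays without joining them first.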

Source: https://stackoverflow.com/questions/59277549/how-to-do-cross-validation-with-multiple-input-data-in-cnn-model-with-keras

Tags

python

machine-learning

keras

conv-neural-network

cross-validation