Neural Network in TensorFlow works worse than Random Forest and predicts the same label each time


Question


I am new to DNNs and TensorFlow. I have a problem using a NN for binary classification.

As input data I have a text dataset, which was transformed into numerical vectors with TF-IDF.

The training dataset has 43,000 rows and 4,235 features.

I tried the TFLearn library and then Keras. The result is the same: the NN predicts only one label (0 or 1) and gives worse accuracy than a Random Forest.

Below is the script I use to build the NN. Please tell me what is wrong with it.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()

# Hidden layers: 4235 TF-IDF features -> 100 units -> 4235 units
model.add(Dense(100, input_dim=4235, kernel_initializer='uniform', activation='relu'))
model.add(Dense(4235, kernel_initializer='uniform', activation='relu'))
# Single sigmoid unit for binary classification
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model (epochs replaces the deprecated nb_epoch argument)
model.fit(X_train, y_train, epochs=100, batch_size=10, verbose=2)

Answer 1:


There are many possible causes given just the information you provided, and also many things you can try to improve, but at a high level here are the most important items in my experience. I apologize if you have already checked most of these:

Amount of data

Deep learning might actually perform worse than "classical" ML (e.g. trees, SVMs) when there is not enough data. How much is enough is task dependent, but as a loose rule of thumb you may want the number of model parameters to be around the same order of magnitude as the amount of data you have. The model you posted has 100 x 4235 + 100 x 4235 + 4235 x 1 = 851,235 weight parameters (excluding biases), against only 43,000 training rows.
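As a quick check, Keras itself can report the exact parameter count (biases included) for the model defined above:

# Per-layer output shapes and parameter counts
model.summary()
# Total parameter count, biases included
print(model.count_params())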

Regularization

From the code you posted, it seems you're not using any regularization (e.g. dropout or L2), nor a validation set to measure the quality of the model outside the training set. Your model could be overfitting the training set.
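As a minimal sketch (not the asker's exact setup), dropout and a held-out validation split could be added in Keras like this:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(100, input_dim=4235, activation='relu'))
model.add(Dropout(0.5))  # randomly zero 50% of activations during training
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Hold out 10% of the training data so val_loss/val_acc reveal overfitting
model.fit(X_train, y_train, epochs=100, batch_size=10, validation_split=0.1, verbose=2)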

Architecture

For modeling text it is typical to use RNNs (e.g. LSTM or GRU) or CNNs instead of Dense/fully-connected layers. RNNs and CNNs contain architectural constraints for modeling sequences that are absent from Dense layers. In other words, Dense layers lack prior knowledge about the type of data, so they will potentially need much more data and training time to reach similar performance. There are plenty of examples of this in the Keras repo: https://github.com/fchollet/keras/tree/master/examples

One such example is this IMDB text (binary) classification with LSTM: https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
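In the same spirit as that example, a minimal LSTM classifier might look like the following sketch (max_features and maxlen are illustrative values, assuming the text has already been converted to padded integer sequences):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

max_features = 20000  # vocabulary size (illustrative)
maxlen = 100          # padded sequence length (illustrative)

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))  # word indices -> 128-dim vectors
model.add(LSTM(64))                                           # sequence -> fixed-size summary
model.add(Dense(1, activation='sigmoid'))                     # binary output
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])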

Text featurization

Another very common tool in deep learning is to encode text as a sequence of word vectors (and sometimes one-hot characters). These can be initialized either as random vectors or with pre-trained vectors (e.g. GloVe or word2vec). The example above uses the former approach.
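As a sketch of that preprocessing step, Keras ships a Tokenizer that maps raw text to padded integer sequences suitable for an Embedding layer (texts here is a hypothetical list of raw strings):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ["first training document ...", "second training document ..."]  # hypothetical raw text

tokenizer = Tokenizer(num_words=20000)            # keep the 20,000 most frequent words
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)   # each document -> list of word indices
X = pad_sequences(sequences, maxlen=100)          # pad/truncate to a fixed length of 100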



Source: https://stackoverflow.com/questions/40459022/neural-network-in-tensorflow-works-worse-than-random-forest-and-predict-the-same
