Question
I am experimenting with TensorFlow. One of my first attempts consists of learning one of the features from the data. Let's say my data is composed of rows like the following:
35, 2, 3, 4, 19, 31, 7, 9, 34, 10, 33, 12, 59, 6, 14, 31, 13
...
35, 4, 7, 14, 9, 3, 17, 19, 42, 11, 3, 1, 53, 12, 17, 30, 15
I would like to predict the value of the last feature; in the example that would be 13 for the first row and 15 for the last row.
I have around 10000 rows of data. I've written the following model using TensorFlow (I'm following this tutorial):
import tensorflow as tf

nb_attributes = 16  # number of input columns (the 17th value in each row is the target)

# weights and biases for the hidden layers and the output layer
W0 = tf.Variable(tf.zeros([nb_attributes, 25]))
B0 = tf.Variable(tf.zeros([25]))
W1 = tf.Variable(tf.truncated_normal([25, 30], stddev=0.1))
B1 = tf.Variable(tf.zeros([30]))
W2 = tf.Variable(tf.truncated_normal([30, 70], stddev=0.1))
B2 = tf.Variable(tf.zeros([70]))
W3 = tf.Variable(tf.truncated_normal([70, 150], stddev=0.1))
B3 = tf.Variable(tf.zeros([150]))
W4 = tf.Variable(tf.truncated_normal([150, 75], stddev=0.1))
B4 = tf.Variable(tf.zeros([75]))
W5 = tf.Variable(tf.truncated_normal([75, 54], stddev=0.1))
B5 = tf.Variable(tf.zeros([54]))
# placeholder for input and output
x = tf.placeholder("float", [None, nb_attributes])
Y_ = tf.placeholder("float", [None,54])
XX = tf.reshape(x, [-1, nb_attributes])
Y1 = tf.nn.sigmoid(tf.matmul(XX, W0) + B0)
Y2 = tf.nn.sigmoid(tf.matmul(Y1, W1) + B1)
Y3 = tf.nn.sigmoid(tf.matmul(Y2, W2) + B2)
Y4 = tf.nn.sigmoid(tf.matmul(Y3, W3) + B3)
Y5 = tf.nn.sigmoid(tf.matmul(Y4, W4) + B4)
# learned output
Ylogits = tf.matmul(Y5, W5) + B5
Y = tf.nn.softmax(Ylogits)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy)*100
train_step = tf.train.ProximalGradientDescentOptimizer(0.01).minimize(cross_entropy)
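For this loss, the labels fed into Y_ have to be one-hot vectors of length 54, one slot per possible value of the target feature. A minimal sketch of that encoding (the helper to_one_hot is just an illustrative name, and mapping the raw value directly to a class index is an assumption on my side):

import numpy as np

NB_CLASSES = 54  # one class per possible value of the last feature
def to_one_hot(labels, nb_classes=NB_CLASSES):
    # turn integer class indices into one-hot rows
    one_hot = np.zeros((len(labels), nb_classes), dtype=np.float32)
    one_hot[np.arange(len(labels)), labels] = 1.0
    return one_hot
# e.g. the two example rows above, whose targets are 13 and 15
batch_ys = to_one_hot(np.array([13, 15]))  # shape (2, 54)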
The train step is as follows:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# training loop: feed one batch per step
for i in range(100):
    batch_xs, batch_ys = get_train_events()
    sess.run(train_step, feed_dict={x: batch_xs, Y_: batch_ys})

# accuracy of the predicted class against the one-hot labels
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

test_data_evs, test_data_out = batch_xs, batch_ys
current_accuracy = sess.run(accuracy, feed_dict={x: test_data_evs, Y_: test_data_out})
print('Current Accuracy {}'.format(current_accuracy))
Please note that I am using the same data for training and for testing. I am aware that this is not the approach to follow, but I am doing it this way because the accuracy on the test data was so bad that I decided to check the accuracy on the training data instead. As far as I understand, the accuracy on the training data after training should be close to 100%, shouldn't it?
However, I cannot improve the accuracy beyond 60%. I tried the following:
- Feeding the data using different strategies
- Using different training optimizers from here
- Changing the net architecture
- Using a dropout approach (roughly as sketched after this list)
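Regarding the dropout attempt, this is roughly how I wired it in between the hidden layers (a sketch; the keep probability of 0.75 is just an example value):

keep_prob = tf.placeholder(tf.float32)
Y1 = tf.nn.sigmoid(tf.matmul(XX, W0) + B0)
Y1d = tf.nn.dropout(Y1, keep_prob)           # drop activations during training
Y2 = tf.nn.sigmoid(tf.matmul(Y1d, W1) + B1)
# ... same pattern for the remaining hidden layers ...
# training step:  feed_dict={x: batch_xs, Y_: batch_ys, keep_prob: 0.75}
# accuracy check: feed_dict={x: test_data_evs, Y_: test_data_out, keep_prob: 1.0}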
The only step that showed some progress was providing the data randomly in batches of size N (see the sketch below). In that case, I managed to move the accuracy from 60% to 64%. I was wondering whether I am applying a wrong approach or committing some stupid or naive error. Any thoughts on the issue would be very much appreciated.
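A minimal sketch of what I mean by drawing random batches of size N (all_x/all_y and the batch size of 100 are placeholders for my actual arrays and value of N):

import numpy as np

def random_batch(all_x, all_y, n):
    # draw n distinct rows uniformly at random
    idx = np.random.choice(len(all_x), size=n, replace=False)
    return all_x[idx], all_y[idx]

# fed to sess.run(...) in place of the full data set
batch_xs, batch_ys = random_batch(all_x, all_y, 100)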
Thanks a lot in advance!
EDIT 1: For the sake of completeness: I managed to solve the problem quite well by using the k-nearest-neighbours algorithm. This code has helped in my case.
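For reference, a minimal sketch of what that k-nearest-neighbours approach looks like with scikit-learn (the names features/targets and n_neighbors=5 are illustrative, not the exact code from the link):

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# features: the 16 input columns, targets: the last column as integer labels
X_train, X_test, y_train, y_test = train_test_split(features, targets, test_size=0.2)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print('Test accuracy: {}'.format(knn.score(X_test, y_test)))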
Source: https://stackoverflow.com/questions/43922819/tensorflow-improve-accuracy-on-training-data