I'm using TensorFlow to process color images with a convolutional neural network. A code snippet is below.
My code runs, so I think I got the number of channels right.
TL;DR: With your current program, the in-memory layout of the data should be R-G-B-R-G-B-R-G-B-R-G-B...
I assume from this line that you are passing in RGB images with 28x28 pixels:
self.x_image = tf.reshape(self.c_x, [-1, 28, 28, 3])
We can call the dimensions of self.x_image "batch", "height", "width", and "channel". This matches the default data format for tf.nn.conv2d() and tf.nn.max_pool().
In TensorFlow, the in-memory representation of a tensor is row-major order (or "C" ordering, because that is the representation of arrays in the C programming language). Essentially this means that the rightmost dimension is the fastest changing, and the elements of the tensor are packed together in memory in the following order (where ? stands for the unknown batch size, minus 1):
[0, 0, 0, 0]
[0, 0, 0, 1]
[0, 0, 0, 2]
[0, 0, 1, 0]
...
[?, 27, 27, 1]
[?, 27, 27, 2]
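As a quick sanity check (a sketch using NumPy, which packs arrays in the same C order; the 0, 1, 2, ... values are just made-up stand-ins for your pixel data), you can see that consecutive buffer entries fill the R, G, B slots of one pixel before moving on to the next:

import numpy as np

flat = np.arange(28 * 28 * 3, dtype=np.float32)   # stand-in for one flattened image
x_image = flat.reshape([-1, 28, 28, 3])           # same shape as the tf.reshape() above

print(x_image[0, 0, 0, :])   # [0. 1. 2.]  -> R, G, B of the first pixel
print(x_image[0, 0, 1, :])   # [3. 4. 5.]  -> R, G, B of the next pixel in the row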
Therefore, if your image data is actually stored channel-major (all of the red values, then all of the green values, then all of the blue values), your program probably isn't interpreting it correctly. There are at least two options:
Reshape your data to match its true order ("batch", "channels", "height", "width"):
self.x_image = tf.reshape(self.c_x, [-1, 3, 28, 28])
In fact, this format is sometimes more efficient for convolutions. You can instruct tf.nn.conv2d() and tf.nn.max_pool() to use it without transposing by passing the optional argument data_format="NCHW", but you will also need to change the shape of your bias variables to match.
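For example (a sketch only, assuming the TensorFlow 1.x graph API and a made-up 5x5 convolution with 32 filters; plain variable names stand in for your self. attributes), the NCHW variant would look roughly like this. Note that ksize and strides are also given in batch-channels-height-width order, and tf.nn.bias_add(..., data_format='NCHW') broadcasts the bias over the channel dimension (equivalently, you could reshape the bias itself to [1, 32, 1, 1]):

import tensorflow as tf

c_x = tf.placeholder(tf.float32, [None, 28 * 28 * 3])   # stand-in for your input placeholder
x_image = tf.reshape(c_x, [-1, 3, 28, 28])              # "batch", "channels", "height", "width"

W_conv = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
b_conv = tf.Variable(tf.constant(0.1, shape=[32]))

conv = tf.nn.conv2d(x_image, W_conv, strides=[1, 1, 1, 1],
                    padding='SAME', data_format='NCHW')
conv = tf.nn.bias_add(conv, b_conv, data_format='NCHW')   # bias added along the channel dim
pool = tf.nn.max_pool(conv, ksize=[1, 1, 2, 2],           # pool over height and width only
                      strides=[1, 1, 2, 2],
                      padding='SAME', data_format='NCHW')

In practice the NCHW path is mainly a win on GPUs, where it matches cuDNN's preferred layout.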
Alternatively, transpose your image data to match the format that your program expects, using tf.transpose():
self.x_image = tf.transpose(tf.reshape(self.c_x, [-1, 3, 28, 28]), [0, 2, 3, 1])
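To convince yourself that this is the right permutation, here is a small NumPy sketch (constant made-up colour planes rather than your real data) showing that, starting from a channel-major buffer, the transposed result has each pixel's R, G, B values in the last dimension:

import numpy as np

r = np.full((28, 28), 0.1, dtype=np.float32)   # red plane
g = np.full((28, 28), 0.5, dtype=np.float32)   # green plane
b = np.full((28, 28), 0.9, dtype=np.float32)   # blue plane
flat = np.concatenate([r.ravel(), g.ravel(), b.ravel()])       # channel-major buffer

nhwc = flat.reshape([-1, 3, 28, 28]).transpose([0, 2, 3, 1])   # same permutation as above
print(nhwc[0, 5, 7, :])   # [0.1 0.5 0.9] -> (R, G, B) for pixel (5, 7)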