TensorFlow: Reading images in queue without shuffling

Question

I have a training set of 614 images that have already been shuffled, and I want to read them in order, in batches of 5. Because my labels are stored in the same order, any shuffling while the images are read into a batch would result in incorrect labels.

These are my functions to read and add the images to the batch:

import tensorflow as tf

# To add images from the queue to a batch:
def add_to_batch(image):

    print('Adding to batch')
    # num_threads=1 keeps the batch in the same order as the filename queue
    image_batch = tf.train.batch([image], batch_size=5, num_threads=1, capacity=614)

    # Add to summary
    tf.image_summary('images', image_batch, max_images=30)

    return image_batch

# To read files from the queue and process them:
def get_batch():

    # Create a filename queue of images to read; shuffle=False keeps the file order
    # Note: range(1,614) yields 613 names (train_1.png ... train_613.png)
    filenames = [('/media/jessica/Jessica/TensorFlow/StreetView/training/original/train_%d.png' % i) for i in range(1,614)]
    filename_queue = tf.train.string_input_producer(filenames, shuffle=False, capacity=614)
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)

    # Read and process image
    # Image is 500 x 275 with 4 channels (RGBA):
    my_image = tf.image.decode_png(value)
    my_image_float = tf.cast(my_image, tf.float32)
    my_image_float = tf.reshape(my_image_float, [275, 500, 4])

    return add_to_batch(my_image_float)
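Assuming a single reader and num_threads=1, the pipeline above is meant to emit images in exactly the order of the filename list. That intended behaviour can be modelled in plain Python (a toy model, not TensorFlow code; ordered_batches and the 1 to 614 numbering are hypothetical). One detail worth noting: 614 is not a multiple of 5 (614 = 122 * 5 + 4), and string_input_producer repeats the file list indefinitely when num_epochs is not set, so the 123rd batch straddles the epoch boundary. The label list has to be cycled the same way to stay aligned.

```python
from itertools import cycle, islice

def ordered_batches(items, batch_size, num_batches):
    """Yield `num_batches` consecutive batches, wrapping to the start after each epoch."""
    # cycle() mimics string_input_producer(shuffle=False) with no epoch limit
    stream = cycle(items)
    for _ in range(num_batches):
        # islice pulls the next batch_size items from the shared iterator
        yield list(islice(stream, batch_size))

# Hypothetical numbering: train_1.png ... train_614.png
image_ids = list(range(1, 615))
batches = list(ordered_batches(image_ids, batch_size=5, num_batches=124))

print(batches[0])    # [1, 2, 3, 4, 5]
print(batches[122])  # [611, 612, 613, 614, 1]
```

Running the same function over the label list, with the same batch size and step count, gives label batches that line up with the image batches.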

This is my function to perform the prediction:

def inference(x):

    # <Perform convolution, pooling, etc.>

    return y_conv

This is my function to calculate loss and perform optimisation:

def train_step(y_label, y_conv):

    """ Calculate loss """
    # Cross-entropy (the 1e-9 avoids log(0))
    loss = -tf.reduce_sum(y_label * tf.log(y_conv + 1e-9))

    # Add to summary
    tf.scalar_summary('loss', loss)

    """ Optimisation """
    opt = tf.train.AdamOptimizer().minimize(loss)

    # Return the training op as well, otherwise it can never be run
    return loss, opt
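Written out, the loss in train_step sums the cross-entropy over every example $n$ in the batch of 5 and every class $k$, where $y_{nk}$ is the one-hot label (y_label) and $\hat{y}_{nk}$ the predicted probability (y_conv). The small constant keeps the logarithm finite when a predicted probability is exactly zero.

```latex
L = -\sum_{n=1}^{5} \sum_{k} y_{nk} \, \log\left(\hat{y}_{nk} + 10^{-9}\right)
```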

This is my main function:

def main():

    # Placeholder for the one-hot labels of the current batch
    y_label = tf.placeholder(tf.float32, name='y_label')

    # Training
    images = get_batch()
    y_conv = inference(images)
    loss, opt = train_step(y_label, y_conv)

    sess = tf.Session()

    # To write and merge summaries
    writer = tf.train.SummaryWriter('/media/jessica/Jessica/TensorFlow/StreetView/SummaryLogs/log_5', graph_def=sess.graph_def)
    merged = tf.merge_all_summaries()

    """ Run session """
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)

    print("Running...")
    for step in range(5):

        # y_1 = <get the correct labels here>

        # Train: run the optimiser op and fetch the loss
        _, loss_value = sess.run([opt, loss], feed_dict={y_label: y_1})
        print("Step %d, Loss %g" % (step, loss_value))

        # Save summary
        summary_str = sess.run(merged, feed_dict={y_label: y_1})
        writer.add_summary(summary_str, step)

    print('Finished')

if __name__ == '__main__':
    main()

When I check my image summary, the images are not in sequence. More precisely, this is what happens:

Images 1-5: discarded, Images 6-10: read, Images 11-15: discarded, Images 16-20: read etc.

So it looks as if I am getting two batches each time, throwing away the first one and using the second one. I have tried a few remedies, but nothing seems to work. I suspect I am misunderstanding something fundamental about calling images = get_batch() and sess.run().

Answer 1:

Your batch operation is backed by a FIFOQueue, so every time you evaluate its output, it dequeues a new batch and advances the queue state.

Your first session.run call uses images 1-5 to compute train_step. Your second session.run call evaluates the image summary, which dequeues the next batch (images 6-10) and uses it for the visualization.
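The effect can be reproduced with a toy model of the queue in plain Python. The names make_batch_queue and run_two_calls_per_step are hypothetical and only mimic the "every evaluation dequeues" behaviour; they are not TensorFlow APIs. With two evaluations per step, training sees images 1-5, 11-15, ... while the summary shows 6-10, 16-20, ..., which matches the pattern described in the question.

```python
def make_batch_queue(items, batch_size):
    """Toy stand-in for the FIFO queue behind tf.train.batch (not TensorFlow code):
    every call dequeues and returns the next `batch_size` items."""
    it = iter(items)
    return lambda: [next(it) for _ in range(batch_size)]

def run_two_calls_per_step(num_steps, dequeue_batch):
    trained_on, summarized = [], []
    for _ in range(num_steps):
        trained_on.append(dequeue_batch())   # first sess.run: the train op pulls a batch
        summarized.append(dequeue_batch())   # second sess.run: the summary pulls the next batch
    return trained_on, summarized

trained, shown = run_two_calls_per_step(2, make_batch_queue(range(1, 615), 5))
print(trained)  # [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]]
print(shown)    # [[6, 7, 8, 9, 10], [16, 17, 18, 19, 20]]
```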

If you want to visualize things without advancing the input queue, it helps to cache the dequeued values in a variable and to define your summaries with that variable as input, rather than depending on the live queue output.

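import tensorflow as tf

# Live queue output: every evaluation dequeues a new batch of 5
image_batch_live = tf.train.batch([image], batch_size=5, num_threads=1, capacity=614)

# Non-trainable variable holding the current batch
# (shape matches 5 images of 275 x 500 x 4 from the question)
image_batch = tf.Variable(
  tf.zeros((5, 275, 500, 4)),
  trainable=False,
  name="input_values_cached")

# Running this op dequeues one batch and stores it in image_batch
advance_batch = tf.assign(image_batch, image_batch_live)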

Now image_batch holds a fixed value that you can use both for computing the loss and for visualization. Between steps, call sess.run(advance_batch) to pull the next batch from the queue.
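Extending the same kind of toy model (plain Python with hypothetical names, not TensorFlow APIs), the cached-variable pattern performs exactly one dequeue per step. Training and the summary then read the same cached batch, so no images are skipped.

```python
def make_batch_queue(items, batch_size):
    """Toy stand-in for tf.train.batch (not TensorFlow code): each call dequeues the next batch."""
    it = iter(items)
    return lambda: [next(it) for _ in range(batch_size)]

def run_with_cached_batch(num_steps, dequeue_batch):
    trained_on, summarized = [], []
    for _ in range(num_steps):
        cached = dequeue_batch()     # sess.run(advance_batch): the only dequeue in this step
        trained_on.append(cached)    # the training step reads the cached variable
        summarized.append(cached)    # the summary reads the same cached variable
    return trained_on, summarized

trained, shown = run_with_cached_batch(2, make_batch_queue(range(1, 615), 5))
print(trained)  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(shown)    # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
```

In the real graph this means inference and the image summary take image_batch (the variable) as input instead of image_batch_live.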

One minor wrinkle with this approach: the default saver will save your image_batch variable to the checkpoint. If you ever change your batch size, restoring the checkpoint will fail with a dimension mismatch. To work around this, you would need to specify the list of variables to restore manually, and run the initializers for the rest.

Source: https://stackoverflow.com/questions/36783560/tensorflow-reading-images-in-queue-without-shuffling

Tags

image

queue

tensorflow