tensorflow: efficient feeding of eval/train data using queue runners

一个人的身影  2020-12-31 08:03

I'm trying to run a tensorflow graph to train a model and periodically evaluate it using a separate evaluation dataset. Both the training and evaluation data are implemented using queue runners.

2 Answers
  没有蜡笔的小新  2020-12-31 08:47

    After some experimentation, my current best solution is to have a main graph featuring the training inputs and a separate graph containing only the evaluation data operations. I open a separate session to get evaluation data and feed it to the training graph when I want to evaluate. Highly inelegant (evaluation runs take longer than they ideally would, since each batch has to come out of one session only to be fed into another), but assuming evaluation runs are rare compared to training runs, this seems preferable to the original version...

    import tensorflow as tf
    from tensorflow.models.image.cifar10 import cifar10
    from time import time
    
    
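    # Builds an input pipeline in its own graph and session; the queue runners
    # for that pipeline are started in this separate session.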
    class DataSupplier:
        def __init__(self, tensor_fn):
            graph = tf.Graph()
            with graph.as_default():
                with graph.device('/cpu:0'):
                    self.tensor = tensor_fn()
            self.sess = tf.Session(graph=graph)
            self.coord = tf.train.Coordinator()
            self.threads = tf.train.start_queue_runners(sess=self.sess,
                                                        coord=self.coord)
    
        def get_tensor_val(self):
            return self.sess.run(self.tensor)
    
        def clean_up(self):
            self.coord.request_stop()
            self.coord.join(self.threads)
    
    
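    # Evaluation inputs get their own graph/session via DataSupplier.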
    eval_batcher = DataSupplier(lambda: cifar10.inputs(True))
    
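    # Main training graph: training inputs plus identity ops that stand in for
    # a model and can be overridden via feed_dict at evaluation time.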
    graph = tf.Graph()
    with graph.as_default():
        images, labels = cifar10.inputs(False)
    
        out_images = tf.identity(images)
        out_labels = tf.identity(labels)
    
    n_runs = 100
    
    with tf.Session(graph=graph) as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess, coord)
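        # Warm-up passes so the input queues fill up before timing.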
        for i in range(n_runs):
            sess.run([out_images, out_labels])
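        # Time training batches coming straight from this graph's queues.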
        t = time()
        for i in range(n_runs):
            sess.run([out_images, out_labels])
        dt = (time() - t)/n_runs
        print('Train time: %.3f' % dt)
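        # Time evaluation batches: pulled from the separate session, then fed
        # into this graph via feed_dict.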
        t = time()
        for i in range(n_runs):
            eval_images, eval_labels = eval_batcher.get_tensor_val()
            sess.run([out_images, out_labels],
                     feed_dict={images: eval_images, labels: eval_labels})
        dt = (time() - t)/n_runs
        print('Eval time: %.3f' % dt)
        coord.request_stop()
        coord.join(threads)
    
    eval_batcher.clean_up()
    

    Results:

    Train time: 0.050
    Eval time: 0.064
    

    Update: when using this approach in training problems with tf.contrib.layers and regularization, I find the regularization losses go to infinity if the DataSupplier graph is on the same device as the training graph. I cannot for the life of me explain why this is the case, but explicitly setting the device of the DataSupplier to the CPU (given the training graph is on my GPU) seems to work...
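
    A minimal sketch of that workaround (my own variation, not part of the code above; the class name and the device argument are just for illustration) — the same DataSupplier, but with the device exposed as a constructor argument so the eval pipeline can be pinned to the CPU while the training graph lives on the GPU:

    import tensorflow as tf


    class ConfigurableDataSupplier:
        """Like DataSupplier above, but the device is a parameter.

        get_tensor_val() and clean_up() are unchanged from DataSupplier.
        """
        def __init__(self, tensor_fn, device='/cpu:0'):
            graph = tf.Graph()
            # Build the input pipeline in its own graph, pinned to `device`.
            with graph.as_default(), graph.device(device):
                self.tensor = tensor_fn()
            self.sess = tf.Session(graph=graph)
            self.coord = tf.train.Coordinator()
            self.threads = tf.train.start_queue_runners(sess=self.sess,
                                                        coord=self.coord)


    # e.g. eval_batcher = ConfigurableDataSupplier(lambda: cifar10.inputs(True),
    #                                              device='/cpu:0')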
