Benchmark of HowTo: Reading Data

Backend · Unresolved · 3 answers · 1329 views
庸人自扰 · 2020-12-03 08:41

I'm using TensorFlow 0.10 and I was benchmarking the examples found in the official HowTo on reading data. This HowTo illustrates different methods to move data into TensorFlow.

3 Answers
  •  独厮守ぢ
    2020-12-03 09:39

    The main question is why the preloaded-data (constant) example, examples/how_tos/reading_data/fully_connected_preloaded.py, is significantly slower than the other data loading examples when using a GPU.

    I had the same problem: fully_connected_preloaded.py was unexpectedly slow on my Titan X. The cause was that the whole dataset was preloaded into CPU memory, not GPU memory.

    First, let me share my initial attempts. I applied the following performance tips from Yaroslav:

    • set capacity=55000 for tf.train.slice_input_producer (55000 is the size of the MNIST training set in my case).
    • set num_threads=5 for tf.train.batch.
    • set capacity=500 for tf.train.batch.
    • put time.sleep(10) after tf.train.start_queue_runners.
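
    These knobs describe a generic bounded producer/consumer prefetch buffer. As a language-neutral illustration (plain Python threads, not the TensorFlow API), capacity bounds the buffer size and num_threads sets how many fillers run concurrently; the function name prefetch is my own, not from the example code:

```python
import queue
import threading

def prefetch(batches, capacity=500, num_threads=5):
    """Fill a bounded buffer from several threads, mimicking the roles of
    tf.train.batch's capacity and num_threads parameters."""
    buf = queue.Queue(maxsize=capacity)        # bounded, like capacity=500

    def worker(chunk):
        for b in chunk:
            buf.put(b)                          # blocks when the buffer is full

    # Split the work across num_threads filler threads.
    chunks = [batches[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()

    # The consumer (the training loop, in the TF analogy) drains the buffer.
    out = [buf.get() for _ in range(len(batches))]
    for t in threads:
        t.join()
    return out

consumed = prefetch(list(range(100)), capacity=10, num_threads=5)
```

    Raising capacity and num_threads only helps when the producers are the bottleneck; as shown below, that was not the case here.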

    However, the average time per batch stayed the same. I profiled with timeline visualization and still found QueueDequeueManyV2 dominating.
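
    The timeline profiler writes a Chrome-trace JSON file. A small sketch of how one can rank ops by total wall time in such a trace (assuming the standard Chrome trace format with "X" complete events carrying a dur field in microseconds; the synthetic trace below is illustrative, not real profile data):

```python
import json
from collections import Counter

def op_time_by_name(trace_json):
    """Sum wall time (microseconds) per op name from Chrome-trace JSON,
    the format produced by TensorFlow's timeline module."""
    totals = Counter()
    for ev in json.loads(trace_json).get("traceEvents", []):
        if ev.get("ph") == "X":                 # "X" = complete event with a duration
            totals[ev.get("name", "?")] += ev.get("dur", 0)
    return totals.most_common()                 # ops sorted by total time, descending

# Synthetic trace standing in for a real profile of the slow run:
trace = json.dumps({"traceEvents": [
    {"ph": "X", "name": "QueueDequeueManyV2", "dur": 9500},
    {"ph": "X", "name": "MatMul", "dur": 250},
    {"ph": "X", "name": "MatMul", "dur": 250},
]})
print(op_time_by_name(trace)[0])  # → ('QueueDequeueManyV2', 9500)
```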

    The problem was line 65 of fully_connected_preloaded.py. The following code loads the entire dataset into CPU memory, creating a bottleneck for CPU-to-GPU data transfer.

    with tf.device('/cpu:0'):
        input_images = tf.constant(data_sets.train.images)
        input_labels = tf.constant(data_sets.train.labels)
    

    Hence, I switched the device allocation.

    with tf.device('/gpu:0'):
    

    Then I got a roughly 100x speed-up per batch.

    Note:

    1. This was possible because the Titan X has enough memory to preload the entire dataset.
    2. In the original code (fully_connected_preloaded.py), the comment on line 64 says "rest of pipeline is CPU-only". I am not sure what this comment intended.
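
    Note 1 checks out with quick arithmetic (a back-of-the-envelope sketch; exact sizes depend on how the dataset is encoded): 55,000 MNIST images of 28x28 float32 pixels come to well under a gigabyte, a small fraction of the Titan X's 12 GB of memory.

```python
# Rough memory footprint of the preloaded MNIST training images.
num_images = 55000
floats_per_image = 28 * 28             # 784 pixels, stored as float32
bytes_per_float32 = 4

image_mib = num_images * floats_per_image * bytes_per_float32 / 2**20
print(f"{image_mib:.0f} MiB")          # ~164 MiB; labels add only a few MiB more
```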
