I'm using TensorFlow 0.10 and I was benchmarking the examples found in the official HowTo on reading data. This HowTo illustrates different methods to move data into TensorFlow.
The main question is: why is the example with preloaded (constant) data, examples/how_tos/reading_data/fully_connected_preloaded.py, significantly slower than the other data loading examples when using a GPU?
I had the same problem: fully_connected_preloaded.py is unexpectedly slow on my Titan X. The problem was that the whole dataset was pre-loaded on the CPU, not the GPU.
First, let me share my initial attempts. I applied the following performance tips by Yaroslav (a sketch putting them together follows the list):

- capacity=55000 for tf.train.slice_input_producer (55000 is the size of the MNIST training set in my case)
- num_threads=5 for tf.train.batch
- capacity=500 for tf.train.batch
- time.sleep(10) after tf.train.start_queue_runners
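For concreteness, here is a minimal sketch of the input pipeline with these tips applied. The variable names data_sets and batch_size follow fully_connected_preloaded.py; the queue calls are the standard TF 0.10-era API.

import time
import tensorflow as tf

# preloaded dataset, as in the original example
input_images = tf.constant(data_sets.train.images)
input_labels = tf.constant(data_sets.train.labels)

# capacity=55000 lets the producer queue hold the whole MNIST training set
image, label = tf.train.slice_input_producer(
    [input_images, input_labels], capacity=55000)

# more threads and a deeper queue for the batching stage
images, labels = tf.train.batch(
    [image, label], batch_size=batch_size, num_threads=5, capacity=500)

sess = tf.Session()
sess.run(tf.initialize_all_variables())  # pre-1.0 initializer
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
time.sleep(10)  # let the queue runners fill the queues before timing steps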
However, the average speed per batch stayed the same. I tried timeline visualization for profiling, and still got QueueDequeueManyV2 dominating.
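For reference, this is how such a timeline trace can be captured in this TF version (assuming sess and train_op from the training loop); the resulting JSON opens in chrome://tracing, where QueueDequeueManyV2 showed up as the dominant blocks.

from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_op, options=run_options, run_metadata=run_metadata)

# dump one step's stats as a Chrome trace
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())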
The problem was line 65 of fully_connected_preloaded.py. The following code loads the entire dataset onto the CPU, leaving CPU-to-GPU data transfer as the bottleneck:
with tf.device('/cpu:0'):
input_images = tf.constant(data_sets.train.images)
input_labels = tf.constant(data_sets.train.labels)
Hence, I switched the device allocation:

with tf.device('/gpu:0'):
Then I got a ~100x speed-up per batch.
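For clarity, the corrected block is identical to the original except for the device string:

with tf.device('/gpu:0'):
    input_images = tf.constant(data_sets.train.images)
    input_labels = tf.constant(data_sets.train.labels)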
Note: in fully_connected_preloaded.py, the comment on line 64 says "rest of pipeline is CPU-only". I am not sure what this comment intended.
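One way to check which ops actually end up on the CPU versus the GPU (and hence what that comment covers in practice) is to enable device placement logging, a standard tf.ConfigProto option:

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# the assigned device of every op is printed when the graph is set up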