Accessing epoch value across multiple threads using input_producer/limit_epochs/epochs:0 local variable

Submitted anonymously (unverified) on 2019-12-03 02:38:01

Question:

I tried to extract the current epoch number while reading data with multiple CPU threads. However, in a trial run I observed output that did not make any sense. Consider the code below:

import threading
import tensorflow as tf

with tf.Session() as sess:
    train_filename_queue = tf.train.string_input_producer(trainimgs, num_epochs=4, shuffle=True)
    value = train_filename_queue.dequeue()
    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(init_op)
    coord = tf.train.Coordinator()
    tf.train.start_queue_runners(coord=coord)
    collections = [v.name for v in tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES,
                                                     scope='input_producer/limit_epochs/epochs:0')]
    print(collections)

    threads = [threading.Thread(target=work, args=(coord, value, sess, collections))
               for i in range(20)]
    for t in threads:
        t.start()
    coord.join(threads)
    coord.request_stop()

The work function is defined as below:

def work(coord, val, sess, collections):
    counter = 0
    while not coord.should_stop():
        try:
            epoch = sess.run(collections[0])
            filename = sess.run(val).decode(encoding='UTF-8')
            print(filename + ' ' + str(epoch))
        except tf.errors.OutOfRangeError:
            coord.request_stop()
    return None

The output I obtain is the following:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:84:00.0
Total memory: 11.92GiB
Free memory: 11.80GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): GeForce GTX TITAN X, Compute Capability 5.2
['input_producer/limit_epochs/epochs:0']
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 2
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4

The last number in each line is the value of the 'input_producer/limit_epochs/epochs:0' local variable.

For a first trial, I kept only 10 images in the queue, which means I should get a total of 40 lines of output, and I do.

  • However, I should also get equal numbers of 1, 2, 3 and 4 as the last number on each line, since each filename should be dequeued once in each of the 4 epochs.

Why am I getting the same number, 4, on every line?

Further Information

  • I tried using range(1) (i.e. a single thread), and the observation is still the same.
  • Don't worry about the digit '0'. It is simply the label of the corresponding file; I saved the image file names that way.

Answer 1:

I did a lot of experiments and finally concluded the following.

I used to believe that:

    tf.train.string_input_producer() enqueues the filenames epoch by epoch.
    That is, first one complete epoch is enqueued (in multiple stages if the
    capacity is less than the number of filenames) and only then are further
    epochs enqueued.

That is not really the case.

When tf.train.start_queue_runners() is executed, all the epochs are enqueued together (in multiple stages if the capacity is less than the number of filenames). The local variable epochs:0 is used by tf.train.string_input_producer() to keep track of the epoch currently being enqueued. Once epochs:0 reaches num_epochs, it stays constant, and no matter how many threads are dequeuing from the queue, it does not change.

When you read the value of epochs:0, you get the instantaneous value of the epochs counter, which tells you which epoch of the dataset is being enqueued at that moment. It does not tell you which epoch of the dataset you are dequeuing.
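
To see this concretely, here is a minimal single-threaded sketch (my own illustration, not the code from the question; TF 1.x graph mode with a made-up three-file list). It prints the enqueue-side counter next to every dequeued filename, and the counter races up to num_epochs while the first pass is still being dequeued:

    import tensorflow as tf

    # Made-up filenames, purely for illustration; any small list works.
    filenames = ['a.jpg', 'b.jpg', 'c.jpg']

    queue = tf.train.string_input_producer(filenames, num_epochs=4, shuffle=False)
    value = queue.dequeue()

    # The enqueue-side epoch counter lives in this local variable.
    epochs_var = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES,
                                   scope='input_producer/limit_epochs/epochs')[0]

    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            while True:
                name, epoch = sess.run([value, epochs_var])
                # epoch reaches 4 (num_epochs) almost immediately, even while
                # we are still dequeuing filenames from the first pass.
                print(name.decode('UTF-8'), epoch)
        except tf.errors.OutOfRangeError:
            pass
        finally:
            coord.request_stop()
            coord.join(threads)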

So it is a bad idea to try to read the current epoch from the epochs:0 local variable.
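
If you really need the dequeue-side epoch, one possible workaround (my own suggestion, not something tf.train.string_input_producer() exposes) is to count the dequeues yourself: the filename queue is FIFO, so the n-th dequeued element belongs to epoch (n - 1) // len(trainimgs) + 1. Here is a sketch of the work function rewritten along those lines, reusing trainimgs, coord, val and sess from the question:

    import threading

    num_files = len(trainimgs)       # trainimgs is the filename list from the question
    dequeue_count = 0
    count_lock = threading.Lock()    # the counter is shared across worker threads

    def work(coord, val, sess):
        global dequeue_count
        while not coord.should_stop():
            try:
                filename = sess.run(val).decode(encoding='UTF-8')
                with count_lock:
                    dequeue_count += 1
                    # 1-based epoch derived from how many filenames have been
                    # dequeued so far, relying on the FIFO order of the queue.
                    epoch = (dequeue_count - 1) // num_files + 1
                print(filename + ' ' + str(epoch))
            except tf.errors.OutOfRangeError:
                coord.request_stop()
        return None

Note that with many worker threads two nearly simultaneous dequeues can swap their counter values, so the derived epoch may occasionally be off by one right at an epoch boundary; for logging purposes that is usually acceptable.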


