Number of examples in each tfrecord

Submitted by 不打扰是莪最后的温柔 on 2019-12-12 04:47:42

Question


I'm running the sample.sh script in Google Cloud Shell to call the preprocessing script below on a set of images, following the steps of the flowers example.

https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/flowers/trainer/preprocess.py

Preprocessing ran successfully on both the eval set and the train set, but the generated .tfrecord.gz files do not seem to match the number of images in eval_set.csv / train_set.csv.

For example, eval-00000-of-00157.tfrecord.gz suggests there are 158 TFRecord files, while there are 35227 rows in eval_set.csv. Each row includes a valid image_url (all images are uploaded to Cloud Storage) and a valid label.

I would like to know if there is a way to monitor and control the number of images per TFRecord file through the preprocess.py configuration.

Thanks

Update: I got this working:

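import os

import tensorflow as tf
from tensorflow.python.lib.io import file_io

# The preprocess output is GZIP-compressed, so the reader needs matching options.
options = tf.python_io.TFRecordOptions(
    compression_type=tf.python_io.TFRecordCompressionType.GZIP)

# 'url/path' stands for the preprocessing output directory (local or gs://).
print(sum(1 for f in file_io.get_matching_files(os.path.join('url/path', '*.tfrecord.gz'))
          for example in tf.python_io.tf_record_iterator(f, options=options)))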

Answer 1:


The filename eval-00000-of-00157.tfrecord.gz means that this is the first shard (index 00000) of 157 shards in total, numbered 00000 through 00156, so there should be 156 other similarly named files. Each file can contain any number of records.
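To make the naming concrete, here is a small sketch that decodes such a filename and lists the sibling shards you should expect to find. It assumes the output uses Beam's default shard name template `-SSSSS-of-NNNNN` (zero-based shard index, then total shard count, both padded to five digits); `parse_shard_name` and `expected_shard_names` are hypothetical helper names. The last line only computes an average: Dataflow does not promise an even split of records across shards.

```python
import re

# Assumes Beam's default "-SSSSS-of-NNNNN" template: zero-based shard index,
# then the total number of shards, both zero-padded to five digits.
SHARD_RE = re.compile(
    r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})(?P<suffix>.*)$")


def parse_shard_name(filename):
    """Split a sharded filename into (prefix, shard_index, num_shards)."""
    m = SHARD_RE.match(filename)
    if m is None:
        raise ValueError("not a sharded filename: %r" % filename)
    return m.group("prefix"), int(m.group("index")), int(m.group("total"))


def expected_shard_names(filename):
    """List every filename that should exist alongside this shard."""
    m = SHARD_RE.match(filename)
    if m is None:
        raise ValueError("not a sharded filename: %r" % filename)
    total = int(m.group("total"))
    return ["%s-%05d-of-%05d%s" % (m.group("prefix"), i, total, m.group("suffix"))
            for i in range(total)]


prefix, index, total = parse_shard_name("eval-00000-of-00157.tfrecord.gz")
print(prefix, index, total)      # eval 0 157
names = expected_shard_names("eval-00000-of-00157.tfrecord.gz")
print(names[-1])                 # eval-00156-of-00157.tfrecord.gz
# Average records per shard for 35227 input rows (not a per-file guarantee).
print(round(35227 / total, 1))   # 224.4
```

Comparing `expected_shard_names(...)` against the files actually present in the output directory is a quick way to spot a missing shard before counting records.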

If you want to count the records manually, try something like:

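import os

import tensorflow as tf
from tensorflow.python.lib.io import file_io

# The output files are GZIP-compressed, so the reader needs matching options.
options = tf.python_io.TFRecordOptions(
    compression_type=tf.python_io.TFRecordCompressionType.GZIP)

files = os.path.join('gs://my_bucket/my_dir', 'eval-*.tfrecord.gz')
print(sum(1 for f in file_io.get_matching_files(files)
          for _ in tf.python_io.tf_record_iterator(f, options=options)))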
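If TensorFlow isn't available where you want to check the counts, the record framing is simple enough to walk with the Python standard library. This is a sketch, not TensorFlow's reader: it assumes the TFRecord on-disk layout (an 8-byte little-endian length, a 4-byte masked CRC of that length, the payload, then a 4-byte masked CRC of the payload), it does NOT verify either CRC, and it only reads local files, so copy the shards down first (e.g. with `gsutil cp`). `count_records` and `count_all` are hypothetical helper names.

```python
import glob
import gzip
import struct


def count_records(path):
    """Count records in one GZIP-compressed TFRecord file.

    Only the framing is parsed; CRC fields are skipped, not verified.
    """
    count = 0
    with gzip.open(path, "rb") as f:
        while True:
            # uint64 payload length + uint32 masked CRC of the length.
            header = f.read(12)
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise IOError("truncated record header in %s" % path)
            (length,) = struct.unpack("<Q", header[:8])
            # Payload bytes + uint32 masked CRC of the payload.
            body = f.read(length + 4)
            if len(body) < length + 4:
                raise IOError("truncated record in %s" % path)
            count += 1
    return count


def count_all(pattern):
    """Sum record counts over every local file matching a glob pattern."""
    return sum(count_records(p) for p in sorted(glob.glob(pattern)))
```

For example, `count_all("eval-*.tfrecord.gz")` run in the directory holding the downloaded shards gives the total across all eval files. Because the CRCs are ignored, a corrupted file can still be counted; use the TensorFlow iterator when you need integrity checks as well.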

Note that Dataflow does not guarantee any particular relationship between input and output files, either in the number of output files or in the ordering of records (across files or within a file). The total number of records, however, should be the same.
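Since only the totals are expected to agree, a sanity check compares the number of input CSV rows with the summed record count. A minimal sketch follows; it assumes the CSV may or may not have a header row (the `has_header` flag is a hypothetical parameter, not something the flowers sample defines), skips blank lines, and takes the record total from whichever counting method you used.

```python
import csv


def count_csv_rows(path, has_header=False):
    """Count non-empty data rows in a CSV file."""
    with open(path, newline="") as f:
        # csv.reader yields [] for blank lines; skip them.
        rows = sum(1 for row in csv.reader(f) if row)
    return rows - 1 if has_header and rows else rows


def compare_counts(csv_rows, record_total):
    """Return a short report comparing input rows with output records."""
    if csv_rows == record_total:
        return "OK: %d rows -> %d records" % (csv_rows, record_total)
    return "MISMATCH: %d rows -> %d records (difference %d)" % (
        csv_rows, record_total, csv_rows - record_total)
```

A mismatch only tells you that some rows did not become records; it does not say which ones, since the ordering is not preserved.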



Source: https://stackoverflow.com/questions/42799007/number-of-examples-in-each-tfrecord
