tensorflow-datasets

Restoring a model trained with tf.estimator and feeding input through feed_dict

情到浓时终转凉″ submitted on 2019-12-01 23:14:05
Question: I trained a ResNet with tf.estimator, and the model was saved during the training process. The saved files consist of .data, .index and .meta. I'd like to load this model back and get predictions for new images. The data was fed to the model during training using tf.data.Dataset. I have closely followed the resnet implementation given here. I would like to restore the model and feed inputs to the nodes using a feed_dict. First attempt: #rebuild input pipeline images, labels = input_fn(data_dir,
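
Since the excerpt is cut off, here is a minimal TF 1.x sketch of the usual approach: rebuild the graph with a placeholder in place of the tf.data tensors and restore the weights from the estimator's checkpoint. network_fn, the input shape and the "model_dir" path are illustrative stand-ins, not taken from the original post; the real builder must recreate exactly the variables (same names and shapes) that model_fn created during training.

import numpy as np
import tensorflow as tf

def network_fn(images):
    # Stand-in for the real ResNet builder; replace with the same graph
    # construction code the estimator used, so variable names match.
    return tf.layers.dense(tf.layers.flatten(images), 10, name="logits")

# Placeholder replaces the dataset iterator output used during training.
images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3], name="images")
logits = network_fn(images)
predictions = tf.argmax(logits, axis=1)

saver = tf.train.Saver()
with tf.Session() as sess:
    # model_dir is the estimator's model_dir holding the .data/.index/.meta files.
    ckpt = tf.train.latest_checkpoint("model_dir")
    saver.restore(sess, ckpt)
    batch = np.zeros([1, 224, 224, 3], dtype=np.float32)  # stand-in image batch
    print(sess.run(predictions, feed_dict={images: batch}))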

tf.data.Dataset: how to get the dataset size (number of elements in an epoch)?

。_饼干妹妹 submitted on 2019-12-01 16:00:52
Let's say I have defined a dataset in this way: filename_dataset = tf.data.Dataset.list_files("{}/*.png".format(dataset)) How can I get the number of elements that are inside the dataset (hence, the number of single elements that compose an epoch)? I know that tf.data.Dataset already knows the dimension of the dataset, because the repeat() method allows repeating the input pipeline for a specified number of epochs. So there must be a way to get this information. tf.data.Dataset.list_files creates a tensor called MatchingFiles:0 (with the appropriate prefix if applicable). You could evaluate tf
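
Since the excerpt is truncated, here is a hedged sketch of two common workarounds in TF 1.x: count the matching files in Python before building the pipeline, or evaluate the matching-files tensor that backs list_files. The "images/*.png" pattern is illustrative.

import tensorflow as tf

pattern = "images/*.png"  # illustrative path pattern

# Option 1: count eagerly in Python, then build the dataset from the list.
filenames = tf.gfile.Glob(pattern)
num_elements = len(filenames)
filename_dataset = tf.data.Dataset.from_tensor_slices(filenames)

# Option 2: evaluate the tensor of matching files inside a session.
with tf.Session() as sess:
    matched = sess.run(tf.matching_files(pattern))
    print(num_elements, len(matched))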

tf.contrib.data.Dataset batch size can only be set to 1

こ雲淡風輕ζ submitted on 2019-12-01 13:24:44
Question: I converted the Pascal VOC dataset to tfrecords with create_pascal_tf_record.py. I used tf.contrib.data.Dataset to read the data, with code as follows: import tensorflow as tf from tensorflow.contrib.data import Iterator slim_example_decoder = tf.contrib.slim.tfexample_decoder flags = tf.app.flags flags.DEFINE_string('data_dir', '/home/aurora/workspaces/data/tfrecords_data/voc_dataset/trainval.tfrecords', 'tfrecords file output path') flags.DEFINE_integer('batch_size', 1, 'training batch size')
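
The excerpt stops before the error, but a frequent reason for being stuck at batch_size=1 with detection records is that the decoded tensors have different shapes per example, so batch() cannot stack them. A hedged sketch follows, assuming the standard 'image/encoded' feature key (adjust to whatever create_pascal_tf_record.py actually wrote): resize (or pad) to a fixed shape before batching.

import tensorflow as tf

def parse_fn(serialized):
    # Feature key follows the usual object-detection TFRecord layout;
    # variable-length fields such as bounding boxes would need padded_batch.
    features = tf.parse_single_example(serialized, {
        "image/encoded": tf.FixedLenFeature([], tf.string),
    })
    image = tf.image.decode_jpeg(features["image/encoded"], channels=3)
    image = tf.image.resize_images(image, [300, 300])  # equal shapes allow batching
    return image

dataset = (tf.data.TFRecordDataset("trainval.tfrecords")
           .map(parse_fn)
           .batch(8))  # batch sizes > 1 work once shapes agree
images = dataset.make_one_shot_iterator().get_next()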

Best way to process terabytes of data on gcloud ml-engine with keras

送分小仙女□ submitted on 2019-12-01 10:59:37
I want to train a model on about 2TB of image data on gcloud storage. I saved the image data as separate tfrecords and tried to use the TensorFlow data API following this example: https://medium.com/@moritzkrger/speeding-up-keras-with-tfrecord-datasets-5464f9836c36 But it seems like Keras' model.fit(...) doesn't support validation for tfrecord datasets, based on https://github.com/keras-team/keras/pull/8388 Is there a better approach for processing large amounts of data with Keras from ml-engine that I'm missing? Thanks a lot! If you are willing to use tf.keras instead of actual Keras, you can
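
Picking up where the answer is cut off: recent TF 1.x releases let tf.keras consume tf.data datasets directly in fit, including a validation dataset. A minimal sketch follows; the GCS patterns, feature keys and toy model are purely illustrative, not from the original post.

import tensorflow as tf

def make_dataset(pattern, batch_size):
    def parse_fn(serialized):
        # Hypothetical record layout: a JPEG-encoded image plus an int label.
        feats = tf.parse_single_example(serialized, {
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64),
        })
        image = tf.image.decode_jpeg(feats["image"], channels=3)
        image = tf.image.resize_images(image, [224, 224])
        return image, feats["label"]

    files = tf.data.Dataset.list_files(pattern)
    return (tf.data.TFRecordDataset(files)
            .map(parse_fn)
            .batch(batch_size)
            .repeat())

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(224, 224, 3)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

model.fit(make_dataset("gs://my-bucket/train-*.tfrecord", 32),
          steps_per_epoch=1000,
          validation_data=make_dataset("gs://my-bucket/val-*.tfrecord", 32),
          validation_steps=100)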

Parallelize tf.from_generator using tf.contrib.data.parallel_interleave

≯℡__Kan透↙ submitted on 2019-12-01 06:45:54
I have a bunch of JSON array files (AVRO to be accurate) and each of them yields multiple samples for training a Keras model. Using ideas from @GPhilo and from @jsimsa, I was able to come up with this to parallelize my input pipeline. I am unable to figure out how to design the generator(n) to divide the work of processing files. The code fails inside parse_file(f) as the function expects a string file path and not a Tensor. N = num_cores = 2 files_to_process = ["f1.avro", "f2.avro", "f3.avro"] shuffle_size = prefetch_buffer = 1000 batch_size = 512 def generator(n): size = math.ceil(len(files_to
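
Because the excerpt breaks off inside generator(n), here is one hedged way to shard the files across parallel readers: interleave over the file list and pass each filename into from_generator through its args parameter (available in newer TF 1.x releases, not in the earliest tf.data versions). parse_file below is a stand-in for the real AVRO reader.

import tensorflow as tf

N = 2  # number of parallel readers
files_to_process = ["f1.avro", "f2.avro", "f3.avro"]

def parse_file(path):
    # Stand-in for the real AVRO reader: yield one float per sample.
    for i in range(10):
        yield float(i)

def generator(path):
    # `path` arrives as bytes because it travels through the tf.data pipeline.
    for sample in parse_file(path.decode("utf-8")):
        yield sample

files = tf.data.Dataset.from_tensor_slices(files_to_process)
dataset = files.apply(tf.contrib.data.parallel_interleave(
    lambda path: tf.data.Dataset.from_generator(generator, tf.float32, args=(path,)),
    cycle_length=N, sloppy=True))
dataset = dataset.shuffle(1000).batch(512).prefetch(1)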

Tensorflow Dataset.from_generator fails with pyfunc exception

半腔热情 submitted on 2019-12-01 04:44:20
Question: I am trying TensorFlow's nightly 1.4 as I need Dataset.from_generator to stitch together some variable-length datasets. This simple code (idea from here): import tensorflow as tf Dataset = tf.contrib.data.Dataset it2 = Dataset.range(5).make_one_shot_iterator() def _dataset_generator(): while True: try: try: get_next = it2.get_next() yield get_next except tf.errors.OutOfRangeError: continue except tf.errors.OutOfRangeError: return # Dataset.from_generator needs tensorflow > 1.3 ! das_dataset =
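
As a reference point (the excerpt is truncated before the traceback): from_generator runs the generator inside a py_func, so the generator has to yield plain Python/NumPy values; yielding graph tensors such as it2.get_next() is what typically triggers the py_func exception. A minimal working sketch:

import tensorflow as tf

def gen():
    # Yield ordinary Python values, not TensorFlow ops or tensors.
    for i in range(5):
        yield i

dataset = tf.data.Dataset.from_generator(gen, output_types=tf.int64)
value = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    for _ in range(5):
        print(sess.run(value))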

How to use tf.data's initializable iterators within a tf.estimator's input_fn?

大憨熊 submitted on 2019-12-01 03:18:56
I would like to manage my training with a tf.estimator.Estimator but am having some trouble using it alongside the tf.data API. I have something like this: def model_fn(features, labels, params, mode): # Defines the model's ops. # Initializes with tf.train.Scaffold. # Returns a tf.estimator.EstimatorSpec. def input_fn(): dataset = tf.data.TextLineDataset("test.txt") # map, shuffle, padded_batch, etc. iterator = dataset.make_initializable_iterator() return iterator.get_next() estimator = tf.estimator.Estimator(model_fn) estimator.train(input_fn) As I can't use a make_one_shot_iterator for my use case,
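
One widely used workaround (hedged, since the question is cut off): register the iterator's initializer in the TABLE_INITIALIZERS collection so the estimator's default Scaffold runs it when the session is created; a SessionRunHook that runs the initializer works too. Sketch of the input_fn only:

import tensorflow as tf

def input_fn():
    dataset = tf.data.TextLineDataset("test.txt")
    # map, shuffle, padded_batch, etc. would go here as in the question.
    iterator = dataset.make_initializable_iterator()
    # The default Scaffold runs everything in this collection at session
    # creation, which takes care of initializing the iterator.
    tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)
    return iterator.get_next()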

tensorflow dataset shuffle then batch or batch then shuffle

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-01 02:19:35
Question: I recently began learning TensorFlow. I am unsure whether there is a difference between x = np.array([[1],[2],[3],[4],[5]]) dataset = tf.data.Dataset.from_tensor_slices(x) ds.shuffle(buffer_size=4) ds.batch(4) and x = np.array([[1],[2],[3],[4],[5]]) dataset = tf.data.Dataset.from_tensor_slices(x) ds.batch(4) ds.shuffle(buffer_size=4) Also, I am not sure why I cannot use dataset = dataset.shuffle_batch(buffer_size=2,batch_size=BATCH_SIZE) as it gives the error dataset = dataset.shuffle_batch
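
For reference, a short sketch of the difference (note that shuffle() and batch() return new datasets, so the calls have to be chained or reassigned, and shuffle_batch belongs to the old tf.train queue API, not to tf.data):

import numpy as np
import tensorflow as tf

x = np.array([[1], [2], [3], [4], [5]])

# shuffle then batch: individual elements are shuffled first, so every
# batch can mix elements drawn from anywhere in the shuffle buffer.
ds_a = tf.data.Dataset.from_tensor_slices(x).shuffle(buffer_size=4).batch(4)

# batch then shuffle: fixed batches are formed first and only the order
# of whole batches is shuffled; each batch's contents never change.
ds_b = tf.data.Dataset.from_tensor_slices(x).batch(4).shuffle(buffer_size=4)

next_a = ds_a.make_one_shot_iterator().get_next()
next_b = ds_b.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    print(sess.run(next_a))
    print(sess.run(next_b))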