tensorflow-datasets

Interleaving multiple TensorFlow datasets together

﹥>﹥吖頭↗ · submitted on 2019-12-05 06:04:14
The current TensorFlow dataset interleave functionality is basically an interleaved flat map that takes a single dataset as input. Given the current API, what's the best way to interleave multiple datasets together? Say they have already been constructed and I have a list of them. I want to produce elements from them alternately, and I want to support lists with more than two datasets (i.e., stacked zips and interleaves would be pretty ugly). Thanks! :) @mrry might be able to help. EDIT 2: See tf.contrib.data.choose_from_datasets. It performs deterministic dataset interleaving. EDIT: See tf.contrib…
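A minimal round-robin sketch of what the EDIT points to, using choose_from_datasets with a cycling index dataset; the example datasets here are illustrative stand-ins for an already-built list:

import tensorflow as tf

# Hypothetical example datasets; in practice these are already constructed.
datasets = [tf.data.Dataset.from_tensors(i).repeat() for i in range(3)]

# A dataset of indices 0, 1, 2, 0, 1, 2, ... that decides which dataset the
# next element is drawn from, giving deterministic round-robin interleaving.
choice = tf.data.Dataset.range(len(datasets)).repeat()

interleaved = tf.contrib.data.choose_from_datasets(datasets, choice)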

Input multiple files into Tensorflow dataset

吃可爱长大的小学妹 · submitted on 2019-12-05 02:56:40
Question: I have the following input_fn.

def input_fn(filenames, batch_size):
    # Create a dataset containing the text lines.
    dataset = tf.data.TextLineDataset(filenames).skip(1)
    # Parse each line.
    dataset = dataset.map(_parse_line)
    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(10000).repeat().batch(batch_size)
    # Return the dataset.
    return dataset

It works great if filenames=['file1.csv'] or filenames=['file2.csv']. It gives me an error if filenames=['file1.csv', 'file2.csv']. In…
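The error itself is cut off above, but a common cause with CSV inputs is that skip(1) only drops one line from the whole concatenated dataset rather than each file's header. A hedged sketch of the per-file variant, assuming every file starts with a header line:

def input_fn(filenames, batch_size):
    # Build a dataset of file names, then read each file separately so that
    # skip(1) removes the header line of every file, not just the first one.
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: tf.data.TextLineDataset(filename).skip(1))
    dataset = dataset.map(_parse_line)
    dataset = dataset.shuffle(10000).repeat().batch(batch_size)
    return dataset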

On-the-fly generation with the Dataset API in TensorFlow

五迷三道 · submitted on 2019-12-05 02:45:11
Question: I have a function which produces feature and target tensors, e.g.

x, t = myfunc()  # x, t are tensors

How can I integrate this with TensorFlow's Dataset API for continuous training? Ideally I would like to use the dataset to set things like batching and transformations. Edit for clarification: the problem is that I would like to not just put x and t in my graph but make a dataset from them, so that I can use the same dataset processing that I have implemented for (normal) finite datasets I can load into memory…
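A sketch of one way to bridge this, assuming myfunc can be recast as a Python generator that yields numpy values (from_generator cannot consume symbolic tensors directly); the shapes and dtypes below are purely illustrative:

import numpy as np
import tensorflow as tf

def sample_generator():
    # Hypothetical stand-in for myfunc(): yields (feature, target) pairs forever.
    while True:
        x = np.random.rand(10).astype(np.float32)
        t = np.int64(np.random.randint(0, 2))
        yield x, t

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_types=(tf.float32, tf.int64),
    output_shapes=((10,), ()))
dataset = dataset.batch(32).prefetch(1)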

TensorFlow tf.data.Dataset and bucketing

风格不统一 · submitted on 2019-12-05 02:36:23
For an LSTM network, I've seen great improvements with bucketing. I've come across the bucketing section in the TensorFlow docs (tf.contrib). In my network, however, I am using the tf.data.Dataset API; specifically, I'm working with TFRecords, so my input pipeline looks something like this:

dataset = tf.data.TFRecordDataset(TFRECORDS_PATH)
dataset = dataset.map(_parse_function)
dataset = dataset.map(_scale_function)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.padded_batch(batch_size, padded_shapes={.....})

How can I incorporate the bucketing method into the tf.data…
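The question is cut off, but the transformation it is most likely after is bucket_by_sequence_length, which can replace the plain padded_batch call. A sketch, assuming each parsed element is a dict whose 'sequence' feature carries the length in its first dimension; the key name, boundaries, and batch sizes are illustrative:

dataset = dataset.apply(tf.contrib.data.bucket_by_sequence_length(
    # Length of each element; 'sequence' is a hypothetical feature key.
    element_length_func=lambda features: tf.shape(features['sequence'])[0],
    bucket_boundaries=[50, 100, 200],
    # One batch size per bucket (len(bucket_boundaries) + 1 entries).
    bucket_batch_sizes=[64, 32, 16, 8],
    padded_shapes=padded_shapes))  # same shapes dict as in the padded_batch call above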

TensorFlow Custom Estimator - Restore model after small changes in model_fn

╄→尐↘猪︶ㄣ · submitted on 2019-12-04 20:55:15
Question: I am using tf.estimator.Estimator to develop my model. I wrote a model_fn and trained for 50,000 iterations; now I want to make a small change to my model_fn, for example add a new layer. I don't want to start training from scratch: I want to restore all the old variables from the 50,000-iteration checkpoint and continue training from that point. When I try to do so, I get a NotFoundError. How can this be done with tf.estimator.Estimator? Answer 1: TL;DR The easiest way to load variables from a previous…
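The answer above is truncated. One mechanism consistent with its TL;DR is warm-starting: compatible variables are copied from the old checkpoint while newly added ones keep their initializers. A sketch; the checkpoint and model directories below are placeholders, not from the question:

ws = tf.estimator.WarmStartSettings(
    ckpt_to_initialize_from='/path/to/old_model_dir',  # placeholder path
    vars_to_warm_start='.*')  # regex over variable names; '.*' warm-starts everything that matches

estimator = tf.estimator.Estimator(
    model_fn=model_fn,                    # the modified model_fn with the new layer
    model_dir='/path/to/new_model_dir',   # placeholder path
    warm_start_from=ws)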

tf.data with multiple inputs / outputs in Keras

生来就可爱ヽ(ⅴ<●) · submitted on 2019-12-04 17:25:54
Question: For applications such as pair text similarity, the input data looks like pair_1, pair_2. In these problems, we usually have multiple inputs. Previously, I implemented my models successfully:

model.fit([pair_1, pair_2], labels, epochs=50)

I decided to replace my input pipeline with the tf.data API. To this end, I create a Dataset similar to:

dataset = tf.data.Dataset.from_tensor_slices((pair_1, pair2, labels))

It compiles successfully, but when training starts it throws the following…
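The exception text is truncated, but a frequent cause with multi-input Keras models is the element structure: the dataset should yield (inputs, labels) with the inputs grouped together, mirroring the [pair_1, pair_2] list passed to model.fit. A sketch of that structure (the batch size and steps_per_epoch are illustrative, nothing else about the model is assumed):

# Group the two inputs so each element is ((pair_1, pair_2), labels).
dataset = tf.data.Dataset.from_tensor_slices(((pair_1, pair_2), labels))
dataset = dataset.shuffle(buffer_size=1000).batch(32).repeat()

model.fit(dataset, epochs=50, steps_per_epoch=100)  # steps_per_epoch is illustrative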

Neural Machine Translation model predictions are off-by-one

霸气de小男生 · submitted on 2019-12-04 14:36:20
Problem Summary: In the following example, my NMT model has high loss because it correctly predicts target_input instead of target_output. target_input: 1 3 3 3 3 6 6 6 9 7 7 7 4 4 4 4 4 9 9 10 10 10 3 3 10 10 3 10 3 3 10 10 3 9 9 4 4 4 4 4 3 10 3 3 9 9 3 6 6 6 6 6 6 10 9 9 10 10 4 4 4 4 4 4 4 4 4 4 4 4 9 9 9 9 3 3 3 6 6 6 6 6 9 9 10 3 4 4 4 4 4 4 4 4 4 4 4 4 9 9 10 3 10 9 9 3 4 4 4 4 4 4 4 4 4 10 10 4 4 4 4 4 4 4 4 4 4 9 9 10 3 6 6 6 6 3 3 3 10 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 9 9 3 3 10 6 6 6 6 6 3 9 9 3 3 3 3 3 3 3 10 10 3 9 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 9 3 6 6 6 6 6 6 3 5 3 3 3 3 10 10…
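Off-by-one behaviour like this usually comes from the decoder being trained against its own input rather than against the input shifted by one token. A minimal sketch of the usual shift, with a toy target tensor and hypothetical sos_id/eos_id markers (none of these names come from the question):

import tensorflow as tf

sos_id, eos_id = 1, 2                          # hypothetical special token ids
target = tf.constant([[4, 5, 6], [7, 8, 9]])   # toy [batch, time] token ids

batch_size = tf.shape(target)[0]
sos = tf.fill([batch_size, 1], sos_id)
eos = tf.fill([batch_size, 1], eos_id)

# The decoder reads target_input and is trained to predict target_output,
# which is the same sequence shifted left by one position.
target_input = tf.concat([sos, target], axis=1)   # <sos> w1 w2 ... wn
target_output = tf.concat([target, eos], axis=1)  # w1 w2 ... wn <eos>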

Tensorflow: tf.data.Dataset, Cannot batch tensors with different shapes in component 0

主宰稳场 · submitted on 2019-12-04 13:44:39
I have the following error in my input pipeline:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [2,48,48,3] and element 1 had shape [27,48,48,3].

with this code:

dataset = tf.data.Dataset.from_generator(generator, (tf.float32, tf.int64, tf.int64, tf.float32, tf.int64, tf.float32))
dataset = dataset.batch(max_buffer_size)

This is completely logical, as the batch method tries to create a (batch_size, ?, 48, 48, 3) tensor. However, I want it to create a [29,48,48,3] tensor in this case. So…
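The rest of the question is cut off. If the components of a generator element all share the same variable leading dimension, one option is to undo that dimension and re-batch, which concatenates elements (2 + 27 = 29 frames) instead of stacking them. Sketched here on a simplified single-component pipeline with a toy generator; this is an assumption about the desired behaviour, not the question's accepted fix:

import numpy as np
import tensorflow as tf

def frame_generator():
    # Toy stand-in for the real generator: yields variable-length stacks of frames.
    for n in (2, 27):
        yield np.zeros((n, 48, 48, 3), dtype=np.float32)

dataset = tf.data.Dataset.from_generator(
    frame_generator, tf.float32,
    output_shapes=tf.TensorShape([None, 48, 48, 3]))

# Split each [N, 48, 48, 3] element into N elements of shape [48, 48, 3] ...
dataset = dataset.apply(tf.contrib.data.unbatch())
# ... then stack frames back up to the desired count per batch (2 + 27 = 29 here).
dataset = dataset.batch(29)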

How to cache data during the first epoch correctly (Tensorflow, dataset)?

笑着哭i · submitted on 2019-12-04 12:34:00
I'm trying to use the cache transformation for a dataset. Here is my current code (simplified):

dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=1)
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=5000, count=1))
dataset = dataset.map(_parser_a, num_parallel_calls=12)
dataset = dataset.padded_batch(20, padded_shapes=padded_shapes, padding_values=padding_values)
dataset = dataset.prefetch(buffer_size=1)
dataset = dataset.cache()

After the first epoch, I received the following error message: "The calling iterator did not fully read the dataset we were…"
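The message is truncated, but it is the error cache() raises when the first pass over the data stops before every element has been read into the cache. One commonly suggested rearrangement, sketched under the assumption that the parsed records fit in memory, is to cache right after parsing and before shuffling, repeating, and batching:

dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=1)
dataset = dataset.map(_parser_a, num_parallel_calls=12)
# Cache the parsed records; the first epoch reads every element once and
# later passes are served from the cache.
dataset = dataset.cache()
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=5000, count=1))
dataset = dataset.padded_batch(20, padded_shapes=padded_shapes, padding_values=padding_values)
dataset = dataset.prefetch(buffer_size=1)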

Is TensorFlow Dataset data preprocessing done once for the whole dataset, or for each call to iterator.next()?

◇◆丶佛笑我妖孽 · submitted on 2019-12-04 12:17:32
Question: Hi, I am studying the Dataset API in TensorFlow and I have a question regarding the dataset.map() function, which performs data preprocessing.

file_names = ["image1.jpg", "image2.jpg", ...]
im_dataset = tf.data.Dataset.from_tensor_slices(file_names)
im_dataset = im_dataset.map(lambda image: tuple(tf.py_func(image_parser(), [image], [tf.float32, tf.float32, tf.float32])))
im_dataset = im_dataset.batch(batch_size)
iterator = im_dataset.make_initializable_iterator()

The dataset takes in…
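The question is cut off, but as far as this pipeline goes, map() is applied lazily: the parsing function runs again each time an element is requested, in every epoch, unless its results are cached. A sketch of caching the preprocessed images so the work happens only once; the cache file path is a placeholder, and the function itself (not its call result) is passed to tf.py_func here:

im_dataset = tf.data.Dataset.from_tensor_slices(file_names)
im_dataset = im_dataset.map(
    lambda image: tuple(tf.py_func(
        image_parser, [image], [tf.float32, tf.float32, tf.float32])))
# Cache the preprocessed results to disk; subsequent epochs skip image_parser.
im_dataset = im_dataset.cache('/tmp/im_cache')  # placeholder cache file
im_dataset = im_dataset.batch(batch_size)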