tensorflow-datasets

Interleaving multiple TensorFlow datasets together

﹥>﹥吖頭↗ · submitted on 2019-12-05 06:04:14
The current TensorFlow dataset interleave functionality is basically an interleaved flat map that takes a single dataset as input. Given the current API, what's the best way to interleave multiple datasets together? Say they have already been constructed and I have a list of them. I want to produce elements from them alternately, and I want to support lists with more than two datasets (i.e., stacked zips and interleaves would be pretty ugly). Thanks! :) @mrry might be able to help. EDIT 2: See tf.contrib.data.choose_from_datasets. It performs deterministic dataset interleaving. EDIT: See tf.contrib…
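A minimal round-robin sketch of what the EDIT points to, using choose_from_datasets with a cycling index dataset; the example datasets here are illustrative stand-ins for an already-built list:

import tensorflow as tf

# Hypothetical example datasets; in practice these are already constructed.
datasets = [tf.data.Dataset.from_tensors(i).repeat() for i in range(3)]

# A dataset of indices 0, 1, 2, 0, 1, 2, ... that decides which dataset the
# next element is drawn from, giving deterministic round-robin interleaving.
choice = tf.data.Dataset.range(len(datasets)).repeat()

interleaved = tf.contrib.data.choose_from_datasets(datasets, choice)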

Input multiple files into Tensorflow dataset

吃可爱长大的小学妹 · submitted on 2019-12-05 02:56:40
Question: I have the following input_fn.

def input_fn(filenames, batch_size):
    # Create a dataset containing the text lines.
    dataset = tf.data.TextLineDataset(filenames).skip(1)
    # Parse each line.
    dataset = dataset.map(_parse_line)
    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(10000).repeat().batch(batch_size)
    # Return the dataset.
    return dataset

It works great if filenames=['file1.csv'] or filenames=['file2.csv']. It gives me an error if filenames=['file1.csv', 'file2.csv']. In…
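The error itself is cut off above, but a common cause with CSV inputs is that skip(1) only drops one line from the whole concatenated dataset rather than each file's header. A hedged sketch of the per-file variant, assuming every file starts with a header line:

def input_fn(filenames, batch_size):
    # Build a dataset of file names, then read each file separately so that
    # skip(1) removes the header line of every file, not just the first one.
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: tf.data.TextLineDataset(filename).skip(1))
    dataset = dataset.map(_parse_line)
    dataset = dataset.shuffle(10000).repeat().batch(batch_size)
    return dataset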

On-the-fly generation with the Dataset API in TensorFlow

五迷三道 · submitted on 2019-12-05 02:45:11
Question: I have a function which produces feature and target tensors, e.g.

x, t = myfunc()  # x, t are tensors

How can I integrate this with TensorFlow's Dataset API for continuous training? Ideally I would like to use the dataset to set things like batching and transformations. Edit for clarification: the problem is that I would like to not just put x and t in my graph but make a dataset from them, so that I can use the same dataset processing that I have implemented for (normal) finite datasets I can load into memory…
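A sketch of one way to bridge this, assuming myfunc can be recast as a Python generator that yields numpy values (from_generator cannot consume symbolic tensors directly); the shapes and dtypes below are purely illustrative:

import numpy as np
import tensorflow as tf

def sample_generator():
    # Hypothetical stand-in for myfunc(): yields (feature, target) pairs forever.
    while True:
        x = np.random.rand(10).astype(np.float32)
        t = np.int64(np.random.randint(0, 2))
        yield x, t

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_types=(tf.float32, tf.int64),
    output_shapes=((10,), ()))
dataset = dataset.batch(32).prefetch(1)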

TensorFlow tf.data.Dataset and bucketing

风格不统一 · submitted on 2019-12-05 02:36:23
For an LSTM network, I've seen great improvements with bucketing. I've come across the bucketing section in the TensorFlow docs (tf.contrib). In my network, however, I am using the tf.data.Dataset API; specifically, I'm working with TFRecords, so my input pipeline looks something like this:

dataset = tf.data.TFRecordDataset(TFRECORDS_PATH)
dataset = dataset.map(_parse_function)
dataset = dataset.map(_scale_function)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.padded_batch(batch_size, padded_shapes={.....})

How can I incorporate the bucketing method into the tf.data…
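The question is cut off, but the transformation it is most likely after is bucket_by_sequence_length, which can replace the plain padded_batch call. A sketch, assuming each parsed element is a dict whose 'sequence' feature carries the length in its first dimension; the key name, boundaries, and batch sizes are illustrative:

dataset = dataset.apply(tf.contrib.data.bucket_by_sequence_length(
    # Length of each element; 'sequence' is a hypothetical feature key.
    element_length_func=lambda features: tf.shape(features['sequence'])[0],
    bucket_boundaries=[50, 100, 200],
    # One batch size per bucket (len(bucket_boundaries) + 1 entries).
    bucket_batch_sizes=[64, 32, 16, 8],
    padded_shapes=padded_shapes))  # same shapes dict as in the padded_batch call above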

TensorFlow Custom Estimator - Restore model after small changes in model_fn

╄→尐↘猪︶ㄣ · submitted on 2019-12-04 20:55:15
Question: I am using tf.estimator.Estimator to develop my model. I wrote a model_fn and trained for 50,000 iterations; now I want to make a small change to my model_fn, for example add a new layer. I don't want to start training from scratch: I want to restore all the old variables from the 50,000-iteration checkpoint and continue training from that point. When I try to do so, I get a NotFoundError. How can this be done with tf.estimator.Estimator? Answer 1: TL;DR The easiest way to load variables from a previous…
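The answer above is truncated. One mechanism consistent with its TL;DR is warm-starting: compatible variables are copied from the old checkpoint while newly added ones keep their initializers. A sketch; the checkpoint and model directories below are placeholders, not from the question:

ws = tf.estimator.WarmStartSettings(
    ckpt_to_initialize_from='/path/to/old_model_dir',  # placeholder path
    vars_to_warm_start='.*')  # regex over variable names; '.*' warm-starts everything that matches

estimator = tf.estimator.Estimator(
    model_fn=model_fn,                    # the modified model_fn with the new layer
    model_dir='/path/to/new_model_dir',   # placeholder path
    warm_start_from=ws)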

tf.data with multiple inputs / outputs in Keras

生来就可爱ヽ(ⅴ<●) · submitted on 2019-12-04 17:25:54
Question: For applications such as pair text similarity, the input data looks like pair_1, pair_2. In these problems, we usually have multiple inputs. Previously, I implemented my models successfully:

model.fit([pair_1, pair_2], labels, epochs=50)

I decided to replace my input pipeline with the tf.data API. To this end, I create a Dataset similar to:

dataset = tf.data.Dataset.from_tensor_slices((pair_1, pair2, labels))

It compiles successfully, but when training starts it throws the following…
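The exception text is truncated, but a frequent cause with multi-input Keras models is the element structure: the dataset should yield (inputs, labels) with the inputs grouped together, mirroring the [pair_1, pair_2] list passed to model.fit. A sketch of that structure (the batch size and steps_per_epoch are illustrative, nothing else about the model is assumed):

# Group the two inputs so each element is ((pair_1, pair_2), labels).
dataset = tf.data.Dataset.from_tensor_slices(((pair_1, pair_2), labels))
dataset = dataset.shuffle(buffer_size=1000).batch(32).repeat()

model.fit(dataset, epochs=50, steps_per_epoch=100)  # steps_per_epoch is illustrative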

Neural Machine Translation model predictions are off-by-one

霸气de小男生 · submitted on 2019-12-04 14:36:20
Problem Summary: In the following example, my NMT model has high loss because it correctly predicts target_input instead of target_output. target_input: 1 3 3 3 3 6 6 6 9 7 7 7 4 4 4 4 4 9 9 10 10 10 3 3 10 10 3 10 3 3 10 10 3 9 9 4 4 4 4 4 3 10 3 3 9 9 3 6 6 6 6 6 6 10 9 9 10 10 4 4 4 4 4 4 4 4 4 4 4 4 9 9 9 9 3 3 3 6 6 6 6 6 9 9 10 3 4 4 4 4 4 4 4 4 4 4 4 4 9 9 10 3 10 9 9 3 4 4 4 4 4 4 4 4 4 10 10 4 4 4 4 4 4 4 4 4 4 9 9 10 3 6 6 6 6 3 3 3 10 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 9 9 3 3 10 6 6 6 6 6 3 9 9 3 3 3 3 3 3 3 10 10 3 9 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 9 3 6 6 6 6 6 6 3 5 3 3 3 3 10 10…
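Off-by-one behaviour like this usually comes from the decoder being trained against its own input rather than against the input shifted by one token. A minimal sketch of the usual shift, with a toy target tensor and hypothetical sos_id/eos_id markers (none of these names come from the question):

import tensorflow as tf

sos_id, eos_id = 1, 2                          # hypothetical special token ids
target = tf.constant([[4, 5, 6], [7, 8, 9]])   # toy [batch, time] token ids

batch_size = tf.shape(target)[0]
sos = tf.fill([batch_size, 1], sos_id)
eos = tf.fill([batch_size, 1], eos_id)

# The decoder reads target_input and is trained to predict target_output,
# which is the same sequence shifted left by one position.
target_input = tf.concat([sos, target], axis=1)   # <sos> w1 w2 ... wn
target_output = tf.concat([target, eos], axis=1)  # w1 w2 ... wn <eos>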

Tensorflow: tf.data.Dataset, Cannot batch tensors with different shapes in component 0

主宰稳场 · submitted on 2019-12-04 13:44:39
I have the following error in my input pipeline:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [2,48,48,3] and element 1 had shape [27,48,48,3].

with this code:

dataset = tf.data.Dataset.from_generator(generator, (tf.float32, tf.int64, tf.int64, tf.float32, tf.int64, tf.float32))
dataset = dataset.batch(max_buffer_size)

This is completely logical, as the batch method tries to create a (batch_size, ?, 48, 48, 3) tensor. However, I want it to create a [29,48,48,3] tensor in this case. So…
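The rest of the question is cut off. If the components of a generator element all share the same variable leading dimension, one option is to undo that dimension and re-batch, which concatenates elements (2 + 27 = 29 frames) instead of stacking them. Sketched here on a simplified single-component pipeline with a toy generator; this is an assumption about the desired behaviour, not the question's accepted fix:

import numpy as np
import tensorflow as tf

def frame_generator():
    # Toy stand-in for the real generator: yields variable-length stacks of frames.
    for n in (2, 27):
        yield np.zeros((n, 48, 48, 3), dtype=np.float32)

dataset = tf.data.Dataset.from_generator(
    frame_generator, tf.float32,
    output_shapes=tf.TensorShape([None, 48, 48, 3]))

# Split each [N, 48, 48, 3] element into N elements of shape [48, 48, 3] ...
dataset = dataset.apply(tf.contrib.data.unbatch())
# ... then stack frames back up to the desired count per batch (2 + 27 = 29 here).
dataset = dataset.batch(29)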

How to cache data during the first epoch correctly (Tensorflow, dataset)?

笑着哭i · submitted on 2019-12-04 12:34:00
I'm trying to use the cache transformation for a dataset. Here is my current code (simplified):

dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=1)
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=5000, count=1))
dataset = dataset.map(_parser_a, num_parallel_calls=12)
dataset = dataset.padded_batch(20, padded_shapes=padded_shapes, padding_values=padding_values)
dataset = dataset.prefetch(buffer_size=1)
dataset = dataset.cache()

After the first epoch, I received the following error message: "The calling iterator did not fully read the dataset we were…"
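The message is truncated, but it is the error cache() raises when the first pass over the data stops before every element has been read into the cache. One commonly suggested rearrangement, sketched under the assumption that the parsed records fit in memory, is to cache right after parsing and before shuffling, repeating, and batching:

dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=1)
dataset = dataset.map(_parser_a, num_parallel_calls=12)
# Cache the parsed records; the first epoch reads every element once and
# later passes are served from the cache.
dataset = dataset.cache()
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=5000, count=1))
dataset = dataset.padded_batch(20, padded_shapes=padded_shapes, padding_values=padding_values)
dataset = dataset.prefetch(buffer_size=1)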

Is TensorFlow Dataset data preprocessing done once for the whole dataset, or for each call to iterator.next()?

◇◆丶佛笑我妖孽 · submitted on 2019-12-04 12:17:32
Question: Hi, I am studying the Dataset API in TensorFlow and I have a question regarding the dataset.map() function, which performs data preprocessing.

file_names = ["image1.jpg", "image2.jpg", ...]
im_dataset = tf.data.Dataset.from_tensor_slices(file_names)
im_dataset = im_dataset.map(lambda image: tuple(tf.py_func(image_parser(), [image], [tf.float32, tf.float32, tf.float32])))
im_dataset = im_dataset.batch(batch_size)
iterator = im_dataset.make_initializable_iterator()

The dataset takes in…
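The question is cut off, but as far as this pipeline goes, map() is applied lazily: the parsing function runs again each time an element is requested, in every epoch, unless its results are cached. A sketch of caching the preprocessed images so the work happens only once; the cache file path is a placeholder, and the function itself (not its call result) is passed to tf.py_func here:

im_dataset = tf.data.Dataset.from_tensor_slices(file_names)
im_dataset = im_dataset.map(
    lambda image: tuple(tf.py_func(
        image_parser, [image], [tf.float32, tf.float32, tf.float32])))
# Cache the preprocessed results to disk; subsequent epochs skip image_parser.
im_dataset = im_dataset.cache('/tmp/im_cache')  # placeholder cache file
im_dataset = im_dataset.batch(batch_size)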