How to input a list of lists with different sizes in tf.data.Dataset

可紊 提交于 2019-11-27 20:14:36

You can use tf.data.Dataset.from_generator() to convert any iterable Python object (like a list of lists) into a Dataset:

t = [[4, 2], [3, 4, 5]]

dataset = tf.data.Dataset.from_generator(lambda: t, tf.int32, output_shapes=[None])

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
  print(sess.run(next_element))  # ==> '[4, 2]'
  print(sess.run(next_element))  # ==> '[3, 4, 5]'

I don't think tensorflow supports tensors with varying numbers of elements along a given dimension.

However, a simple solution is to pad the nested lists with trailing zeros (where necessary):

t = [[4,2], [3,4,5]]
max_length = max(len(lst) for lst in t)
t_pad = [lst + [0] * (max_length - len(lst)) for lst in t]
print(t_pad)
dataset = tf.data.Dataset.from_tensor_slices(t_pad)
print(dataset)

Outputs:

[[4, 2, 0], [3, 4, 5]]
<TensorSliceDataset shapes: (3,), types: tf.int32>

The zeros shouldn't be a big problem for the model: semantically they're just extra sentences of size zero at the end of each list of actual sentences.

In addition to @mrry's answer, the following code is also possible if you would like to create (images, labels) pair:

import itertools
data = tf.data.Dataset.from_generator(lambda: itertools.izip_longest(images, labels),
                                      output_types=(tf.float32, tf.float32),
                                      output_shapes=(tf.TensorShape([None, None, 3]), 
                                                     tf.TensorShape([None])))

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    image, label = sess.run(next_element)  # ==> shape: [320, 420, 3], [20]
    image, label = sess.run(next_element)  # ==> shape: [1280, 720, 3], [40]
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!