In TensorFlow's Dataset API, how do you map one element into multiple elements?

Submitted by ∥☆過路亽.° on 2019-11-28 02:59:17

Question


In the TensorFlow Dataset pipeline I'd like to define a custom map function that takes a single input element (data sample) and returns multiple elements (data samples).

The code below is my attempt, along with the desired results.

I could not follow the documentation on tf.data.Dataset.flat_map() well enough to understand whether it was applicable here.

import tensorflow as tf

input = [10, 20, 30]

def my_map_func(i):
  return [[i, i+1, i+2]]       # Fyi [[i], [i+1], [i+2]] throws an exception

ds = tf.data.Dataset.from_tensor_slices(input)
ds = ds.map(map_func=lambda input: tf.py_func(
  func=my_map_func, inp=[input], Tout=[tf.int64]
))
element = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
  for _ in range(9):
    print(sess.run(element))

Results:

(array([10, 11, 12]),)
(array([20, 21, 22]),)
(array([30, 31, 32]),)

Desired results:

(10)
(11)
(12)
(20)
(21)
(22)
(30)
(31)
(32)

Answer 1:


Two more steps were required to achieve this. First, the map function needs to return a NumPy array, not a list.

Then you can use flat_map combined with Dataset.from_tensor_slices() to flatten them. The code below now produces the desired result:

Tested in TensorFlow 1.5 (copy/paste runnable example)

import tensorflow as tf
import numpy as np

input = [10, 20, 30]

def my_map_func(i):
  return np.array([i, i + 1, i + 2])

ds = tf.data.Dataset.from_tensor_slices(input)
ds = ds.map(map_func=lambda input: tf.py_func(
  func=my_map_func, inp=[input], Tout=[tf.int64]
))
ds = ds.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))

element = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
  for _ in range(9):
    print(sess.run(element))
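tf.py_func, tf.Session and make_one_shot_iterator() no longer exist in TensorFlow 2.x. Here is a minimal sketch of the same approach, assuming TensorFlow 2.x with eager execution; it swaps in tf.py_function and iterates with as_numpy_iterator(). The wrapper name `wrapped` is only illustrative. py_function drops static shape information, so the sketch restores it with set_shape([3]) before handing the tensor to from_tensor_slices.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

def my_map_func(i):
    # i arrives as an eager tf.Tensor; .numpy() gives a NumPy scalar.
    i = int(i.numpy())
    return np.array([i, i + 1, i + 2], dtype=np.int64)

def wrapped(x):
    # Illustrative helper: run the Python function, then restore the static shape.
    out = tf.py_function(func=my_map_func, inp=[x], Tout=tf.int64)
    out.set_shape([3])
    return out

ds = tf.data.Dataset.from_tensor_slices(tf.constant([10, 20, 30], dtype=tf.int64))
ds = ds.map(wrapped)
# Each length-3 tensor becomes a small Dataset; flat_map concatenates them in order.
ds = ds.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))

result = [int(v) for v in ds.as_numpy_iterator()]
print(result)  # [10, 11, 12, 20, 21, 22, 30, 31, 32]
```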

Here is a way to do this if you have multiple variables to return. In this example I input a string (such as a filename) and output multiple strings and integers. The string is repeated once for each of the integers in [10, 20, 30].

Copy/paste runnable example:

import tensorflow as tf
import numpy as np

input = [b'testA', b'testB', b'testC']

def my_map_func(input):
  return np.array([input, input, input]), np.array([10, 20, 30])

ds = tf.data.Dataset.from_tensor_slices(input)
ds = ds.map(map_func=lambda input: tf.py_func(
    func=my_map_func, inp=[input], Tout=[tf.string, tf.int64]))
ds = ds.flat_map(lambda mystr, myint: tf.data.Dataset.zip((
  tf.data.Dataset.from_tensor_slices(mystr),
  tf.data.Dataset.from_tensor_slices(myint))
))

element = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
  for _ in range(9):
    print(sess.run(element))
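A possible simplification, sketched under the assumption of TensorFlow 2.x: from_tensor_slices accepts a tuple of equal-length tensors and slices them in lockstep, which gives the same (string, int) pairs as zipping two separate datasets. The sketch again uses tf.py_function in place of tf.py_func, and the helper name `wrapped` is hypothetical.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

def my_map_func(s):
    # s is an eager scalar string tensor; repeat it once per integer.
    s = s.numpy()
    return np.array([s, s, s]), np.array([10, 20, 30], dtype=np.int64)

def wrapped(s):
    # Hypothetical helper: call the Python function and restore static shapes.
    mystr, myint = tf.py_function(func=my_map_func, inp=[s], Tout=[tf.string, tf.int64])
    mystr.set_shape([3])
    myint.set_shape([3])
    return mystr, myint

ds = tf.data.Dataset.from_tensor_slices([b'testA', b'testB', b'testC'])
ds = ds.map(wrapped)
# Slicing a tuple of tensors yields one (string, int) pair per index, like Dataset.zip.
ds = ds.flat_map(lambda mystr, myint: tf.data.Dataset.from_tensor_slices((mystr, myint)))

pairs = [(s, int(i)) for s, i in ds.as_numpy_iterator()]
print(pairs[:3])  # [(b'testA', 10), (b'testA', 20), (b'testA', 30)]
```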



Answer 2:


One clean solution uses flat_map and from_tensor_slices and avoids tf.py_func entirely:

import tensorflow as tf

input = [10, 20, 30]

ds = tf.data.Dataset.from_tensor_slices(input)
ds = ds.flat_map(lambda x: tf.data.Dataset.from_tensor_slices([x, x+1, x+2]))
element = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    for _ in range(9):
        print(sess.run(element))

# 10
# 11
# 12
# 20
# 21
# 22
# 30
# 31
# 32
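An alternative in TensorFlow 2.x, named here as a swapped-in technique rather than part of the answer above, is to map each element to a vector and then call Dataset.unbatch(), which splits every vector back into scalar elements. This is a sketch assuming TensorFlow 2.x, where unbatch() is a Dataset method (in 1.x it lived under tf.data.experimental).

```python
import tensorflow as tf  # assumes TensorFlow 2.x

ds = tf.data.Dataset.from_tensor_slices([10, 20, 30])
# map turns each scalar into a length-3 vector; unbatch then emits
# each vector entry as its own element, the same effect as the flat_map above.
ds = ds.map(lambda x: tf.stack([x, x + 1, x + 2])).unbatch()

values = [int(v) for v in ds.as_numpy_iterator()]
print(values)  # [10, 11, 12, 20, 21, 22, 30, 31, 32]
```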



Answer 3:


Just wanted to add that this can be done for datasets where each element is a dictionary as well. For example, if one element of the input dataset looks like

{ 'feat1': [2,4], 'feat2': [3]}

If you want to split each element into two elements based on the values in feat1, you could write:

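import tensorflow as tf

# Example input dataset holding the single element shown above.
dataset = tf.data.Dataset.from_tensors({'feat1': [2, 4], 'feat2': [3]})

def split(element):
    # Build two new elements, one per entry of 'feat1'; 'feat2' is duplicated.
    dict_of_new_elements = {
        'feat1': [
            element['feat1'][0],
            element['feat1'][1]],
        'feat2': [
            element['feat2'][0],
            element['feat2'][0]]
    }
    return tf.data.Dataset.from_tensor_slices(dict_of_new_elements)

dataset = dataset.flat_map(split)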

This yields:

[
    {'feat1': 2, 'feat2': 3},
    {'feat1': 4, 'feat2': 3},
]
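The split above hard-codes two entries in feat1. A sketch of a variant that handles a feat1 of any length, assuming TensorFlow 2.x and assuming feat2 always holds exactly one value: slice feat1 directly and use tf.fill to repeat feat2 to the same length. The function name `split_any_length` is hypothetical.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

dataset = tf.data.Dataset.from_tensors({'feat1': [2, 4], 'feat2': [3]})

def split_any_length(element):
    # Hypothetical helper: one output element per entry of 'feat1',
    # with the single 'feat2' value repeated to match.
    n = tf.shape(element['feat1'])[0]
    return tf.data.Dataset.from_tensor_slices({
        'feat1': element['feat1'],
        'feat2': tf.fill([n], element['feat2'][0]),
    })

out = [{k: int(v) for k, v in d.items()}
       for d in dataset.flat_map(split_any_length).as_numpy_iterator()]
print(out)  # [{'feat1': 2, 'feat2': 3}, {'feat1': 4, 'feat2': 3}]
```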


Source: https://stackoverflow.com/questions/48471926/in-tensorflows-dataset-api-how-do-you-map-one-element-into-multiple-elements
