How can I modify sequential data using the map, filter, or reduce methods of tf.data.Dataset objects?

Submitted by 牧云@^-^@ on 2020-11-25 04:05:20

Question


I have a Python data generator:

import numpy as np
import tensorflow as tf

vocab_size = 5
def create_generator():
    'generates sequences of varying lengths (5 to 7) with random numbers from 0 to vocab_size-1'
    count = 0
    while count < 5:
        sequence_len = np.random.randint(5, 8) # length varies from 5 to 7
        seq = np.random.randint(0, vocab_size, (sequence_len))
        yield seq
        count += 1

gen = tf.data.Dataset.from_generator(create_generator, 
                             args=[], 
                             output_types=tf.int32, 
                             output_shapes = (None, ), )

for g in gen:
    print(g)

It generates sequences of varying lengths (5 to 7) with integer values from 0 to 4. Here are some of the sequences produced by the generator:

tf.Tensor([4 0 0 1 4 1], shape=(6,), dtype=int32) # 1st sequence
tf.Tensor([3 4 4 4 0], shape=(5,), dtype=int32)   # 2nd sequence
tf.Tensor([4 4 2 1 4 3], shape=(6,), dtype=int32) # 3rd sequence
tf.Tensor([1 0 2 4 0], shape=(5,), dtype=int32)   # 4th sequence
tf.Tensor([1 4 0 2 2], shape=(5,), dtype=int32)   # 5th sequence

Now I want to modify the sequences so that:

  • all the even numbers are removed from each sequence
  • sequences whose length (after removing the even numbers) is less than 2 are filtered out

This should give a result that looks like this:

[1 1] # 1st sequence
[1 3] # 3rd sequence

How can I do such transformations using tf.data.Dataset methods?


Answer 1:


Your for loop should look like:

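new_gen = []
for g in gen:
    arr = g.numpy()            # convert the eager tensor to a NumPy array
    odd = arr[arr % 2 != 0]    # keep only the odd values
    if len(odd) >= 2:          # drop sequences shorter than 2 after filtering
        new_gen.append(odd)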

print(new_gen)
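
The loop above works on the Python side rather than through the tf.data.Dataset methods the question asks about. A minimal sketch of the same two steps with map and filter could look like the following. It assumes TensorFlow 2.4 or later (for the output_signature argument), and it swaps the random generator for the five sample sequences printed in the question so that the result is reproducible. The names samples, sample_generator, drop_even and long_enough are made up for this illustration.

```python
import numpy as np
import tensorflow as tf

# Sample sequences copied from the question's printed output (an assumption
# made here so that the result is reproducible instead of random).
samples = [
    [4, 0, 0, 1, 4, 1],
    [3, 4, 4, 4, 0],
    [4, 4, 2, 1, 4, 3],
    [1, 0, 2, 4, 0],
    [1, 4, 0, 2, 2],
]

def sample_generator():
    # Hypothetical stand-in for create_generator() from the question.
    for seq in samples:
        yield np.array(seq, dtype=np.int32)

gen = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32),
)

def drop_even(seq):
    # Keep only the odd entries; the output length depends on the data.
    is_odd = tf.math.not_equal(seq % 2, 0)
    return tf.boolean_mask(seq, is_odd)

def long_enough(seq):
    # Scalar boolean predicate: keep sequences with at least 2 elements left.
    return tf.size(seq) >= 2

result = gen.map(drop_even).filter(long_enough)

for seq in result:
    print(seq.numpy())
# prints:
# [1 1]
# [1 3]
```

map is applied to each sequence on its own, so the variable lengths are not a problem as long as the elements are not batched. If batching is needed afterwards, padded_batch is probably the method to look at, since the filtered sequences still differ in length.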


Source: https://stackoverflow.com/questions/64870310/how-can-i-modifya-sequencial-data-using-map-or-filter-or-reduce-method-for-tf-da
