How to input a list of lists with different sizes in tf.data.Dataset

一整个雨季 2020-12-03 01:49

I have a long list of lists of integers (representing sentences, each of a different length) that I want to feed using the tf.data library. Each list (of the lists of list)

4 Answers
  •  甜味超标
    2020-12-03 02:15

    As far as I know, a regular tf.Tensor can't have a varying number of elements along a given dimension; TensorFlow has a separate tf.RaggedTensor type for that case.

    However, a simple solution is to pad the nested lists with trailing zeros (where necessary):

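    import tensorflow as tf

    t = [[4,2], [3,4,5]]
    # Pad every list with trailing zeros up to the length of the longest one.
    max_length = max(len(lst) for lst in t)
    t_pad = [lst + [0] * (max_length - len(lst)) for lst in t]
    print(t_pad)
    # The padded lists are rectangular, so they convert to a single tensor.
    dataset = tf.data.Dataset.from_tensor_slices(t_pad)
    print(dataset)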
    

    Outputs:

    [[4, 2, 0], [3, 4, 5]]
    
    

    The zeros shouldn't be a big problem for the model: semantically they're just padding at the end of each sentence, provided 0 isn't also used as the ID of a real token.
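
    If padding the whole list up front is too wasteful, a possible alternative is to let tf.data pad each batch. The following is only a sketch, assuming TensorFlow 2.4 or newer (for the output_signature argument of from_generator); the names gen, batched and batches are illustrative and not part of the original answer.

```python
import tensorflow as tf

# Same toy data as in the padding example: two lists of different lengths.
t = [[4, 2], [3, 4, 5]]

def gen():
    # Yield one variable-length list at a time.
    for lst in t:
        yield lst

# shape=(None,) declares a 1-D element whose length may vary.
dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32),
)

# padded_batch pads each element to the longest one in its batch,
# using 0 as the default padding value for integer tensors.
batched = dataset.padded_batch(2)

# Collect the batches as plain Python lists for inspection.
batches = [b.numpy().tolist() for b in batched]
print(batches)
```

    Because the padding happens per batch, each batch is only as wide as its own longest list, which can save memory when a few lists are much longer than the rest.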
