TensorFlow - tf.data.Dataset reading large HDF5 files

梦如初夏  2020-12-24 08:05

I am setting up a TensorFlow pipeline for reading large HDF5 files as input for my deep learning models. Each HDF5 file contains 100 videos of variable length stored as …

2 Answers
    暗喜 (OP)  2020-12-24 08:17

    It took me a while to figure this out, so I thought I should record it here. Based on mikkola's answer, this is how to handle multiple files:

    import h5py
    import tensorflow as tf
    
    class generator:
        def __call__(self, file):
            # Open one HDF5 file and yield its images one at a time.
            with h5py.File(file, 'r') as hf:
                for im in hf["train_img"]:
                    yield im
    
    # filenames is a list of paths to the HDF5 files, e.g. ['a.h5', 'b.h5']
    ds = tf.data.Dataset.from_tensor_slices(filenames)
    ds = ds.interleave(
        lambda filename: tf.data.Dataset.from_generator(
            generator(),
            tf.uint8,
            tf.TensorShape([427, 561, 3]),
            args=(filename,)),
        cycle_length=4,    # example value: read from 4 files concurrently
        block_length=16)   # example value: take 16 consecutive items per file
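
    With eager execution (TensorFlow 2.x), you can sanity-check the pipeline by batching and iterating as usual; the batch size of 32 here is just an illustrative choice:
    
    for batch in ds.batch(32).take(1):
        print(batch.shape)  # e.g. (32, 427, 561, 3)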
    

    The key point is that you can't pass filename directly to the generator, since it is a Tensor. You have to pass it through args, which TensorFlow evaluates and converts into a regular Python value before calling the generator.
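
    Note that newer TensorFlow releases (2.4+) deprecate the positional output_types/output_shapes arguments of from_generator in favor of output_signature. A sketch of the equivalent call, assuming the same generator class as above:
    
    ds = tf.data.Dataset.from_tensor_slices(filenames)
    ds = ds.interleave(
        lambda filename: tf.data.Dataset.from_generator(
            generator(),
            output_signature=tf.TensorSpec(shape=(427, 561, 3), dtype=tf.uint8),
            args=(filename,)),
        cycle_length=4,
        block_length=16)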
