I am setting up a TensorFlow pipeline for reading large HDF5 files as input for my deep learning models. Each HDF5 file contains 100 videos of variable length, stored as …
It took me a while to figure this out, so I thought I should record it here. Based on mikkola's answer, this is how to handle multiple files:
import h5py
import tensorflow as tf

class generator:
    def __call__(self, file):
        # Open the HDF5 file lazily, inside the pipeline, and yield one image at a time
        with h5py.File(file, 'r') as hf:
            for im in hf["train_img"]:
                yield im

filenames = ["videos_0.h5", "videos_1.h5"]  # your list of HDF5 paths
cycle_length = 4   # number of files to read from concurrently
block_length = 16  # consecutive elements to take from each file before cycling

ds = tf.data.Dataset.from_tensor_slices(filenames)
ds = ds.interleave(lambda filename: tf.data.Dataset.from_generator(
        generator(),
        tf.uint8,                       # output_types
        tf.TensorShape([427, 561, 3]),  # output_shapes
        args=(filename,)),
    cycle_length, block_length)
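From here the dataset behaves like any other tf.data pipeline. A minimal sketch of consuming it, assuming TensorFlow 2.x eager execution (the batch size of 8 is arbitrary; on versions before 2.4, use tf.data.experimental.AUTOTUNE):

ds = ds.batch(8).prefetch(tf.data.AUTOTUNE)
for batch in ds.take(1):
    print(batch.shape)  # (8, 427, 561, 3)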
The key is that you can't pass filename directly to the generator, since inside the pipeline it's a Tensor. You have to pass it through args, which TensorFlow evaluates and converts to a plain Python value before calling the generator. Note that a string passed through args arrives as bytes; h5py can open a bytes path directly, but call file.decode() if you need a str.
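As a side note, newer TensorFlow releases (2.3 and later) deprecate the positional output_types/output_shapes arguments of from_generator in favour of output_signature. A sketch of the equivalent call, under the same shape assumption as above:

ds = ds.interleave(lambda filename: tf.data.Dataset.from_generator(
        generator(),
        output_signature=tf.TensorSpec(shape=(427, 561, 3), dtype=tf.uint8),
        args=(filename,)),
    cycle_length, block_length)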