TensorFlow - tf.data.Dataset reading large HDF5 files
I am setting up a TensorFlow pipeline for reading large HDF5 files as input for my deep learning models. Each HDF5 file contains 100 videos of variable length, stored as collections of compressed JPG images (to keep the size on disk manageable). Using `tf.data.Dataset` and a map to `tf.py_func`, reading examples from the HDF5 file with custom Python logic is quite easy. For example:

    def read_examples_hdf5(filename, label):
        with h5py.File(filename, 'r') as hf:
            # read frames from HDF5 and decode them from JPG
            return frames, label

    filenames = glob.glob(os.path.join(hdf5_data_path, "*.h5"))
    labels
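For context, the wiring described above can be sketched as follows. This is a minimal, hypothetical sketch, not the poster's actual code: it assumes TF 2.x, where `tf.py_function` is the successor of `tf.py_func`, and it assumes each HDF5 file stores its JPEG-encoded frames under a dataset named `"frames"` (both the dataset name and the `make_dataset` helper are my own inventions for illustration):

    import h5py
    import tensorflow as tf

    def read_examples_hdf5(filename, label):
        # Runs as ordinary Python inside tf.py_function: `filename` arrives
        # as an EagerTensor, so .numpy() recovers the underlying bytes.
        with h5py.File(filename.numpy().decode("utf-8"), "r") as hf:
            jpg_bytes = hf["frames"][:]  # assumed: array of JPEG byte strings
        # Decode each JPEG and stack into one (num_frames, H, W, C) tensor.
        frames = tf.stack([tf.io.decode_jpeg(j) for j in jpg_bytes])
        return frames, label

    def make_dataset(filenames, labels):
        ds = tf.data.Dataset.from_tensor_slices((filenames, labels))
        # tf.py_function wraps the Python reader so the tf.data pipeline
        # can call it; output dtypes must be declared explicitly.
        return ds.map(
            lambda f, l: tf.py_function(
                read_examples_hdf5, [f, l], [tf.uint8, tf.int64]))

Iterating the resulting dataset eagerly then yields `(frames, label)` pairs, one per HDF5 file.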