Deleting information from an HDF5 file

Submitted by 魔方 西西 on 2019-12-04 22:58:40

Removing entire nodes (groups or datasets) from an HDF5 file should be no problem.
However, if you want to reclaim the space, you have to run the h5repack tool.

From the HDF5 docs:

5.5.2. Deleting a Dataset from a File and Reclaiming Space

HDF5 does not at this time provide an easy mechanism to remove a dataset from a file or to reclaim the storage space occupied by a deleted object.

Removing a dataset and reclaiming the space it used can be done with the H5Ldelete function and the h5repack utility program. With the H5Ldelete function, links to a dataset can be removed from the file structure. After all the links have been removed, the dataset becomes inaccessible to any application and is effectively removed from the file. The way to recover the space occupied by an unlinked dataset is to write all of the objects of the file into a new file. Any unlinked object is inaccessible to the application and will not be included in the new file. Writing objects to a new file can be done with a custom program or with the h5repack utility program.
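
As a rough illustration of the "custom program" route from the quoted docs, here is a minimal sketch using h5py (my assumption; the docs describe the C API). In h5py, `del f[name]` removes the link, which as far as I know is its counterpart to H5Ldelete. The helper name `repack` and the file and dataset names are made up for the example, and the helper only follows the top-level links of the file.

```python
import os
import tempfile

import h5py
import numpy as np


def repack(src_path, dst_path):
    """Copy every object still linked from the root of src into a new file."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        for key, value in src.attrs.items():      # root-level attributes
            dst.attrs[key] = value
        for name in src:                           # top-level groups and datasets
            src.copy(src[name], dst, name=name)    # groups are copied recursively


workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.h5")    # hypothetical file names
repacked = os.path.join(workdir, "repacked.h5")

with h5py.File(original, "w") as f:
    f.create_dataset("scratch", data=np.zeros(1_000_000))  # about 8 MB of float64
    f.create_dataset("keep", data=np.arange(1000))

with h5py.File(original, "r+") as f:
    del f["scratch"]   # unlink the dataset; the file itself does not shrink

size_after_delete = os.path.getsize(original)
repack(original, repacked)
size_after_repack = os.path.getsize(repacked)
print(size_after_delete, size_after_repack)
```

The unlinked dataset is never visited by the copy loop, so it simply does not end up in the new file. For real files h5repack is probably the better choice, since this sketch does not handle dangling soft or external links at the top level.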

Alternatively, you can also look into PyTables's ptrepack tool. PyTables should be able to read HDF5 files written by h5py, and ptrepack is similar to h5repack.

If you want to remove records from a dataset, you probably have to retrieve the records you want to keep, create a new dataset from them, and remove the old one.
PyTables supports removing rows; however, it's not recommended.
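
The keep-copy-delete approach could look roughly like this with h5py. This is a sketch under my own assumptions: the helper name `keep_rows`, the file name and the `units` attribute are invented for the example, the whole dataset is assumed to fit in memory, and the new dataset is created with default settings, so chunking or compression options of the old one are not carried over.

```python
import os
import tempfile

import h5py
import numpy as np


def keep_rows(path, name, mask):
    """Replace dataset `name` with only the rows selected by the boolean mask."""
    with h5py.File(path, "r+") as f:
        data = f[name][...]            # read all records into memory
        attrs = dict(f[name].attrs)    # remember the attributes
        del f[name]                    # unlink the old dataset
        new = f.create_dataset(name, data=data[mask])
        for key, value in attrs.items():
            new.attrs[key] = value


path = os.path.join(tempfile.mkdtemp(), "records.h5")   # hypothetical file
with h5py.File(path, "w") as f:
    ds = f.create_dataset("records", data=np.arange(10))
    ds.attrs["units"] = "counts"

keep_rows(path, "records", np.arange(10) % 2 == 0)      # keep the even values

with h5py.File(path, "r") as f:
    remaining = f["records"][...]
    units = f["records"].attrs["units"]
print(remaining, units)
```

The space used by the old dataset is still not given back to the file system; that needs a repack as described above.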

If you know that a particular dataset will be removed at the end of an analysis process, why keep it in the master file at all? I would store the temporary data in a separate HDF5 file which could be discarded after the analysis was complete. If it's important to link the temporary dataset inside the master file, just create an external link between the master and the temp using H5Lcreate_external(). External links consume a trivial amount of space.
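
In h5py the same idea can be sketched with `h5py.ExternalLink`, which I am using here in place of calling H5Lcreate_external() directly. File and dataset names are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
master_path = os.path.join(workdir, "master.h5")   # hypothetical file names
temp_path = os.path.join(workdir, "temp.h5")

# The temporary data lives in its own file.
with h5py.File(temp_path, "w") as tmp:
    tmp.create_dataset("intermediate", data=np.arange(5))

# The master file only stores a small link pointing at the temp file.
with h5py.File(master_path, "w") as master:
    master.create_dataset("results", data=np.ones(3))
    master["intermediate"] = h5py.ExternalLink(temp_path, "/intermediate")

# During the analysis the data is reachable through the master file.
with h5py.File(master_path, "r+") as master:
    values = master["intermediate"][...]   # resolved through the external link
    del master["intermediate"]             # drop the link once it is no longer needed

os.remove(temp_path)                       # discard the temporary file
print(values)
```

Since the master file never contained the temporary data, there is nothing to repack afterwards.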

In HDF5 1.10 and above, there is a file space management mechanism. It is enabled by setting a file space strategy (H5Pset_file_space_strategy) on the file creation property list (FCPL) that is passed to H5Fcreate.

One change you will notice is that the file is slightly larger (by a few KB) after the first import. After that, the file should end up smaller over time, because freed space gets reused.
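
From Python, h5py exposes this through keyword arguments on `h5py.File`. The sketch below assumes h5py 2.10 or newer built against HDF5 1.10.1 or newer; the `fs_strategy`, `fs_persist` and `fs_threshold` arguments are only accepted when the file is created. File and dataset names are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "managed.h5")   # hypothetical file

# Ask the library to track free space and to persist that information on close.
with h5py.File(path, "w", fs_strategy="fsm", fs_persist=True, fs_threshold=1) as f:
    f.create_dataset("scratch", data=np.zeros(100_000))  # about 800 KB
    f.create_dataset("keep", data=np.arange(1000))

with h5py.File(path, "r+") as f:
    del f["scratch"]            # leaves a hole in the middle of the file
size_after_delete = os.path.getsize(path)

# A later session writes new data of the same size.
with h5py.File(path, "r+") as f:
    f.create_dataset("replacement", data=np.ones(100_000))
size_after_rewrite = os.path.getsize(path)

print(size_after_delete, size_after_rewrite)
```

With persistent free space tracking, the second write should be able to reuse the hole left by the deleted dataset instead of growing the file, so the two sizes are expected to be close. I have not pinned down exact numbers here because they depend on the library version.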

You can monitor the free space in your HDF5 files with the h5stat tool:

h5stat -S filename
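
If you would rather check from Python, h5py's low-level `FileID.get_freespace()` (a wrapper around H5Fget_freespace) reports the free space the library is currently tracking for an open file. This is a sketch and not a replacement for h5stat: unless the file was created with persistent free space tracking, the number only covers space freed since the file was opened. Names are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "freespace.h5")   # hypothetical file

with h5py.File(path, "w") as f:
    f.create_dataset("scratch", data=np.zeros(100_000))   # about 800 KB
    f.create_dataset("keep", data=np.arange(1000))
    f.flush()
    free_before = f.id.get_freespace()   # bytes tracked as free before the delete
    del f["scratch"]
    free_after = f.id.get_freespace()    # should now include the freed dataset

print(free_before, free_after)
```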