Corrupt files when creating HDF5 files without closing them (h5py)

Posted by 丶灬走出姿态 on 2020-02-14 05:50:08

Question


I am using h5py to store experiment data in an HDF5 container.

In an interactive session I open the file using:

import h5py
measurement_data = h5py.File('example.hdf5', 'a')

Then I write data to the file using some self-written functions (this can be many GB of data from an experiment lasting a couple of days). At the end of the experiment I would usually close the file using

measurement_data.close()

Unfortunately, from time to time the interactive session ends without me explicitly closing the file (accidentally killing the session, a power outage, an OS crash caused by some other software). This always results in a corrupt file and the loss of all the data. When I try to open it, I get the error:

OSError: Unable to open file (File signature not found)

I also cannot open the file in HDFView or any other software I have tried.

  1. Is there a way to avoid a corrupt file even if it is not closed explicitly? I've read about using the with statement, but I'm not sure whether this would help when the session ends unexpectedly.
  2. Can I restore the data in the corrupt files in some way? Is there a repair program available?

Opening and closing the file for every write access sounds pretty unfavorable to me, because I am continuously writing data from many different functions and threads. So I'd be happier with a different solution.


Answer 1:


The corruption problem is known to the HDF5 designers. They are working on fixing this in version 1.10 by adding journalling. In the meantime you can call flush() periodically to make sure your writes have reached the disk, which should limit the damage. You can also try external links, which let you store pieces of data in separate files but link them together into one structure when you read them.
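A minimal sketch of both suggestions, assuming NumPy is available and the data is a growing 1-D array of floats. The helper names append_and_flush and link_part, the dataset layout and the chunk size are hypothetical choices for illustration, not part of h5py. Flushing after each append narrows the window in which a crash can lose data, but it does not guarantee an uncorrupted file.

```python
import h5py
import numpy as np


def append_and_flush(h5file, name, values):
    """Append values to a resizable 1-D dataset, then flush buffers to disk."""
    values = np.asarray(values, dtype="f8")
    if name not in h5file:
        # Resizable dataset: starts empty, may grow without limit.
        h5file.create_dataset(name, shape=(0,), maxshape=(None,),
                              dtype="f8", chunks=(1024,))
    dset = h5file[name]
    old_len = dset.shape[0]
    dset.resize((old_len + len(values),))
    dset[old_len:] = values
    # Ask HDF5 to write its buffers out now instead of waiting for close().
    h5file.flush()
    return dset.shape[0]


def link_part(master_path, part_path, link_name):
    """Make link_name in the master file point at the root group of part_path."""
    with h5py.File(master_path, "a") as master:
        master[link_name] = h5py.ExternalLink(part_path, "/")
```

With this layout each part file can be written and closed on its own, so a crash should only endanger the part that is open at that moment. The master file holds nothing but links and is open only for the brief moment a link is added.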




Answer 2:


Nothing will prevent a file from being corrupted in the event of, e.g., a power outage. All you can do is minimize the damage. One way to do this is redundancy: use two files instead of one, with only one of them open at any time. Say file 1 is open and you write all your changes to it. After a certain amount of time, or a certain amount of data written, close it, update file 2 from file 1, and continue writing to file 2, and so on, alternating between the two.
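The alternating scheme could be sketched roughly as follows. The class name AlternatingFiles and its methods are hypothetical, and "update file 2 from file 1" is interpreted here as a plain file copy, which is only one possible reading of the answer.

```python
import shutil

import h5py


class AlternatingFiles:
    """Keep two copies of an HDF5 file, with only one of them open at a time."""

    def __init__(self, path_a, path_b):
        self.paths = [path_a, path_b]
        self.active = 0  # index of the file currently open for writing
        self.file = h5py.File(self.paths[self.active], "a")

    def rotate(self):
        """Close the active file, copy it over the other one, and switch."""
        self.file.close()  # a cleanly closed file is consistent on disk
        src = self.paths[self.active]
        dst = self.paths[1 - self.active]
        shutil.copyfile(src, dst)  # src stays intact if the copy is interrupted
        self.active = 1 - self.active
        self.file = h5py.File(dst, "a")
        return self.file

    def close(self):
        self.file.close()
```

After each call to rotate(), the file that was just closed holds everything written up to that point, so a crash should cost at most the data written since the last rotation. Copying a file of many GB takes time, so the rotation interval is a trade-off between copy cost and the amount of data at risk.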



Source: https://stackoverflow.com/questions/31287744/corrupt-files-when-creating-hdf5-files-without-closing-them-h5py
