h5py

HDF5 file (h5py) with version control - hash changes on every save

Submitted by 自作多情 on 2021-02-07 13:49:39

Question: I am using h5py to store intermediate data from numerical work in an HDF5 file. I have the project under version control, but this doesn't work well with the HDF5 files, because every time a script that generates an HDF5 file is re-run, the binary file changes even if the data within it does not. Here is a small example to illustrate this:

    In [1]: import h5py, numpy as np
    In [2]: A = np.arange(5)
    In [3]: f = h5py.File('test.h5', 'w'); f['A'] = A; f.close()
    In [4]: !md5sum test.h5
    …
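The excerpt cuts off before any answer, but the usual cause is that HDF5 stamps a modification time into each object header, so byte-identical data still yields a different file hash. A minimal sketch of the commonly suggested fix, using h5py's track_times option (assuming timestamps are the only source of churn; other internal bookkeeping can still vary):

    import h5py
    import numpy as np

    A = np.arange(5)

    # track_times=False omits the per-object modification timestamp,
    # so re-running this script on unchanged data can produce a
    # byte-identical file.
    with h5py.File('test.h5', 'w') as f:
        f.create_dataset('A', data=A, track_times=False)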

save multiple pd.DataFrames with hierarchy to hdf5

Submitted by ◇◆丶佛笑我妖孽 on 2021-02-07 09:48:30

Question: I have multiple pd.DataFrames with a hierarchical organization. Let's say I have:

    day_temperature_london_df = pd.DataFrame(...)
    night_temperature_london_df = pd.DataFrame(...)
    day_temperature_paris_df = pd.DataFrame(...)
    night_temperature_paris_df = pd.DataFrame(...)

I want to group them into an HDF5 file so that two of them go to the group 'london' and the other two go to 'paris'. If I use h5py I lose the pd.DataFrame structure, including the indexes and columns:

    f = h5py.File("temperature.h5", "w")
    …
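One way to keep the hierarchy without losing DataFrame structure (a sketch, not confirmed by the truncated question) is pandas' own HDF5 writer: slash-separated keys map onto nested HDF5 groups, and indexes and columns survive the round trip. This requires PyTables; the tiny frames below are placeholders:

    import numpy as np
    import pandas as pd

    day_temperature_london_df = pd.DataFrame(
        np.random.rand(3, 2), columns=['temp_c', 'humidity'])
    night_temperature_london_df = pd.DataFrame(
        np.random.rand(3, 2), columns=['temp_c', 'humidity'])

    # Slash-separated keys become nested groups: /london/day, /london/night.
    day_temperature_london_df.to_hdf('temperature.h5', key='london/day', mode='w')
    night_temperature_london_df.to_hdf('temperature.h5', key='london/night')

    # Indexes and columns are preserved, unlike raw h5py datasets.
    restored = pd.read_hdf('temperature.h5', key='london/day')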

compressed files bigger in h5py

Submitted by 蓝咒 on 2021-02-07 07:29:57

Question: I'm using h5py to save numpy arrays in HDF5 format from Python. Recently I tried to apply compression, and the files I get are bigger... I went from things like this (every file has several datasets):

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos)

to things like this:

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos, compression="gzip",
    …
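A plausible explanation (assumed, since the excerpt is truncated): enabling gzip switches the dataset from contiguous to chunked storage, and for small datasets the chunk index and filter metadata outweigh any compression savings. A sketch that makes the effect visible:

    import os
    import h5py
    import numpy as np

    small = np.random.rand(3)  # tiny dataset: overhead dominates

    with h5py.File('plain.h5', 'w') as f:
        f.create_dataset('pos', data=small)  # contiguous storage

    with h5py.File('gzipped.h5', 'w') as f:
        # gzip implies chunked storage, which adds per-chunk bookkeeping
        f.create_dataset('pos', data=small, compression='gzip')

    print(os.path.getsize('plain.h5'), os.path.getsize('gzipped.h5'))
    # For arrays this small, the compressed file is typically larger.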

reading nested .h5 group into numpy array

Submitted by 戏子无情 on 2021-02-06 11:53:53

Question: I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found many questions and answers here about reading these files, but I couldn't find a way to get down to the lower levels of the groups or folders the file contains. The file contains two main folders, i.e. X and Y. X contains a folder named 0, which contains two folders named A and B. Y contains ten folders named 1-10. The data I want to …
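A sketch for navigating that layout (assuming the numerical datasets sit inside the group X/0/A; the names are taken from the description above):

    import h5py

    with h5py.File('data.h5', 'r') as f:
        # Print every group and dataset at every depth, to see the layout.
        f.visititems(lambda name, obj: print(name, type(obj).__name__))

        # Nested groups are addressed with slash-separated paths.
        group = f['X/0/A']

        # [()] reads a whole dataset into a numpy array.
        arrays = {name: group[name][()] for name in group
                  if isinstance(group[name], h5py.Dataset)}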

TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))

Submitted by ◇◆丶佛笑我妖孽 on 2021-02-05 09:13:24

Question: My model uses pre-processed data to predict whether a customer is a private or non-private customer. The pre-processing step uses functions like feature_column.bucketized_column(…), feature_column.embedding_column(…) and so on. After training, I try to save the model but I get the following error:

    File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
    File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
    File "h5py\h5o.pyx", line 202, in h5py.h5o.link
    OSError
    …
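The excerpt cuts off before any resolution, but this error typically means the HDF5 writer tried to create two links with the same name, i.e. layer or weight names collide (feature-column preprocessing is a known trigger). A hedged sketch of the usual workaround, saving in the TensorFlow SavedModel format instead of h5; the toy model below stands in for the real one:

    import tensorflow as tf

    # Minimal stand-in for the feature-column model in the question.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation='relu', input_shape=(4,),
                              name='hidden'),
        tf.keras.layers.Dense(1, activation='sigmoid', name='output'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # SavedModel does not require globally unique HDF5 link names,
    # so it sidesteps the "name already exists" error.
    model.save('exported_model', save_format='tf')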

I want to convert very large csv data to hdf5 in python

Submitted by ◇◆丶佛笑我妖孽 on 2021-01-29 17:37:12

Question: I have a very large CSV file. It looks like this: [Date, Firm name, value 1, value 2, ..., value 60]. I want to convert it to an HDF5 file. For example, let's say I have two dates (2019-07-01, 2019-07-02), each date has 3 firms (firm 1, firm 2, firm 3), and each firm has [value 1, value 2, ..., value 60]. I want to use the date and firm name as groups. Specifically, I want this hierarchy: 'Date/Firm name'. For example, 2019-07-01 has firm 1, firm 2, and firm 3. When you look at each firm, there …
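A sketch of one way to do this out of core (the file names and column labels are assumptions based on the description): stream the CSV in chunks with pandas and append each (date, firm) group under its own slash-separated key, which becomes a Date/Firm hierarchy in the HDF5 file:

    import pandas as pd

    with pd.HDFStore('firms.h5', mode='w') as store:
        for chunk in pd.read_csv('big.csv', chunksize=100_000):
            for (date, firm), grp in chunk.groupby(['Date', 'Firm name']):
                # Prefix with a letter and strip separators so each key is
                # a valid PyTables node name, e.g. 'd2019_07_01/firm_1'.
                key = ('d' + str(date).replace('-', '_')
                       + '/' + str(firm).replace(' ', '_'))
                # Rows for a key accumulate across chunks when appending.
                store.append(key, grp, format='table')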

How to get a data array from an HDF5 file in Python 3.6 if the dtype is “<u4”?

Submitted by 会有一股神秘感。 on 2021-01-29 16:21:29

Question: I want to get a dataset with shape {N, 16, 512, 128} as a 4D numpy array from an HDF5 file. N is the number of 3D arrays of shape {16, 512, 128}. I tried this:

    import os
    import sys
    import h5py as h5
    import numpy as np
    import subprocess
    import re

    file_name = sys.argv[1]
    path = sys.argv[2]
    f = h5.File(file_name, 'r')
    data = f[path]
    print(data.shape)  # (27270, 16, 512, 128)
    print(data.dtype)  # "<u4"
    data = np.array(data, dtype=np.uint32)
    print(data.shape)

Unfortunately, after data = np.array(data, …
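The excerpt is cut off, but the dtype itself is harmless: "<u4" is simply little-endian uint32, which numpy handles natively. The likely problem (an assumption) is memory, since 27270 × 16 × 512 × 128 uint32 values come to roughly 114 GB. A sketch that reads one 3D slab at a time instead of materializing the whole array:

    import h5py

    # Illustrative file and dataset names.
    with h5py.File('data.h5', 'r') as f:
        dset = f['dataset_path']   # shape (27270, 16, 512, 128), dtype '<u4'

        # Indexing the h5py Dataset reads only that slab from disk, so
        # memory use stays at one (16, 512, 128) uint32 array at a time.
        for i in range(dset.shape[0]):
            slab = dset[i]
            # ... process slab here ...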

Python: Can I write to a file without loading its contents in RAM?

Submitted by 回眸只為那壹抹淺笑 on 2021-01-29 08:03:56

Question: I've got a big dataset that I want to shuffle. The entire set won't fit into RAM, so it would be good if I could open several files (e.g. hdf5, numpy) simultaneously, loop through my data chronologically, and randomly assign each data point to one of the piles (then afterwards shuffle each pile). I'm really inexperienced with working with data in Python, so I'm not sure whether it's possible to write to files without holding the rest of their contents in RAM (I've been using np.save and savez with little …
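Yes, h5py supports this: resizable datasets append rows on disk without keeping earlier rows in memory. A sketch of the pile idea under those assumptions (all names, shapes, and the stand-in data stream are illustrative):

    import h5py
    import numpy as np

    rng = np.random.default_rng()
    n_piles, dim = 4, 8

    with h5py.File('piles.h5', 'w') as f:
        # maxshape=(None, dim) lets each pile grow without bound.
        piles = [f.create_dataset(f'pile{i}', shape=(0, dim),
                                  maxshape=(None, dim), chunks=True)
                 for i in range(n_piles)]

        for _ in range(1000):                 # stand-in for the real stream
            point = rng.random(dim)
            p = piles[rng.integers(n_piles)]  # random pile for this point
            p.resize(p.shape[0] + 1, axis=0)  # grow by one row on disk
            p[-1] = point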