h5py

Why do pickle + gzip outperform h5py on repetitive datasets?

Submitted by 点点圈 on 2020-01-05 03:04:08
Question: I am saving a numpy array which contains repetitive data:

import numpy as np
import gzip
import cPickle as pkl
import h5py

a = np.random.randn(100000, 10)
b = np.hstack([a[cnt:a.shape[0]-10+cnt+1] for cnt in range(10)])

f_pkl_gz = gzip.open('noise.pkl.gz', 'w')
pkl.dump(b, f_pkl_gz, protocol=pkl.HIGHEST_PROTOCOL)
f_pkl_gz.close()

f_pkl = open('noise.pkl', 'w')
pkl.dump(b, f_pkl, protocol=pkl.HIGHEST_PROTOCOL)
f_pkl.close()

f_hdf5 = h5py.File('noise.hdf5', 'w')
f_hdf5.create_dataset('b',
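One likely explanation, with a sketch of the fix: the truncated create_dataset call presumably stores b uncompressed, so gzip over a pickle wins on such repetitive data. The compression arguments below are an assumption for illustration, not part of the original question:

import numpy as np
import h5py

b = np.random.randn(99991, 100)  # stand-in for the repetitive array above

f_hdf5 = h5py.File('noise.hdf5', 'w')
f_hdf5.create_dataset('b', data=b, compression='gzip', compression_opts=9, shuffle=True)
f_hdf5.close()

With no compression filter, h5py writes the raw float bytes, so a gzipped pickle of repetitive data is naturally smaller; enabling gzip (plus the shuffle filter) usually closes most of the gap.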

How to write a Pandas DataFrame into an HDF5 dataset

Submitted by 社会主义新天地 on 2020-01-02 04:24:45
Question: I'm trying to write data from a Pandas DataFrame into a nested HDF5 file, with multiple groups and datasets within each group. I'd like to keep it as a single file which will grow in the future on a daily basis. I've had a go with the following code, which shows the structure of what I'd like to achieve:

import h5py
import numpy as np
import pandas as pd

file = h5py.File('database.h5', 'w')
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=[
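A minimal sketch of one common approach (the group and file names here are assumptions, not from the truncated question): open the file in append mode so it can grow daily, and write each DataFrame column as a dataset inside a group:

import h5py
import pandas as pd

df = pd.DataFrame({'one': [1., 2., 3.], 'two': [4., 5., 6.]})
with h5py.File('database.h5', 'a') as f:       # 'a' lets the file grow over time
    grp = f.require_group('2020-01-02')        # e.g. one group per day
    for col in df.columns:
        grp.create_dataset(col, data=df[col].values)

If a pure-h5py layout is not required, pandas' own HDFStore/to_hdf is the simpler alternative.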

How do I traverse an HDF5 file using h5py?

Submitted by 爷,独闯天下 on 2020-01-01 02:52:12
Question: How do I traverse all the groups and datasets of an HDF5 file using h5py? I want to retrieve all the contents of the file from a common root using a for loop or something similar.

Answer 1: visit() and visititems() are your friends here. Cf. http://docs.h5py.org/en/latest/high/group.html#Group.visit. Note that an h5py.File is also an h5py.Group. Example (not tested):

def visitor_func(name, node):
    if isinstance(node, h5py.Dataset):
        pass  # node is a dataset
    else:
        pass  # node is a group

with h5py.File('myfile
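A hedged completion of the truncated call ('myfile.h5' is a placeholder for the elided filename):

import h5py

with h5py.File('myfile.h5', 'r') as f:
    f.visititems(visitor_func)  # calls visitor_func(name, node) on every group and dataset

Note that visititems passes both the name and the node, while visit passes only the name.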

What is the recommended compression for HDF5 for fast read/write performance (in Python/pandas)?

Submitted by 北慕城南 on 2019-12-31 13:29:34
Question: I have read several times that turning on compression in HDF5 can lead to better read/write performance. I wonder what the ideal settings are to achieve good read/write performance with:

data_df.to_hdf(..., format='fixed', complib=..., complevel=..., chunksize=...)

I'm already using the fixed format (i.e. h5py) as it's faster than table. I have strong processors and do not care much about disk space. I often store DataFrames of float64 and str types in files of approx. 2500 rows x 9000 columns.
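For concreteness, a sketch of one commonly tried configuration; complib='blosc' and complevel=9 are illustrative choices, not a verified recommendation:

import numpy as np
import pandas as pd

data_df = pd.DataFrame(np.random.randn(2500, 9000))
data_df.to_hdf('data.h5', key='data', format='fixed', complib='blosc', complevel=9)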

How to feed .h5 files into a tf.data pipeline for a TensorFlow model

Submitted by 岁酱吖の on 2019-12-31 05:33:08
Question: I'm trying to optimize the input pipeline for .h5 data with tf.data, but I encountered a TypeError: expected str, bytes or os.PathLike object, not Tensor. I did some research but couldn't find anything about converting a tensor of strings to a string. This simplified code is executable and returns the same error:

batch_size = 1000
conv_size = 3
nb_conv = 32
learning_rate = 0.0001

# define parser function
def parse_function(fname):
    with h5py.File(fname, 'r') as f:  # Error comes from here
        X = f['X']
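The usual fix for this error (sketched under assumptions: the dataset key 'X', float32 data, and placeholder filenames) is to wrap the h5py read in tf.py_function, which hands the parser an eager tensor whose .numpy() value can be decoded to a Python string:

import h5py
import tensorflow as tf

def parse_function(fname):
    def _read(fname):
        # fname is an eager tensor here, so .numpy() yields the byte string
        with h5py.File(fname.numpy().decode('utf-8'), 'r') as f:
            return f['X'][:].astype('float32')
    return tf.py_function(_read, [fname], tf.float32)

dataset = tf.data.Dataset.from_tensor_slices(['file1.h5', 'file2.h5'])  # placeholders
dataset = dataset.map(parse_function)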

Fastest way to write HDF5 files with Python?

Submitted by 那年仲夏 on 2019-12-29 14:16:06
Question: Given a large (10s of GB) CSV file of mixed text/numbers, what is the fastest way to create an HDF5 file with the same content, while keeping memory usage reasonable? I'd like to use the h5py module if possible. In the toy example below, I've found an incredibly slow and an incredibly fast way to write data to HDF5. Would it be best practice to write to HDF5 in chunks of 10,000 rows or so? Or is there a better way to write a massive amount of data to such a file?

import h5py

n = 10000000
f =
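A minimal sketch of the chunked-write pattern the question alludes to (the 1-d shape and chunk size are assumptions): preallocate the dataset, then assign slices of about 10,000 rows at a time so only one slice is ever in memory:

import numpy as np
import h5py

n = 10_000_000
chunk = 10_000
with h5py.File('out.h5', 'w') as f:
    dset = f.create_dataset('data', shape=(n,), dtype='float64')
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        dset[start:stop] = np.random.randn(stop - start)  # stand-in for parsed CSV rows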

Visible Deprecation warning…?

Submitted by 余生颓废 on 2019-12-29 07:38:30
Question: I have some data that I'm reading from an h5 file as a numpy array and doing some analysis with. For context, the data plots a spectral response curve. I am indexing the data (and a subsequent array I have made for my x axis) to get a specific value or range of values. I'm not doing anything complex, and even the little maths I'm doing is pretty basic. However, I get the following warning in a number of places: "VisibleDeprecationWarning: boolean index did not match indexed array along
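This warning typically means a boolean mask is shorter (or longer) than the axis it indexes, which easily happens when the x-axis array and the data array differ in length by one. A sketch with assumed arrays:

import numpy as np

wavelengths = np.linspace(400, 700, 100)  # hypothetical x axis
response = np.random.rand(99)             # one element short, e.g. after np.diff

# response[wavelengths > 550] triggers the warning (newer numpy raises IndexError)
# because the 100-element mask does not match the 99-element array.
mask = wavelengths[:-1] > 550             # fix: build a mask that matches the axis
subset = response[mask]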

How to edit h5 files with h5py?

Submitted by 倖福魔咒の on 2019-12-25 11:40:16
Question: The question on overwriting an array using h5py did not solve my problem. I want to edit the array values of a VGG16 model.

f = h5py.File('C:/Users/yash/.keras/models/vgg16_weights_tf_dim_ordering_tf_kernels_2.h5', mode='a')
ab = list(h5py.AttributeManager.keys(f))
print(list(f.attrs.keys()))
print(ab)

The code above returns:

['layer_names']
['block1_conv1', 'block1_conv2', 'block1_pool', 'block2_conv1', 'block2_conv2', 'block2_pool', 'block3_conv1', 'block3_conv2', 'block3_conv3', 'block3_pool'
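For the editing itself, a hedged sketch (the file path and dataset path below are hypothetical; in real VGG16 weight files the datasets are nested inside per-layer groups): open with mode 'r+' or 'a' and assign through [...] so the values are written back to disk:

import h5py

with h5py.File('vgg16_weights.h5', 'r+') as f:       # hypothetical file
    kernel = f['block1_conv1/block1_conv1_W:0']      # hypothetical dataset path
    kernel[...] = kernel[...] * 0.5                  # scale the weights in place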

h5py: chunking on resizable dataset

Submitted by 雨燕双飞 on 2019-12-25 09:00:09
Question: I have a series of raster datasets which I want to combine into a single HDF5 file. Each raster file will be converted into an array with dimensions 3600 x 7000. As I have a total of 659 files, the final array would have a shape of 3600 x 7000 x 659, too big for my (huge) amount of RAM. I'm fairly new to Python and HDF5 itself, but basically my approach is to create a dataset with the required 2-d dimensions and then iteratively read the files into arrays and append them to the dataset. I'm
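A sketch of the resizable-dataset pattern being described (the file list and dtype are assumptions): create the dataset with an unlimited maxshape on the stacking axis, then resize and write one raster at a time so only a single 3600 x 7000 array is in memory:

import numpy as np
import h5py

raster_files = ['r0.tif', 'r1.tif']  # placeholder for the 659 files

with h5py.File('stack.h5', 'w') as f:
    dset = f.create_dataset('stack', shape=(3600, 7000, 0),
                            maxshape=(3600, 7000, None),
                            chunks=(3600, 7000, 1), dtype='float32')
    for i, path in enumerate(raster_files):
        arr = np.zeros((3600, 7000), dtype='float32')  # stand-in for reading the raster
        dset.resize(i + 1, axis=2)
        dset[:, :, i] = arr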

Colormap issue using animation in matplotlib

Submitted by 你离开我真会死。 on 2019-12-25 04:09:03
Question: I use matplotlib.animation to animate data in a 3D array named arr. I read the data from an h5 file using the h5py library and everything is OK, but when animating, the colormap stays locked to the first frame's data range, and after some steps it shows unnormalized colors while plotting. Here is my code:

import numpy as np
import h5py
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import matplotlib.cm as cm

f = h5py.File('ez.h5', 'r')
arr = f["ez"][:,:,:]
f.close()

fig =
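The usual cure (a sketch with stand-in data, since the question's plotting code is cut off) is to renormalize the image inside the animation callback with set_clim, so each frame uses its own data range:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

arr = np.random.randn(20, 50, 50)            # stand-in for the h5 data
fig, ax = plt.subplots()
im = ax.imshow(arr[0], cmap='viridis')

def update(i):
    im.set_data(arr[i])
    im.set_clim(arr[i].min(), arr[i].max())  # renormalize the colormap each frame
    return [im]

ani = animation.FuncAnimation(fig, update, frames=arr.shape[0], blit=False)
plt.show()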