h5py | 易学教程

Why do pickle + gzip outperform h5py on repetitive datasets?

Question: I am saving a NumPy array which contains repetitive data: import numpy as np import gzip import cPickle as pkl import h5py a = np.random.randn(100000, 10) b = np.hstack( [a[cnt:a.shape[0]-10+cnt+1] for cnt in range(10)] ) f_pkl_gz = gzip.open('noise.pkl.gz', 'w') pkl.dump(b, f_pkl_gz, protocol = pkl.HIGHEST_PROTOCOL) f_pkl_gz.close() f_pkl = open('noise.pkl', 'w') pkl.dump(b, f_pkl, protocol = pkl.HIGHEST_PROTOCOL) f_pkl.close() f_hdf5 = h5py.File('noise.hdf5', 'w') f_hdf5.create_dataset('b',

How to write a Pandas Dataframe into a HDF5 dataset

Question: I'm trying to write data from a Pandas DataFrame into a nested HDF5 file, with multiple groups and several datasets within each group. I'd like to keep it as a single file, which will grow daily. I've had a go with the following code, which shows the structure I'd like to achieve: import h5py import numpy as np import pandas as pd file = h5py.File('database.h5','w') d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index=[

How do I traverse a hdf5 file using h5py

Question: How do I traverse all the groups and datasets of an HDF5 file using h5py? I want to retrieve all the contents of the file from a common root using a for loop or something similar.

Answer 1: visit() and visititems() are your friends here. Cf. http://docs.h5py.org/en/latest/high/group.html#Group.visit. Note that an h5py.File is also an h5py.Group. Example (not tested): def visitor_func(name, node): if isinstance(node, h5py.Dataset): # node is a dataset else: # node is a group with h5py.File('myfile
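The visititems() approach from the answer can be sketched as follows. This is a minimal illustration rather than the answerer's full code: the helper name list_contents, the file name, and the group and dataset names are all made up for the demo, and the file uses h5py's in-memory "core" driver so nothing is written to disk.

```python
import h5py
import numpy as np


def list_contents(h5file):
    """Return (path, kind) pairs for every group and dataset below the root."""
    found = []

    def visitor(name, node):
        # name is the path relative to the object visititems() was called on
        kind = "dataset" if isinstance(node, h5py.Dataset) else "group"
        found.append((name, kind))
        # Returning None tells h5py to keep walking the tree

    h5file.visititems(visitor)
    return found


# Hypothetical demo file held purely in memory (driver="core", no backing store)
with h5py.File("traverse_demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_group("g1").create_dataset("d1", data=np.arange(3))
    f.create_dataset("d0", data=np.zeros(2))
    contents = list_contents(f)

for path, kind in contents:
    print(f"{kind:8s} {path}")
```

Because an h5py.File is also a group, the same helper can be pointed at any subgroup to list only that subtree.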

What is the recommended compression for HDF5 for fast read/write performance (in Python/pandas)?

Question: I have read several times that turning on compression in HDF5 can lead to better read/write performance. I wonder what the ideal settings are to achieve good read/write performance with: data_df.to_hdf(..., format='fixed', complib=..., complevel=..., chunksize=...) I'm already using the fixed format (i.e. h5py) as it's faster than table. I have strong processors and do not care much about disk space. I often store DataFrames of float64 and str types in files of approx. 2500 rows x 9000 columns.

How to feed .h5 files in tf.data pipeline in tensorflow model

Question: I'm trying to optimize the input pipeline for .h5 data with tf.data, but I get a TypeError: expected str, bytes or os.PathLike object, not Tensor. I did some research but couldn't find anything about converting a string tensor to a Python string. This simplified code is executable and returns the same error: batch_size = 1000 conv_size = 3 nb_conv = 32 learning_rate = 0.0001 # define parser function def parse_function(fname): with h5py.File(fname, 'r') as f: #Error comes from here X = f['X']

Fastest way to write HDF5 files with Python?

Question: Given a large (10s of GB) CSV file of mixed text/numbers, what is the fastest way to create an HDF5 file with the same content, while keeping the memory usage reasonable? I'd like to use the h5py module if possible. In the toy example below, I've found an incredibly slow and an incredibly fast way to write data to HDF5. Would it be best practice to write to HDF5 in chunks of 10,000 rows or so? Or is there a better way to write a massive amount of data to such a file? import h5py n = 10000000 f =
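The block-wise idea raised in the question (writing around 10,000 rows at a time) might look like the sketch below. It is not the asker's code, which is cut off above. It assumes purely numeric data, and read_block is a hypothetical stand-in for whatever parses the next slice of the CSV; the sizes are kept small so the example runs quickly in memory.

```python
import h5py
import numpy as np

N_ROWS, N_COLS, BLOCK = 100_000, 4, 10_000  # illustrative sizes only


def read_block(start, stop):
    """Hypothetical stand-in for parsing CSV rows start..stop into an array."""
    values = np.arange(start * N_COLS, stop * N_COLS, dtype="float64")
    return values.reshape(stop - start, N_COLS)


# In-memory file (driver="core") so the sketch leaves nothing on disk
with h5py.File("bulk_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("data", shape=(N_ROWS, N_COLS), dtype="float64")
    for start in range(0, N_ROWS, BLOCK):
        stop = min(start + BLOCK, N_ROWS)
        # One slice assignment per block instead of one write per row
        dset[start:stop] = read_block(start, stop)
    first_row = dset[0].tolist()
    last_value = float(dset[N_ROWS - 1, N_COLS - 1])
```

Only one block is held in memory at a time, so peak memory is set by BLOCK rather than by the size of the whole file. The best block size depends on the data and the storage, so it is worth measuring rather than assuming 10,000 is optimal.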

Visible Deprecation warning…?

Question: I have some data that I'm reading from an h5 file as a NumPy array and analysing. For context, the data plots a spectral response curve. I am indexing the data (and a subsequent array I have made for my x axis) to get a specific value or range of values. I'm not doing anything complex, and even the little maths I'm doing is pretty basic. However, I get the following warning in a number of places: "VisibleDeprecationWarning: boolean index did not match indexed array along
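A warning of this kind typically appears when a boolean mask is applied to an array whose length along the indexed axis differs from the mask's length; recent NumPy releases raise an IndexError in that situation instead. The sketch below shows the matching-shape pattern, assuming that is the cause here. The arrays x and y are invented stand-ins for the asker's x axis and spectral data.

```python
import numpy as np

# Hypothetical x axis (e.g. wavelength) and response values of the same length
x = np.linspace(400.0, 700.0, 7)   # 400, 450, ..., 700
y = x ** 2

# Build the mask from an array that has the same shape as the one being indexed
mask = (x >= 500.0) & (x <= 600.0)
assert mask.shape == y.shape       # a mismatch here is what triggers the warning

selected = y[mask]
print(selected)
```

If the mask was built from an array that was trimmed or sliced separately from the data, checking the two shapes just before indexing usually pinpoints the mismatch.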

How to edit h5 files with h5py?

Question: The question on overwriting an array using h5py did not solve my problem. I want to edit the array values of a VGG16 model. f = h5py.File('C:/Users/yash/.keras/models/vgg16_weights_tf_dim_ordering_tf_kernels_2.h5', mode = 'a') ab = list(h5py.AttributeManager.keys(f)) print(list(f.attrs.keys())) print(ab) The code above returns: ['layer_names'] ['block1_conv1', 'block1_conv2', 'block1_pool', 'block2_conv1', 'block2_conv2', 'block2_pool', 'block3_conv1', 'block3_conv2', 'block3_conv3', 'block3_pool'
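As a general illustration of editing values in place with h5py, the sketch below overwrites part of a dataset and then the whole dataset, without changing its shape or dtype. The file name and the dataset path "layer/kernel" are hypothetical and do not reflect the actual layout of the VGG16 weights file; the demo file lives in memory, whereas an existing file on disk would be opened with mode 'a' as in the question.

```python
import h5py
import numpy as np

# Hypothetical in-memory file standing in for an existing weights file
with h5py.File("edit_demo.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups ("layer") are created automatically
    f.create_dataset("layer/kernel", data=np.ones((2, 3), dtype="float32"))

    dset = f["layer/kernel"]
    dset[0, :] = 0.0             # overwrite one slice in place
    dset[...] = dset[...] * 2.0  # read, modify, and write back the whole array

    edited = dset[...]           # copy the stored values into a NumPy array

print(edited)
```

Slice assignment only works when the new values fit the existing shape and dtype; changing either generally means deleting the dataset and creating a new one.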

h5py: chunking on resizable dataset

Question: I have a series of raster datasets which I want to combine into a single HDF5 file. Each raster file will be converted into an array with the dimensions 3600 x 7000. As I have a total of 659 files, the final array would have a shape of 3600 x 7000 x 659, too big for my (huge) amount of RAM. I'm fairly new to Python and HDF5 itself, but basically my approach is to create a dataset with the required 2-D dimensions and then iteratively read the files into arrays and append them to the dataset. I'm
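The append-one-raster-at-a-time approach described in the question could be sketched like this. It is an illustration under assumptions, not the asker's code: the dimensions are shrunk from 3600 x 7000 x 659 so it runs quickly, load_raster is a hypothetical placeholder for reading one file, and the chunk shape of one full layer is just one plausible choice.

```python
import h5py
import numpy as np

ROWS, COLS, N_FILES = 36, 70, 5  # small stand-ins for 3600, 7000, 659


def load_raster(i):
    """Hypothetical placeholder for reading raster file number i."""
    return np.full((ROWS, COLS), i, dtype="float32")


# In-memory file (driver="core") so the sketch leaves nothing on disk
with h5py.File("stack_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "stack",
        shape=(ROWS, COLS, 0),            # start with no layers
        maxshape=(ROWS, COLS, None),      # third axis may grow without limit
        chunks=(ROWS, COLS, 1),           # assumed: one chunk per raster layer
        dtype="float32",
    )
    for i in range(N_FILES):
        k = dset.shape[2]
        dset.resize(k + 1, axis=2)        # grow by one layer
        dset[:, :, k] = load_raster(i)    # only this layer is held in memory
    final_shape = dset.shape
    last_layer_mean = float(dset[:, :, N_FILES - 1].mean())
```

The chunk shape should follow the expected read pattern: one chunk per layer suits reading whole rasters, while reading a time series at a single pixel would favour chunks that are small in the first two axes and long in the third.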

Colormap issue using animation in matplotlib

Question: I use matplotlib.animation to animate data in a 3D array named arr. I read the data from an h5 file using the h5py library, and that part works fine. But during the animation, the colormap stays stuck on the data range of the first frame, and after some steps the plot shows unnormalized colors. Here is my code: import numpy as np import h5py import matplotlib.pyplot as plt import matplotlib.animation as animation import matplotlib.cm as cm f = h5py.File('ez.h5','r') arr = f["ez"][:,:,:] f.close() fig =