hdf5

Listing datasets in a group in HDF5

瘦欲@ submitted on 2019-12-12 19:42:08
Question: I decided to store my data in HDF5, using its hierarchical structure instead of relying on the filesystem. Unfortunately, I'm having performance issues. My data is laid out as follows: I have about 70 top-level groups, corresponding to dates, and each of them contains roughly 8000 datasets. I would like to see the number of datasets per day: for date in hdf5.keys(): print(len(hdf5[date])) I'm finding it a little frustrating that this takes 2+ seconds per iteration. Also, I have two …
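A minimal sketch of the counting loop described above (the file name and the one-group-per-date layout are assumptions):

    import h5py

    with h5py.File('data.h5', 'r') as hdf5:
        for date in hdf5.keys():
            # len() on a group returns the number of members
            # without reading the datasets themselves
            print(date, len(hdf5[date]))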

Missing dependency for hdf5: totem

北慕城南 submitted on 2019-12-12 18:13:45
Question: While running the following install command I get the error shown below: parag@parag:~/torch-hdf5$ sudo luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/" Missing dependencies for hdf5: totem Error: Could not satisfy dependency: totem. Totem is already installed: parag@parag:~$ sudo apt-get install totem [sudo] password for parag: Reading package lists... Done Building dependency tree Reading state information... Done totem is already the newest version. 0 upgraded, 0 …

Delete or update a dataset in HDF5?

ⅰ亾dé卋堺 submitted on 2019-12-12 16:48:39
Question: I would like to programmatically change the data associated with a dataset in an HDF5 file. I can't seem to find a way to either delete a dataset by name (allowing me to add it again with the modified data) or update a dataset by name. I'm using the C API for HDF5 1.6.x, but pointers towards any HDF5 API would be useful. Answer 1: According to the user guide: HDF5 does not at this time provide an easy mechanism to remove a dataset from a file or to reclaim the storage space occupied by a deleted …
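For reference, a hedged h5py sketch of the delete-then-recreate approach (the question itself uses the C API; the file and dataset names here are placeholders):

    import h5py
    import numpy as np

    with h5py.File('data.h5', 'a') as f:
        if 'results' in f:
            del f['results']          # unlinks the dataset by name
        f['results'] = np.arange(10)  # re-create it with the new data

    # Note: unlinking does not shrink the file; the standard h5repack tool
    # can rewrite the file to reclaim the space.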

H5PY Writes Very Slow

£可爱£侵袭症+ submitted on 2019-12-12 14:27:49
Question: I have an h5py dataset like the one below. I want to index the records by string instead of by numeric value, so that, e.g., I could get the value of the first record with dset[dset.attrs['id1']]. I am trying to write the attributes with the code below, but it is extremely slow: a %timeit dset.attrs[rid] = idx inside the loop shows a single write takes about 310 ms. The strings I am writing are 36 characters long, and I have about 100k records to write, which would take about 9 hours. Something must be …
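One possible workaround, sketched below as an assumption rather than a confirmed fix: store the id strings as a companion dataset and build the id-to-row mapping in memory instead of writing ~100k attributes. File and dataset names are placeholders:

    import h5py
    import numpy as np

    ids = ['id%d' % i for i in range(100000)]   # stand-ins for the 36-char ids

    with h5py.File('data.h5', 'a') as f:
        dt = h5py.special_dtype(vlen=str)        # variable-length strings
        f.create_dataset('ids', data=np.array(ids, dtype=object), dtype=dt)

    with h5py.File('data.h5', 'r') as f:
        raw = f['ids'][...]                      # bytes on h5py 3.x, str on 2.x
        index = {(r.decode() if isinstance(r, bytes) else r): i
                 for i, r in enumerate(raw)}
        # dset[index['id1']] would then fetch the first record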

How to save a list of numpy arrays of different shapes with h5py?

拥有回忆 submitted on 2019-12-12 12:44:49
Question: I'm saving a large dataset of images (the flickr25k dataset) into HDF5 using h5py. However, the images differ in size, so I can't create a single dataset with shape (nb_images, height, width). For now I'm working around this with multiple datasets: create_dataset('image1', shape=shape1), create_dataset('image2', shape=shape2), etc. In Python we can easily keep multiple numpy arrays of different sizes in a list. I'm wondering if we can do the same thing with h5py, and fetch data with …
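A minimal sketch of the one-dataset-per-image layout mentioned above, with placeholder file and group names; zero-padded names keep the group's alphabetical ordering in step with insertion order:

    import h5py
    import numpy as np

    images = [np.zeros((480, 640, 3), dtype=np.uint8),
              np.zeros((300, 500, 3), dtype=np.uint8)]   # stand-ins for real images

    with h5py.File('images.h5', 'w') as f:
        grp = f.create_group('images')
        for i, img in enumerate(images):
            grp.create_dataset('image_%05d' % i, data=img)

    # Fetching "like a list":
    with h5py.File('images.h5', 'r') as f:
        grp = f['images']
        first = grp['image_00000'][...]
        all_imgs = [grp[name][...] for name in grp]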

How to put my dataset in a .pkl file in the exact format and data structure used in “mnist.pkl”?

試著忘記壹切 submitted on 2019-12-12 09:59:30
Question: I'm trying to build a dataset of images in the same format as mnist.pkl, using https://github.com/dmitriy-serdyuk/cats_vs_dogs/blob/master/cats_vs_dogs/make_dataset.py as a reference. This is what I have so far: path = '/home/dell/thesis/neon/Images' def PIL2array(img): return numpy.array(img.getdata(), numpy.uint8).reshape(img.size[1], img.size[0], 1) def main(): fileList = [os.path.join(dirpath, f) for dirpath, dirnames, files in os.walk(path) for f in files if f.endswith('.jpg')] print …
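For orientation, a hedged sketch of the mnist.pkl layout used by the Theano tutorials: a pickled (and usually gzipped) tuple of (train, valid, test) pairs, each pair holding a 2-D float array of flattened images and a 1-D label array. The arrays and output path below are placeholders:

    import gzip
    import pickle
    import numpy as np

    # placeholder arrays in the mnist.pkl shape convention (784 = 28*28 pixels)
    train_x = np.zeros((100, 784), dtype=np.float32); train_y = np.zeros(100, dtype=np.int64)
    valid_x = np.zeros((20, 784), dtype=np.float32);  valid_y = np.zeros(20, dtype=np.int64)
    test_x  = np.zeros((20, 784), dtype=np.float32);  test_y  = np.zeros(20, dtype=np.int64)

    with gzip.open('my_dataset.pkl.gz', 'wb') as fh:
        pickle.dump(((train_x, train_y), (valid_x, valid_y), (test_x, test_y)), fh)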

How to find HDF5 dataset names in Python

我只是一个虾纸丫 submitted on 2019-12-12 09:53:02
Question: I want to read an HDF5 file into Python and work with the data. To access the data in an HDF5 file from Python, you need the dataset name, but I do not know how to find it and would like to ask for help. def select_HDF_file(self): filename2 = QFileDialog.getOpenFileName(self.dlg, "Select output file","",'*.hdf') dataset_name = '**************' file = h5py.File(filename2 , 'r') dataset = file[dataset_name] Answer 1: file behaves like a Python dictionary. Thus you can …
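A short sketch for discovering the names (the file name is a placeholder):

    import h5py

    with h5py.File('file.hdf', 'r') as f:
        print(list(f.keys()))   # names of the top-level groups and datasets
        f.visit(print)          # walks every object in the file, printing its path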

Deleting information from an HDF5 file

心已入冬 submitted on 2019-12-12 09:30:05
Question: I realize that an SO user asked this question back in 2009, but I was hoping that more knowledge of HDF5 is now available or that newer versions have fixed this particular issue. To restate the question for my own problem: I have a gigantic file of nodes and elements from a large geometry and have already retrieved all the useful information I need from it. Therefore, in Python, I am trying to keep the original file, but delete the information I do not need and fill …
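A hedged sketch of one common approach: unlink the unneeded objects with h5py, then run the standard h5repack tool to rewrite the file and actually reclaim the space. File and object names are placeholders, and h5repack is assumed to be on the PATH:

    import subprocess
    import h5py

    with h5py.File('mesh.h5', 'a') as f:
        for name in ['raw_nodes', 'raw_elements']:   # placeholder object names
            if name in f:
                del f[name]                          # unlink, does not shrink the file

    # h5repack copies the remaining objects into a new, smaller file
    subprocess.check_call(['h5repack', 'mesh.h5', 'mesh_small.h5'])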

Why do pandas and dask perform better when importing from CSV compared to HDF5?

僤鯓⒐⒋嵵緔 submitted on 2019-12-12 08:16:08
Question: I am working with a system that currently operates on large (>5 GB) .csv files. To increase performance, I am testing (A) different methods of creating dataframes from disk (pandas vs dask) as well as (B) different ways of storing results to disk (.csv vs HDF5 files). To benchmark performance, I did the following: def dask_read_from_hdf(): results_dd_hdf = dd.read_hdf('store.h5', key='period1', columns = ['Security']) analyzed_stocks_dd_hdf = results_dd_hdf.Security.unique() hdf.close() …
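A minimal timing sketch along the lines of the benchmark described above; the file names, HDF key and 'Security' column are placeholders, and the HDF5 store is assumed to have been written in table format so that column selection works:

    import time
    import pandas as pd

    def timed(label, fn):
        t0 = time.time()
        fn()
        print(label, round(time.time() - t0, 3), 's')

    timed('pandas csv', lambda: pd.read_csv('results.csv', usecols=['Security'])['Security'].unique())
    timed('pandas hdf', lambda: pd.read_hdf('store.h5', key='period1', columns=['Security'])['Security'].unique())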

How to store an array in an HDF5 file which is too big to load into memory?

爷，独闯天下 submitted on 2019-12-12 07:57:23
Question: Is there any way to store an array in an HDF5 file when it is too big to load in memory? If I do something like this: f = h5py.File('test.hdf5','w') f['mydata'] = np.zeros(2**32) I get a memory error. Answer 1: According to the documentation, you can use create_dataset to create a chunked array stored in the HDF5 file. Example: >>> import h5py >>> f = h5py.File('test.h5', 'w') >>> arr = f.create_dataset('mydata', (2**32,), chunks=True) >>> arr <HDF5 dataset "mydata": shape (4294967296,), type "<f4">
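A follow-up sketch: once the chunked dataset exists it can be filled one slice at a time, so the full 2**32-element array never has to sit in memory (only the first few blocks are written here for illustration):

    import h5py
    import numpy as np

    with h5py.File('test.h5', 'a') as f:
        arr = f.require_dataset('mydata', shape=(2**32,), dtype='f4', chunks=True)
        block = 2**20
        for start in range(0, 8 * block, block):   # in practice loop up to 2**32
            arr[start:start + block] = np.zeros(block, dtype='f4')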