hdf5

Python HDF5 H5Py issues opening multiple files

China☆狼群 posted on 2019-12-18 08:48:20
Question: I am using the 64-bit version of Enthought Python to process data across multiple HDF5 files. I'm using h5py version 1.3.1 (HDF5 1.8.4) on 64-bit Windows. I have an object that provides a convenient interface to my specific data hierarchy, but testing h5py.File(fname, 'r') independently yields the same results. I am iterating through a long list (~100 files at a time) and attempting to pull out specific pieces of information from the files. The problem I'm having is that I'm getting the …
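
The excerpt is cut off before the actual error, but problems with many-file loops in h5py often come down to file handles that are never closed. A minimal sketch of the iteration pattern with explicit closing, assuming hypothetical file and dataset names (not taken from the question):

    import h5py
    import numpy as np

    # Create a few small files so the loop below is self-contained; in the
    # question these would be the ~100 pre-existing HDF5 files.
    file_list = ['part%d.hdf5' % i for i in range(3)]
    for fname in file_list:
        with h5py.File(fname, 'w') as f:
            f.create_dataset('data', data=np.arange(5))

    results = []
    for fname in file_list:
        # The context manager guarantees each file is closed before the next
        # one is opened, so HDF5 identifiers do not accumulate across the loop.
        with h5py.File(fname, 'r') as f:
            results.append(f['data'][...])  # copy the data out before the file closes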

Saving in hdf5save creates an unreadable file

 ̄綄美尐妖づ posted on 2019-12-18 07:04:44
Question: I'm trying to save an array as an HDF5 file using R, but I'm having no luck. To try to diagnose the problem I ran example(hdf5save). This successfully created an HDF5 file that I could read easily with h5dump. When I then ran the R code manually, I found that it didn't work. The code I ran was exactly the same as is run in the example script (except for a change of filename to avoid overwriting). Here is the code: (m <- cbind(A = 1, diag(4))); ll <- list(a=1:10, b=letters[1:8]); l2 <- list(C="c", …
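
Independent of the R side, it can help to confirm from another HDF5 reader whether the written file is structurally valid at all. A minimal sketch of that check using h5py (the filename is a placeholder, and this does not address the hdf5save call itself):

    import h5py

    path = 'myfile.hdf'  # hypothetical name; substitute the file written from R
    try:
        with h5py.File(path, 'r') as f:
            f.visit(print)  # print every group and dataset name the file contains
    except OSError as exc:
        print('h5py cannot open the file either:', exc)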

Pass hdf5 file to h5py as binary blob / string?

南楼画角 posted on 2019-12-18 04:22:06
Question: How can I bypass disk I/O in h5py? Currently I have to do something like this:

    msg = socket.recv()
    fp = open("tmp.hdf5", 'wb')
    fp.write(msg)
    fp.close()
    f = h5py.File('tmp.hdf5', 'r')
    ...  # alter the file
    fp = open("tmp.hdf5", 'rb')
    msg = fp.read()
    msg = f.toString()
    socket.send(data)

I want to do something like this:

    msg = socket.recv()
    f = h5py.File(msg, driver='core')
    ...  # alter the file
    msg = f.toString()
    socket.send(msg)

My issue here is speed: disk I/O is too big a bottleneck. Is …
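
One way to avoid the temp file, assuming h5py 2.9 or newer (which accepts Python file-like objects), is to keep the whole HDF5 image in an io.BytesIO buffer; the dataset name below is hypothetical:

    import io
    import h5py
    import numpy as np

    # Build an HDF5 file entirely in memory; buf.getvalue() is the raw byte
    # string that would go out over the socket.
    buf = io.BytesIO()
    with h5py.File(buf, 'w') as f:
        f.create_dataset('payload', data=np.arange(10))
    msg = buf.getvalue()

    # Bytes received from a socket can be opened the same way, without ever
    # writing tmp.hdf5 to disk.
    with h5py.File(io.BytesIO(msg), 'r') as f:
        data = f['payload'][...]

On older h5py versions, the 'core' driver with backing_store=False also keeps the file in memory rather than on disk.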

Updating h5py Datasets

牧云@^-^@ posted on 2019-12-18 03:17:28
Question: Does anyone have an idea for updating HDF5 datasets from h5py? Assuming we create a dataset like:

    import h5py
    import numpy
    f = h5py.File('myfile.hdf5')
    dset = f.create_dataset('mydataset', data=numpy.ones((2,2), "=i4"))
    new_dset_value = numpy.zeros((3,3), "=i4")

Is it possible to extend dset to a 3x3 numpy array?

Answer 1: You need to create the dataset with the "extendable" property; it's not possible to change this after the initial creation of the dataset. To do this, you need to use the …
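
A minimal sketch of the extendable-dataset approach the answer is pointing at: pass maxshape when creating the dataset, then call resize (file and dataset names follow the question's example):

    import h5py
    import numpy as np

    with h5py.File('myfile.hdf5', 'w') as f:
        # maxshape=(None, None) marks both axes as extendable (unlimited)
        dset = f.create_dataset('mydataset',
                                data=np.ones((2, 2), '=i4'),
                                maxshape=(None, None))
        dset.resize((3, 3))                  # grow the dataset in place
        dset[...] = np.zeros((3, 3), '=i4')  # write the new values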

which is faster for load: pickle or hdf5 in python [closed]

风流意气都作罢 posted on 2019-12-17 22:19:10
Question (closed: needs to be more focused and is not accepting answers): Given a 1.5 GB list of pandas dataframes, which format is fastest for loading compressed data: pickle (via cPickle), hdf5, or something else in Python? I only care about the fastest speed to load the data into memory. I don't care about dumping the data; it's slow, but I only do this …
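
The answer usually depends on the data, so one pragmatic step is to time both loaders on a representative frame. A small sketch, assuming pandas with PyTables installed (the DataFrame here is synthetic, not the poster's data):

    import time
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(1_000_000, 10))  # stand-in for the real data
    df.to_pickle('df.pkl')
    df.to_hdf('df.h5', key='df', mode='w')

    t0 = time.perf_counter()
    pd.read_pickle('df.pkl')
    t1 = time.perf_counter()
    pd.read_hdf('df.h5', 'df')
    t2 = time.perf_counter()
    print('pickle load: %.3fs, hdf5 load: %.3fs' % (t1 - t0, t2 - t1))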

Storing multidimensional variable length array with h5py

北战南征 posted on 2019-12-17 21:19:11
Question: I'm trying to store a list of variable-length arrays in an HDF file with the following procedure:

    phn_mfccs = []
    # Import wav files
    for waveform in files:
        phn_mfcc = mfcc(waveform)  # produces a variable-length multidim array of shape (x, 13, 1)

        # Add MFCC and label to dataset
        # phn_mfccs has dimension (len(files),)
        # phn_mfccs[i] has variable dimension ([# of frames in ith segment] (variable), 13, 1)
        phn_mfccs.append(phn_mfcc)

    dt = h5py.special_dtype(vlen=np.dtype('float64'))
    mfccs_out …
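
h5py's vlen special dtype only stores one-dimensional ragged elements, so a common workaround (a sketch, not the poster's final code) is to flatten each (x, 13, 1) array before writing and restore the fixed trailing shape on read:

    import h5py
    import numpy as np

    # Stand-in for the poster's list of (x, 13, 1) MFCC arrays
    phn_mfccs = [np.random.randn(n, 13, 1) for n in (5, 8, 3)]

    dt = h5py.special_dtype(vlen=np.dtype('float64'))
    with h5py.File('mfccs.hdf5', 'w') as f:
        dset = f.create_dataset('mfccs', (len(phn_mfccs),), dtype=dt)
        for i, arr in enumerate(phn_mfccs):
            dset[i] = arr.ravel()  # vlen elements must be one-dimensional

        # Reading back: the (13, 1) trailing shape is fixed, so it can be restored
        first = dset[0].reshape(-1, 13, 1)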

Writing a large hdf5 dataset using h5py

隐身守侯 posted on 2019-12-17 18:54:24
Question: At the moment, I am using h5py to generate hdf5 datasets. I have something like this:

    import h5py
    import numpy as np
    my_data = np.genfromtxt("/tmp/data.csv", delimiter=",", dtype=None, names=True)
    myFile = "/tmp/f.hdf"
    with h5py.File(myFile, "a") as f:
        dset = f.create_dataset('%s/%s' % (vendor, dataSet), data=my_data,
                                compression="gzip", compression_opts=9)

This works well for a relatively large ASCII file (400 MB). I would like to do the same for an even larger dataset (40 GB). Is there a better or more …
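
Since genfromtxt reads the whole file into memory, one approach for the 40 GB case is to stream the CSV in blocks into a chunked, resizable dataset. A rough sketch, assuming all columns are numeric and reusing the question's placeholder names vendor/dataSet:

    import h5py
    import pandas as pd

    csv_path = '/tmp/data.csv'
    with h5py.File('/tmp/f.hdf', 'a') as f:
        dset = None
        # Read the CSV in blocks so the full 40 GB never has to fit in RAM
        for chunk in pd.read_csv(csv_path, chunksize=100_000):
            block = chunk.to_numpy()
            if dset is None:
                dset = f.create_dataset('vendor/dataSet', data=block,
                                        maxshape=(None, block.shape[1]),
                                        chunks=True,
                                        compression='gzip', compression_opts=9)
            else:
                dset.resize(dset.shape[0] + block.shape[0], axis=0)
                dset[-block.shape[0]:] = block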

Combining hdf5 files

拟墨画扇 posted on 2019-12-17 17:45:08
Question: I have a number of hdf5 files, each of which has a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file containing all the datasets separately (i.e. not concatenating the datasets into a single dataset). One way to do this is to create an hdf5 file and then copy the datasets one by one. This will be slow and complicated because it will need to be a buffered copy. Is there a simpler way to do this? It seems like there should be, since it …
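
h5py can do the copying itself via Group.copy, which streams the data without a hand-written buffer loop. A short sketch (the input file names and the naming scheme for the copied datasets are hypothetical):

    import h5py

    sources = ['part1.hdf5', 'part2.hdf5']  # hypothetical input files
    with h5py.File('combined.hdf5', 'w') as out:
        for path in sources:
            with h5py.File(path, 'r') as src:
                for name in src:
                    # Group.copy streams the dataset into the output file
                    src.copy(name, out, name='%s_%s' % (path, name))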

How to trouble-shoot HDFStore Exception: cannot find the correct atom type

别等时光非礼了梦想. posted on 2019-12-17 16:38:31
Question: I am looking for some general guidance on what kinds of data scenarios can cause this exception. I have tried massaging my data in various ways to no avail. I have googled this exception for days now, gone through several Google group discussions, and come up with no solution for debugging HDFStore Exception: cannot find the correct atom type. I am reading in a simple csv file of mixed data types:

    Int64Index: 401125 entries, 0 to 401124
    Data columns:
    SalesID    401125 non-null values …
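
The excerpt stops before the traceback, but a frequent trigger for this exception is object-dtype columns that mix strings with NaN or other Python types. A hedged way to probe for that and work around it (the csv path and store key are placeholders, not from the question):

    import pandas as pd

    df = pd.read_csv('input.csv')  # placeholder for the real file

    # Object-dtype columns that mix Python types often cannot be mapped to an atom
    obj_cols = df.select_dtypes(include='object').columns
    print(df[obj_cols].applymap(type).nunique())  # values > 1 hint at mixed types

    # Forcing those columns to plain strings is one common workaround
    df[obj_cols] = df[obj_cols].fillna('').astype(str)
    with pd.HDFStore('store.h5', mode='w') as store:
        store.put('data', df, format='table')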

How to get faster code than numpy.dot for matrix multiplication?

回眸只為那壹抹淺笑 posted on 2019-12-17 10:37:45
Question: Here (Matrix multiplication using hdf5) I use hdf5 (pytables) for big matrix multiplication, but I was surprised because using hdf5 it works even faster than using plain numpy.dot and storing the matrices in RAM. What is the reason for this behavior? And maybe there is some faster function for matrix multiplication in Python, because I still use numpy.dot for small block matrix multiplication. Here is some code: Assume the matrices can fit in RAM; test on a matrix of 10*1000 x 1000. Using default numpy (I think …
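
The code in the question is cut off, but the blocked pattern it refers to can be sketched as follows: multiply an on-disk matrix against an in-memory one block by block, so each block still goes through the BLAS behind numpy.dot. This sketch uses h5py purely for illustration (the original post used PyTables), and the file and dataset names are hypothetical:

    import h5py
    import numpy as np

    def blocked_dot(a_dset, b, block=1000):
        # Multiply an on-disk (n, k) dataset by an in-memory (k, m) array
        # one row-block at a time; each block uses numpy.dot's BLAS.
        n = a_dset.shape[0]
        out = np.empty((n, b.shape[1]), dtype=np.result_type(a_dset.dtype, b.dtype))
        for start in range(0, n, block):
            stop = min(start + block, n)
            out[start:stop] = a_dset[start:stop].dot(b)
        return out

    with h5py.File('mat.hdf5', 'w') as f:
        a = f.create_dataset('a', data=np.random.rand(10_000, 1000))
        b = np.random.rand(1000, 1000)
        c = blocked_dot(a, b)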