hdf5

How to batch select and calculate arrays in Numpy?

谁都会走 submitted on 2019-12-11 11:29:27
Question: How can I (1) batch-select all arrays in an HDF5 file, (2) apply a calculation to each array, and (3) batch-create the resulting arrays in another HDF5 file? For example:

    import numpy
    import tables

    file = tables.open_file('file1', 'r')      # open_file in PyTables 3; openFile in older releases
    newfile = tables.open_file('file2', 'w')   # output file, added so the snippet runs

    array1 = file.root.array1
    array1_cal = (array1[:] <= 1)              # read the node into numpy, then compare
    newfile.create_array('/', 'array1_cal', array1_cal)

    array2 = file.root.array2
    array2_cal = (array2[:] <= 1)
    newfile.create_array('/', 'array2_cal', array2_cal)

I have 100+ arrays under a single hdf5 file and several hdf5 …
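A loop over the file's nodes avoids repeating this once per array. A minimal sketch, assuming the input file holds only array nodes under the root group and that file2.h5 is a free output name:

    import tables

    with tables.open_file('file1.h5', 'r') as infile, \
         tables.open_file('file2.h5', 'w') as outfile:
        # walk_nodes yields every Array (including CArray/EArray) under '/'
        for node in infile.walk_nodes('/', classname='Array'):
            data = node[:]           # read the array into memory as numpy
            result = (data <= 1)     # the example calculation from the question
            outfile.create_array('/', node.name + '_cal', result)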

Pandas to_hdf succeeds but then read_hdf fails

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-11 11:07:04
Question: Pandas to_hdf succeeds, but read_hdf then fails when I use custom objects as column headers (I use custom objects because I need to store extra information in them). Is there some way to make this work, or is this a Pandas or PyTables bug? As an example, below I first build a DataFrame foo with string column headers, and everything works fine with to_hdf / read_hdf; I then change foo to use a custom Col class for the column headers, and to_hdf still works fine but then read_hdf …
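The working half of that description can be pinned down in a few lines; a minimal sketch, assuming a throwaway file name foo.h5:

    import pandas as pd

    # String column headers round-trip through HDF5 without trouble.
    foo = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})
    foo.to_hdf('foo.h5', key='foo', mode='w')
    print(pd.read_hdf('foo.h5', key='foo'))

Swapping 'a' and 'b' for instances of a custom Col class is, per the question, what makes the read step fail.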

PyTables writing error

隐身守侯 submitted on 2019-12-11 07:57:56
Question: I am creating and filling a PyTables CArray the following way:

    import tables as tb
    # a, b are scipy.sparse.csr_matrix instances
    l = a.shape[0]            # rows of the product a.dot(b); must be set before create_carray
    n = b.shape[1]            # columns of the product

    f = tb.open_file('../data/pickle/dot2.h5', 'w')
    filters = tb.Filters(complevel=1, complib='blosc')
    out = f.create_carray(f.root, 'out', tb.Atom.from_dtype(a.dtype),
                          shape=(l, n), filters=filters)

    bl = 2048                 # block width in columns
    for i in range(0, n, bl):
        # compute one block of columns of a.dot(b) and write it to disk
        out[:, i:min(i + bl, n)] = (a.dot(b[:, i:min(i + bl, n)])).toarray()

The script was running fine for nearly two days (I estimated that it would need at least 4 …

Fast and efficient way of serializing and retrieving a large number of numpy arrays from HDF5 file

家住魔仙堡 submitted on 2019-12-11 07:49:48
Question: I have a huge list of numpy arrays, specifically 113287 of them, where each array has shape 36 x 2048. In memory this amounts to 32 gigabytes. As of now, I have serialized these arrays into one giant HDF5 file. The problem is that retrieving an individual array from this HDF5 file takes an excruciatingly long time (north of 10 minutes) per access. How can I speed this up? This is very important for my implementation, since I have to index into this list several thousand times for feeding …
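Per-item reads of this kind are usually dominated by the dataset's layout rather than by raw I/O: if each 36 x 2048 record is its own chunk, HDF5 can fetch one record without touching the rest of the file. A minimal h5py sketch (the file and dataset names, and float32 dtype, are assumptions):

    import h5py

    # Write: one 3-D dataset, chunked so each 36 x 2048 array is a single chunk.
    with h5py.File('features.h5', 'w') as f:
        dset = f.create_dataset('features', shape=(113287, 36, 2048),
                                dtype='float32', chunks=(1, 36, 2048))
        # dset[i] = arr   # fill one record at a time

    # Read: a single indexed access pulls exactly one chunk from disk.
    with h5py.File('features.h5', 'r') as f:
        arr = f['features'][42]   # shape (36, 2048)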

Read HDF5 files from *.tar.gz compressed file in scala in Spark

為{幸葍}努か submitted on 2019-12-11 06:18:00
Question: After referring to this post, I can read multiple *.txt files residing in a *.tar.gz file. But now I need to read HDF5 files inside a *.tar.gz file. A sample file can be downloaded here; it is generated from the Million Song Dataset. Could anyone tell me how I should change the following code in order to read the HDF5 files into an RDD? Thanks!

    package a.b.c

    import org.apache.spark._
    import org.apache.spark.sql.{SQLContext, DataFrame}
    import org.apache.spark.ml.tuning.CrossValidatorModel
    …
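The HDF5-specific difficulty is that HDF5 readers need a seekable source, not a raw stream, so each archive member has to be pulled into a seekable buffer first. A minimal local-Python sketch of just that step (h5py 2.9+ accepts file-like objects; the archive name is an assumption, and wiring this into a Spark binaryFiles map is left aside):

    import tarfile
    import io
    import h5py

    with tarfile.open('songs.tar.gz', 'r:gz') as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith('.h5'):
                # Read the member fully into a seekable in-memory buffer.
                buf = io.BytesIO(tar.extractfile(member).read())
                with h5py.File(buf, 'r') as h5:
                    print(member.name, list(h5.keys()))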

how to convert .pts or .npy file into .ply or .h5 file?

∥☆過路亽.° submitted on 2019-12-11 06:01:42
Question: I have 3D point-cloud data as a .npy file and as .pts data. To use this data in a 3D-classification neural net, I have to convert it to .h5 files. So first I am trying to convert the .npy or .pts file to a .ply file using Python. Could you point me to example code, or help me convert between these file formats? I would also really appreciate a way to convert .ply to .h5. Sorry for my poor English skills.

Answer 1: I hope this code will get you started; it shows how to create an h5 file from a npy …
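In the spirit of that answer, a minimal sketch of the npy-to-h5 step (the file and dataset names are assumptions):

    import numpy as np
    import h5py

    points = np.load('cloud.npy')   # e.g. an (N, 3) array of xyz coordinates

    with h5py.File('cloud.h5', 'w') as f:
        f.create_dataset('data', data=points)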

How to close an HDF5 using low level Python API?

戏子无情 submitted on 2019-12-11 05:26:38
Question: I was able to modify the cache settings of an HDF5 file by combining the high- and low-level h5py APIs, as described in the following Stack Overflow question: How to set cache settings while using h5py high level interface? Now I get an error saying that the h5 file is still open when I try to rename it. The Python with statement with contextlib does not seem to be closing the file after the HDF5 write is complete and the file is flushed. How can I make sure …
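When a file is opened through the low-level API, both the high-level wrapper and the underlying file identifier have to be released before the OS will let go of the file. A minimal sketch, assuming the file was opened via h5py.h5f with a custom file-access property list (whether this matches the asker's exact setup is an assumption):

    import h5py

    # Open with custom cache settings via a file-access property list.
    fapl = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
    fapl.set_cache(0, 521, 16 * 1024 * 1024, 0.75)   # example cache values
    fid = h5py.h5f.open(b'data.h5', h5py.h5f.ACC_RDWR, fapl=fapl)
    f = h5py.File(fid)          # high-level wrapper around the low-level id

    # ... writes ...
    f.flush()
    f.close()                   # close the high-level wrapper
    while fid.valid:            # release the low-level identifier too
        fid.close()

Checking fid.valid afterwards is a cheap way to confirm nothing is still holding the file before renaming it.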

Difference between str() and astype(str)?

↘锁芯ラ submitted on 2019-12-11 05:15:06
Question: I want to save the dataframe df to the .h5 file MainDataFile.h5:

    df.to_hdf('c:/Temp/MainDataFile.h5', 'MainData', mode='w', format='table',
              data_columns=['_FirstDayOfPeriod', 'Category', 'ChannelId'])

and get the following error:

    *** Exception: cannot find the correct atom type ->
    [dtype->object, items->Index(['Libellé_Article', 'Libellé_segment'], dtype='object')]

If I modify the column 'Libellé_Article' in this way:

    df['Libellé_Article'] = str(df['Libellé_Article'])

there is no …
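The two calls do very different things: str(series) produces one Python string holding the printed representation of the whole Series, while series.astype(str) converts each element to a string. A minimal sketch:

    import pandas as pd

    s = pd.Series(['a', 'b'])

    str(s)          # one string: "0    a\n1    b\ndtype: object"
    s.astype(str)   # a Series of two strings, one per element

So df['Libellé_Article'] = str(df['Libellé_Article']) broadcasts a single repr string into every row of the column, which is presumably not what is wanted even though it silences the atom-type error; astype(str) preserves the per-row values.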

HDF5Data Processing with Caffe's Transformer for training

有些话、适合烂在心里 submitted on 2019-12-11 05:14:54
Question: I am trying to load data into the network. Since I need a custom data input (3 tops: 1 for the data image, 2 for different labels), I load the data with HDF5 files. It looks similar to this:

    layer {
      name: "data"
      type: "HDF5Data"
      top: "img"
      top: "alabels"
      top: "blabels"
      include { phase: TRAIN }
      hdf5_data_param {
        source: "path_to_caffe/examples/hdf5_classification/data/train.txt"
        batch_size: 64
      }
    }

I want to preprocess the images using Caffe's own Transformer (for standard); how can I do this when I …
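The HDF5Data layer applies no transform_param, so a common workaround (an assumption about what the asker needs) is to run caffe.io.Transformer offline and store the already-preprocessed images in the HDF5 file. A minimal sketch; the 224 x 224 size, mean values, and file names are all placeholders:

    import caffe
    import h5py
    import numpy as np

    # Transformer configured for N x 3 x H x W input, as in Caffe's examples.
    t = caffe.io.Transformer({'img': (1, 3, 224, 224)})
    t.set_transpose('img', (2, 0, 1))                 # HWC -> CHW
    t.set_mean('img', np.array([104., 117., 123.]))   # example per-channel mean
    t.set_raw_scale('img', 255)                       # [0,1] float -> [0,255]
    t.set_channel_swap('img', (2, 1, 0))              # RGB -> BGR

    imgs = [caffe.io.load_image(p) for p in ['a.jpg', 'b.jpg']]  # paths assumed
    data = np.stack([t.preprocess('img', im) for im in imgs])

    with h5py.File('train.h5', 'w') as f:
        f.create_dataset('img', data=data.astype(np.float32))
        # 'alabels' and 'blabels' datasets would be written the same way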

Comparing h5 files

為{幸葍}努か submitted on 2019-12-11 04:56:23
Question: I often have to compare HDF files. I do it either with a binary diff (which reports the files as different even though the actual numbers inside are the same) or by dumping the contents to text with h5dump and comparing the two dumps (which is also quite annoying). I was wondering whether there is a cleverer way to do this, perhaps a feature of HDF5 itself or of software like HDFView or Panoply.

Answer 1: Perhaps h5diff is what you require? Some examples here. Source: https:/
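When h5diff is not at hand, the same by-value comparison can be sketched with h5py (the tolerance, and the assumption that both files contain numeric datasets with matching layouts, are mine):

    import h5py
    import numpy as np

    def h5_equal(path_a, path_b, rtol=1e-7):
        """Compare every dataset in two HDF5 files by value, not by bytes."""
        with h5py.File(path_a, 'r') as fa, h5py.File(path_b, 'r') as fb:
            names = []
            fa.visit(names.append)          # collect all object names in file A
            for name in names:
                if isinstance(fa[name], h5py.Dataset):
                    if name not in fb or not np.allclose(
                            fa[name][...], fb[name][...], rtol=rtol):
                        return False
            return True

Note this ignores attributes and anything present only in the second file; h5diff covers those cases properly.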