hdf5

Pandas: in-memory sorting of hdf5 files

China☆狼群 submitted 2019-12-06 05:47:48
I have the following problem: I have a set of several HDF5 files with similar data frames which I want to sort globally based on multiple columns. My input is the file names and an ordered list of the columns to sort by. The output should be a single HDF5 file containing all the sorted data. Each file can contain millions of rows; I can afford to load a single file into memory, but not the entire dataset. Naively, I would first copy all the data into a single HDF5 file (which is not difficult) and then find a way to sort this huge file in memory. Is there a quick way to
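The standard approach when the whole dataset does not fit in memory is an external merge sort: sort each file individually (each fits in memory), then stream a k-way merge over the sorted pieces. A minimal sketch of the merge step, using plain Python lists as stand-ins for the sorted row chunks you would stream from each file (e.g. with `pd.read_hdf(name, chunksize=...)`); the data and sort key here are hypothetical:

```python
import heapq

# Stand-ins for three individually sorted HDF5 files; in practice each would
# be an iterator over rows (or row chunks) streamed from one sorted file.
sorted_runs = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e")],
    [(3, "c"), (6, "f"), (8, "h")],
]

# heapq.merge performs the k-way merge lazily: it holds only one pending row
# per run in memory, so memory use is O(number of files), not O(total rows).
merged = list(heapq.merge(*sorted_runs, key=lambda row: row[0]))
```

In practice each merged chunk would be appended incrementally to the output store (e.g. `store.append(...)`) rather than collected into a list.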

Exception 'HDFStore requires PyTables ' when using HDF5 file in iPython

自作多情 submitted 2019-12-06 03:57:14
I am very new to Python and am trying to create a table in pandas with HDFStore as follows:

    store = HDFStore('store.h5')

I get the exception:

    Exception                                 Traceback (most recent call last)
    C:\Python27\<ipython-input-11-de3060b689e6> in <module>()
    ----> 1 store = HDFStore('store.h5')

    C:\Python27\lib\site-packages\pandas-0.10.1-py2.7-win32.egg\pandas\io\pytables.pyc in __init__(self, path, mode, complevel, complib, fletcher32)
        196                 import tables as _
        197             except ImportError:  # pragma: no cover
    --> 198                 raise Exception('HDFStore requires PyTables')
        199
        200             self.path = path

    Exception: HDFStore requires PyTables
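The error means exactly what it says: pandas' HDFStore is a wrapper around the PyTables package (importable as `tables`), which is installed separately from pandas (e.g. `pip install tables`). A small sketch of a guard you could run before opening a store:

```python
import importlib.util

def pytables_available():
    # HDFStore imports the "tables" module internally; if it is missing,
    # pandas raises the "HDFStore requires PyTables" exception seen above.
    return importlib.util.find_spec("tables") is not None

if not pytables_available():
    print("Install PyTables first, e.g.: pip install tables")
```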

How to know HDF5 dataset name in python

懵懂的女人 submitted 2019-12-06 03:46:38
I want to read an HDF5 file into Python and do some work with it. To access the data in an HDF5 file from Python, you need the dataset name. However, I do not know how to find the dataset name, and I would like to ask for help.

    def select_HDF_file(self):
        filename2 = QFileDialog.getOpenFileName(self.dlg, "Select output file", "", '*.hdf')
        dataset_name = '**************'
        file = h5py.File(filename2, 'r')
        dataset = file[dataset_name]

The file object is dictionary-like, so you can iterate over it and collect all the datasets, for example:

    >>> file = h5py.File('file.h5', 'r')
    >>> dataset = []
    >>
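An HDF5 file is a tree of groups (dict-like containers) and datasets (leaves), so collecting dataset names is a recursive walk; with h5py this is what `f.visititems` does for you. The same recursion, sketched over nested dicts standing in for groups so it runs without h5py (the toy layout is made up):

```python
def collect_dataset_names(group, prefix=""):
    # Groups are dict-like; anything that is not a dict is treated as a
    # dataset leaf, mirroring the Group/Dataset distinction in h5py.
    names = []
    for key, value in group.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            names.extend(collect_dataset_names(value, path))
        else:
            names.append(path)
    return names

# Toy file layout: one top-level dataset and one group with two datasets.
fake_file = {"temperature": [1, 2], "run0": {"x": [0], "y": [1]}}
```

With a real file, `f.visititems(lambda name, obj: ...)` combined with `isinstance(obj, h5py.Dataset)` performs the same walk.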

Reading hdf5 files to dynamic arrays in c++

谁都会走 submitted 2019-12-06 02:11:13
I am trying to read large 3D HDF5 files into a dynamically allocated array because of size limitations on the stack. I have tried several different methods, all of which fail with a segmentation fault. Below is example code showing my issue. I would very much appreciate some help!

    // This example was based on several examples which came in the c++
    // examples directory of the hdf5 package.
    #ifdef OLD_HEADER_FILENAME
    #include <iostream.h>
    #else
    #include <iostream>
    #endif
    #include <string>
    #include <new>
    #include "hdf5.h"
    #include "H5Cpp.h"
    #ifndef H5_NO_NAMESPACE
    using namespace H5;
    #endif
    const H5std_string FILE_NAME(

HDF5 Storage Overhead

我怕爱的太早我们不能终老 submitted 2019-12-06 00:41:58
Question: I'm writing a large number of small datasets to an HDF5 file, and the resulting file size is about 10x what I would expect from a naive tabulation of the data I'm putting in. My data is organized hierarchically as follows:

    group 0
      -> subgroup 0
        -> dataset (dimensions: 100 x 4, datatype: float)
        -> dataset (dimensions: 100, datatype: float)
      -> subgroup 1
        -> dataset (dimensions: 100 x 4, datatype: float)
        -> dataset (dimensions: 100, datatype: float)
      ...
    group 1
      ...

Each subgroup should take up
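For the layout above, the raw payload is tiny compared with HDF5's fixed per-object cost: every dataset carries an object header, and chunked or compressed datasets also carry B-tree index blocks, so many small datasets inflate the file far beyond the payload. A back-of-the-envelope check of the raw bytes per subgroup (assuming "float" means 32-bit):

```python
FLOAT_BYTES = 4  # assumption: single-precision floats

# One subgroup holds a 100 x 4 dataset plus a 100-element dataset.
payload_per_subgroup = (100 * 4 + 100) * FLOAT_BYTES
print(payload_per_subgroup)  # bytes of actual data per subgroup
```

Against that ~2 kB of data, even a few kB of metadata per dataset readily explains a ~10x file size; the usual mitigations are consolidating many small datasets into fewer large ones or tuning chunk sizes.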

merging several hdf5 files into one pytable

左心房为你撑大大i submitted 2019-12-05 22:16:51
I have several HDF5 files, each of them with the same structure. I'd like to create one PyTable out of them by somehow merging the HDF5 files. What I mean is that if an array in file1 has size x and the array in file2 has size y, the resulting array in the PyTable will be of size x+y, containing first all the entries from file1 and then all the entries from file2.

How you want to do this depends slightly on the data type that you have. Arrays and CArrays have a static size, so you need to preallocate the data space. Thus you would do something like the following:

    import tables as tb
    file1 = tb.open
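The pattern the answer describes for static-size nodes - preallocate x+y slots, then copy file1's entries followed by file2's - can be sketched with stdlib `array` objects standing in for the CArrays (with EArrays you would instead create the node with one extendable dimension and append to it):

```python
from array import array

# Stand-ins for the arrays read from file1 and file2.
a = array("d", [1.0, 2.0, 3.0])   # size x
b = array("d", [4.0, 5.0])        # size y

# Preallocate x + y slots, as a CArray's static shape requires
# (8 bytes per double, zero-filled), then copy each source into
# its slice of the destination.
merged = array("d", bytes(8 * (len(a) + len(b))))
merged[:len(a)] = a
merged[len(a):] = b
```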

Using std::string in hdf5 creates unreadable output

╄→гoц情女王★ submitted 2019-12-05 21:35:26
I'm currently using HDF5 1.8.15 on Windows 7 64-bit. The source code of my software is saved in files using UTF-8 encoding. As soon as I call any HDF5 function that accepts std::string, the output gets cryptic, but if I use const char* instead of std::string, everything works fine. This also applies to the filename. Here is a short sample:

    std::string filename_ = "test.h5";
    H5::H5File file( filename_.c_str(), H5F_ACC_TRUNC); // works
    H5::H5File file( filename_, H5F_ACC_TRUNC);         // filename is not readable or
                                                        // hdf5 throws an exception

I guess that this problem is caused by different encodings used

Storing Pandas objects along with regular Python objects in HDF5

主宰稳场 submitted 2019-12-05 16:15:22
Question: Pandas has a nice interface that facilitates storing things like DataFrames and Series in an HDF5 file:

    random_matrix = np.random.random_integers(0, 10, m_size)
    my_dataframe = pd.DataFrame(random_matrix)
    store = pd.HDFStore('some_file.h5', complevel=9, complib='bzip2')
    store['my_dataframe'] = my_dataframe
    store.close()

But if I try to save some other regular Python objects in the same file, it complains:

    my_dictionary = dict()
    my_dictionary['a'] = 2          # <--- ERROR
    my_dictionary['b'] = [2, 3, 4]
    store[
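HDFStore only knows how to serialize pandas objects; arbitrary Python objects must first be turned into something byte-oriented, and the common workaround is to pickle them. A sketch of the round trip (where you then keep the resulting bytes - a sidecar file, an HDF5 attribute, etc. - is a separate design choice):

```python
import pickle

my_dictionary = {"a": 2, "b": [2, 3, 4]}

# Serialize the plain Python object to bytes that any byte-oriented
# container can hold, and deserialize it back on read.
blob = pickle.dumps(my_dictionary)
restored = pickle.loads(blob)
```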

How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?

懵懂的女人 submitted 2019-12-05 16:13:19
I have the following pandas DataFrame:

    import pandas as pd
    df = pd.read_csv("filename.csv")

Now I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary):

    store = HDFStore('store.h5')
    store['df'] = df

http://pandas.pydata.org/pandas-docs/stable/io.html

When I look at the contents, this object is a frame. store outputs:

    <class 'pandas.io.pytables.HDFStore'>
    File path: store.h5
    /df    frame    (shape->[552,23252])

However, in order to use indexing, one should store this as a table object. My approach was to try HDFStore.put(), i.e. HDFStore.put(key="store.h",
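Since `store['df'] = df` is shorthand for `put` with its defaults, it writes the fixed "frame" format; the switch to the queryable "table" format is the `format="table"` argument of `put` (or using `append`, which always writes tables). A guarded sketch, assuming pandas with PyTables is installed (the helper name and key are arbitrary):

```python
def write_as_table(df, path, key="df"):
    # format="table" stores a PyTables Table instead of a fixed frame,
    # enabling on-disk queries; data_columns=True makes columns queryable
    # with store.select(key, where=...).
    import pandas as pd  # assumption: pandas + PyTables available
    with pd.HDFStore(path) as store:
        store.put(key, df, format="table", data_columns=True)
```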

R hdf5 dataset written incorrectly?

一曲冷凌霜 submitted 2019-12-05 14:15:05
When I execute the following, my "predictors" dataset is populated correctly:

    library(rhdf5)
    library(forecast)
    library(sltl)
    library(tseries)

    fid <- H5Fcreate(output_file)

    ## TODO: compute the order p
    p <- 4

    # write predictors
    h5createDataset(output_file, dataset="predictors",
                    c(p, length(tsstl.remainder) - (p - 1)),
                    storage.mode='double')
    predictors <- as.matrix(tsstl.remainder)
    for (i in 1:(p - 1)) {
      predictors <- as.matrix(cbind(predictors, Lag(as.matrix(tsstl.remainder), i)))
    }
    predictors <- as.matrix(predictors[-1:-(p-1),])
    head(predictors)
    h5write(predictors, output_file, name="predictors
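To see what the R loop above constructs: column i is the series lagged by i steps, and the first p-1 rows (which contain missing lags) are dropped, so row t holds [x[t], x[t-1], ..., x[t-p+1]]. A small sketch of the same construction, useful for checking what the written dataset should contain (plain Python lists stand in for the time series):

```python
def lag_matrix(series, p):
    # Row t = [x[t], x[t-1], ..., x[t-p+1]]; rows with any missing lag
    # (t < p-1) are dropped, as predictors[-1:-(p-1),] does in R.
    n = len(series)
    return [[series[t - i] for i in range(p)] for t in range(p - 1, n)]
```

For example, `lag_matrix([1, 2, 3, 4, 5], 3)` yields three rows, each a window read backwards from its endpoint.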