The pandas documentation has examples of how to store Series, DataFrames and Panels in HDF5 files.
Answering question 2, with pandas 0.18.0 you can do:
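import pandas as pd

store = pd.HDFStore('compiled_measurements.h5')
# file_iterator: your iterable of CSV file paths
for filepath in file_iterator:
    raw = pd.read_csv(filepath)
    # index=False skips index creation on every append;
    # only data columns can be indexed later
    store.append('measurements', raw, index=False, data_columns=['a', 'b', 'c'])
# build the index once, after all the data has been appended
store.create_table_index('measurements', columns=['a', 'b', 'c'], optlevel=9, kind='full')
store.close()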
This is based on the indexing section of the pandas HDF5 docs.
Depending on how much data you have, the index creation can consume enormous amounts of memory. The PyTables docs describe the values of optlevel.
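
If you want to try the append-then-index pattern end to end before running it on real data, here is a minimal sketch. It assumes PyTables is installed and a reasonably recent pandas. The `build_store` helper, the temporary directory and the generated `part_*.csv` files are made up for illustration and are not part of pandas.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def build_store(csv_paths, store_path, key="measurements", index_cols=("a", "b", "c")):
    """Append each CSV to one HDF5 table, then index it once at the end.

    Hypothetical helper, written only for this example.
    """
    cols = list(index_cols)
    with pd.HDFStore(store_path, mode="w") as store:
        for path in csv_paths:
            chunk = pd.read_csv(path)
            # index=False defers index creation; data_columns stores the
            # columns so they can be queried and indexed on disk.
            store.append(key, chunk, index=False, data_columns=cols)
        # One index build after all appends.
        store.create_table_index(key, columns=cols, optlevel=9, kind="full")


# Made-up demo data: three small CSV files in a temporary directory.
tmpdir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
csv_paths = []
for i in range(3):
    path = os.path.join(tmpdir, f"part_{i}.csv")
    frame = pd.DataFrame(rng.integers(0, 10, size=(5, 3)), columns=["a", "b", "c"])
    frame.to_csv(path, index=False)
    csv_paths.append(path)

store_path = os.path.join(tmpdir, "compiled_measurements.h5")
build_store(csv_paths, store_path)

# Query on a data column without loading the whole table into memory.
subset = pd.read_hdf(store_path, "measurements", where="a > 5")
total_rows = len(pd.read_hdf(store_path, "measurements"))
```

Opening the store in a `with` block closes the file even if one of the CSVs fails to parse, which saves you from a half-written store that is still locked.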