How do I read/write to a subgroup within an HDFStore?

Anonymous (unverified), submitted 2019-12-03 01:33:01

Question:

I am using HDFStore to store some of my processed results prior to analysis. I want to put three types of results into the store:

  • Raw results that have not been processed at all, just read in and merged from their original CSV formats
  • Processed results that are derived from the raw results, with some processing and division into more logical groupings
  • Summarised results that have useful summary columns added and redundant columns removed, for easy reading.

I thought an HDFStore with hierarchical keys would do it: one group for Raw, one for Processed and one for Summarised.

I wanted a structure like:

    <class 'pandas.io.pytables.HDFStore'>
    File path: results.h5
    /proccessed/dbn_reinit                           frame        (shape->[22880,19])
    /proccessed/dbn_rerep_code                       frame        (shape->[11440,18])
    /proccessed/dbn_rerep_enhanced_input             frame        (shape->[11440,18])
    /proccessed/linear_classifier                    frame        (shape->[572,18])
    /proccessed/msda_rerep_code                      frame        (shape->[18304,17])
    /proccessed/msda_rerep_enhanced_input            frame        (shape->[18304,17])
    /raw/dbn_reinit                                  frame        (shape->[22880,15])
    /raw/dbn_rerep                                   frame        (shape->[23452,15])
    /raw/msda_rerep                                  frame        (shape->[36608,14])
    /summerised/dbn_reinit                           frame        (shape->[22880,10])
    /summerised/dbn_rerep_code                       frame        (shape->[11440,9])
    /summerised/dbn_rerep_enhanced_input             frame        (shape->[11440,9])
    /summerised/linear_classifier                    frame        (shape->[572,6])
    /summerised/msda_rerep_code                      frame        (shape->[18304,10])
    /summerised/msda_rerep_enhanced_input            frame        (shape->[18304,10])

I expected I could create this by saying:

    store = pandas.HDFStore('results.h5')
    store.add_group('raw')
    raw_store = store['raw']

    raw_store['dbn_reinit'] = dbn_reinit_dataframe
    raw_store['dbn_rerep_code'] = dbn_rerep_code_dataframe
    ...

etc

However, there doesn't seem to be a way to get a subgroup of a store and use it as if it were a store,

so I had to do:

    store = pd.HDFStore('results.h5', mode='w')

    store['raw/dbn_reinit'] = dbn_reinit_dataframe
    store['raw/dbn_rerep'] = dbn_rerep_dataframe
    ...

which is wordy, and doesn't really show any grouping of the results into the three categories. Am I missing something? Or are the hierarchical features of HDF just a matter of writing really long key names that have /s in them?

Answer 1:

The pandas IO documentation has a section on using hierarchical keys. .remove() has this type of functionality: you can remove nodes at a given level and everything further down the tree.
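As a minimal sketch of that behaviour (it needs PyTables installed, and the small frames and the temporary file path are stand-ins made up for illustration, not the asker's data), writing under hierarchical keys and then removing a whole group could look like this:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in frames; the real ones would come from the CSV files.
raw_a = pd.DataFrame({"x": [1, 2, 3]})
raw_b = pd.DataFrame({"x": [4, 5]})
summary_a = pd.DataFrame({"mean_x": [2.0]})

# Temporary location so the sketch does not overwrite a real results.h5.
path = os.path.join(tempfile.mkdtemp(), "results.h5")

with pd.HDFStore(path, mode="w") as store:
    # A slash in the key creates the intermediate group automatically.
    store["raw/dbn_reinit"] = raw_a
    store["raw/dbn_rerep"] = raw_b
    store["summarised/dbn_reinit"] = summary_a

    keys_before = sorted(store.keys())
    # Select everything stored under one group by filtering on the key prefix.
    raw_keys = sorted(k for k in store.keys() if k.startswith("/raw/"))

    # Removing the group also removes every node below it.
    store.remove("raw")
    keys_after = sorted(store.keys())

print(keys_before)
print(raw_keys)
print(keys_after)
```

Note that store.keys() reports each key with a leading slash, so the prefix filter uses "/raw/" rather than "raw/".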

You can do: store.get_storer('foo') to return an object that includes access to the node. (e.g. .group). However, this object won't allow you to add/select sub-nodes, nor does it provide a nice repr of that node.
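If all you want is the raw_store['name'] syntax from the question, one possible workaround is a thin wrapper that prefixes keys before passing them on. The GroupView class below is hypothetical and not part of pandas; it only assumes the wrapped object supports store[key] reads and writes, which an open HDFStore does. A plain dict stands in for the store so the sketch runs without an HDF5 file:

```python
class GroupView:
    """Hypothetical helper (not part of pandas): prefixes every key with a group name."""

    def __init__(self, store, group):
        self._store = store
        self._prefix = group.strip("/")

    def _full_key(self, key):
        # Build 'group/key', tolerating stray slashes on either side.
        return f"{self._prefix}/{key.strip('/')}"

    def __setitem__(self, key, value):
        self._store[self._full_key(key)] = value

    def __getitem__(self, key):
        return self._store[self._full_key(key)]


# A dict stands in for the store here; with pandas you would pass an open
# HDFStore instead and assign DataFrames rather than lists.
store = {}
raw_store = GroupView(store, "raw")
raw_store["dbn_reinit"] = [1, 2, 3]  # placeholder value instead of a DataFrame

print(store)
print(raw_store["dbn_reinit"])
```

This only shortens the key names; the underlying store still sees the full 'raw/dbn_reinit' path.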

You could put in a feature request for these features on github. Please include a reproducible example of what you think this should do.

Pull-requests are welcome!

I rarely use multiple groups, mainly because using separate files is more flexible. You can do what you are trying to do (that is, treat a group as if it were the file itself); I have just never found a need for it. HDF5 is not a database, so this is rarely useful.


