Searching a HDF5 dataset

情歌与酒 2020-12-09 10:34

I'm currently exploring HDF5. I've read the interesting comments from the thread "Evaluating HDF5" and I understand that HDF5 is a solution of choice for storing the dat

4 answers
  •  感情败类
    2020-12-09 11:26

    I think the answer is "not directly".

    Here are some of the ways I think you could achieve the functionality.

    Use groups:

    A hierarchy of groups could be used as a radix tree to store the data, with each level of the hierarchy holding part of the key. This probably doesn't scale too well, though.
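
A minimal sketch of the group-hierarchy idea, in plain Python. Nested dicts stand in for HDF5 groups and no HDF5 library is called; `key_to_path`, `insert`, `lookup` and the `_row` marker are hypothetical names. For simplicity it uses one character per level, which is a plain trie, i.e. a radix tree without path compression.

```python
# Nested dicts stand in for HDF5 groups; this is not an HDF5 API.
ROW = "_row"  # hypothetical marker: the main-table row stored at a node


def key_to_path(key):
    """Map a key to the group path it would occupy, one character per level."""
    return "/" + "/".join(key)


def insert(root, key, row):
    """Create one nested 'group' per character and record the row at the end."""
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[ROW] = row


def lookup(root, key):
    """Walk the path for `key`; return its row, or None if the path is missing."""
    node = root
    for ch in key:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(ROW)


root = {}
insert(root, "CAB", 0)
insert(root, "CAT", 1)
print(key_to_path("CAT"))   # /C/A/T
print(lookup(root, "CAT"))  # 1
print(lookup(root, "CA"))   # None
```

Every key costs as many nested groups as it has characters, which fits the caveat that this approach probably does not scale well.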

    Use index datasets:

    HDF5 has a reference type which could be used to link to a main table from separate index tables. After writing the main data, you can write additional datasets, each sorted on a different key and holding references back to the main table. For example:

    MainDataset (sorted on identifier)
    0: { A, "C", 2 }
    1: { B, "B", 1 }
    2: { C, "A", 3 }
    
    StringIndex
    0: { "A", Reference ("MainDataset", 2) }
    1: { "B", Reference ("MainDataset", 1) }
    2: { "C", Reference ("MainDataset", 0) }
    
    IntIndex
    0: { 1, Reference ("MainDataset", 1) }
    1: { 2, Reference ("MainDataset", 0) }
    2: { 3, Reference ("MainDataset", 2) }
    

    To use the above, you will have to write a binary search that looks up the key in the relevant index table and then follows the reference to the matching row of the main table.
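
The lookup can be sketched as follows, using the same three rows as the example. This is plain Python with lists standing in for the datasets and row numbers standing in for HDF5 references; `build_index` and `find_rows` are hypothetical helper names, not part of any HDF5 library.

```python
from bisect import bisect_left

# Main table, sorted on the identifier in the first column (as in the example).
main_dataset = [
    ("A", "C", 2),
    ("B", "B", 1),
    ("C", "A", 3),
]


def build_index(table, column):
    """Return (key, row_number) pairs sorted by key for the given column."""
    return sorted((row[column], i) for i, row in enumerate(table))


def find_rows(index, key):
    """Binary-search the sorted index and return every main-table row for key."""
    # (key,) sorts before any (key, row_number), so this lands on the first match.
    pos = bisect_left(index, (key,))
    rows = []
    while pos < len(index) and index[pos][0] == key:
        rows.append(index[pos][1])
        pos += 1
    return rows


string_index = build_index(main_dataset, 1)
int_index = build_index(main_dataset, 2)

print(string_index)  # [('A', 2), ('B', 1), ('C', 0)]
print(int_index)     # [(1, 1), (2, 0), (3, 2)]
print([main_dataset[r] for r in find_rows(int_index, 3)])  # [('C', 'A', 3)]
```

With a real file, the index would be read from its own dataset and the row number replaced by a stored reference; the search itself stays the same.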

    In-memory index:

    Depending on the size of the dataset, it may be just as easy to use an in-memory index that is read from and written to its own dataset using something like Boost.Serialization.
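
As a rough illustration of the in-memory variant, the sketch below builds a dictionary index and round-trips it through a byte string. Python's `pickle` stands in here for a serialization library, and `build_memory_index` is a hypothetical helper. Writing the bytes into a dataset of their own is assumed and not shown.

```python
import pickle

main_dataset = [("A", "C", 2), ("B", "B", 1), ("C", "A", 3)]


def build_memory_index(table, column):
    """Map each value in the column to the list of rows that contain it."""
    index = {}
    for i, row in enumerate(table):
        index.setdefault(row[column], []).append(i)
    return index


int_index = build_memory_index(main_dataset, 2)

blob = pickle.dumps(int_index)  # bytes that would be stored in the index's own dataset
restored = pickle.loads(blob)   # done once, when the file is opened

print(restored[3])                                     # [2]
print([main_dataset[r] for r in restored.get(1, [])])  # [('B', 'B', 1)]
```

The whole index has to fit in memory and be rewritten whenever the main table changes, which is why this only suits datasets of moderate size.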

    HDF5-FastQuery:

    This paper (and also this page) describe the use of bitmap indices to perform complex queries over an HDF5 dataset. I have not tried this.
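
To give a feel for what a bitmap index is, here is a small generic sketch in plain Python. It is not HDF5-FastQuery's implementation or API: it uses simple one-bitmap-per-value encoding with Python integers as bit vectors, and `build_bitmaps` and `rows_of` are hypothetical names.

```python
main_dataset = [("A", "C", 2), ("B", "B", 1), ("C", "A", 3)]


def build_bitmaps(table, column):
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    bitmaps = {}
    for i, row in enumerate(table):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps


def rows_of(bitmap):
    """Turn a bitmap back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


string_bits = build_bitmaps(main_dataset, 1)
int_bits = build_bitmaps(main_dataset, 2)

# Range condition: OR together the bitmaps of every value in the range.
int_at_least_2 = 0
for value, bits in int_bits.items():
    if value >= 2:
        int_at_least_2 |= bits

# Compound query: int >= 2 AND string != "A", evaluated with bitwise operations.
all_rows = (1 << len(main_dataset)) - 1
hits = int_at_least_2 & (all_rows & ~string_bits["A"])
print(rows_of(hits))  # [0]
```

Conditions on several columns combine with bitwise AND, OR and NOT without touching the main table, which is what makes this kind of index attractive for compound queries.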
