HDF5 rowmajor or colmajor

你的背包 2021-01-12 20:50

Is it possible to know whether a matrix stored in HDF5 format is RowMajor or ColMajor? For example, when I save matrices from Octave, which stores them internally as ColMajor,

2 Answers
  •  醉话见心
    2021-01-12 21:06

    As @Yossarian pointed out, HDF5 always stores data as row-major (the C convention). Octave, like Fortran, stores data internally as column-major.

    When you write a matrix from Octave, the HDF5 layer does the transpose for you, so the data is always written as row-major regardless of the language you use. This is what makes the file portable.
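
To make the two conventions concrete, here is a minimal sketch of how the offset of element (i, j) is computed in each layout. The helper names `row_major_offset` and `col_major_offset` are hypothetical and are not part of HDF5 or Octave.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a rows x cols matrix stored row-major
 * (C order): the elements of one row are adjacent in memory. */
size_t row_major_offset(size_t i, size_t j, size_t rows, size_t cols)
{
        assert(i < rows && j < cols);
        return i * cols + j;
}

/* Offset of the same element stored column-major (Fortran/Octave order):
 * the elements of one column are adjacent in memory. */
size_t col_major_offset(size_t i, size_t j, size_t rows, size_t cols)
{
        assert(i < rows && j < cols);
        return j * rows + i;
}
```

For a 2 x 3 matrix, element (0, 1) sits at offset 1 in row-major order and at offset 2 in column-major order.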

    There is a very good example in the HDF5 User's Guide section 7.3.2.5, as mentioned by @Yossarian. Here's the example (almost) reproduced using Octave:

    octave:1> A = [ 1:3; 4:6 ]
    A =
    
       1   2   3
       4   5   6
    
    octave:2> save("-hdf5", "test.h5", "A")
    octave:3> quit
    
    ~$ h5dump test.h5
    HDF5 "test.h5" {
    GROUP "/" {
       COMMENT "# Created by Octave 3.6.4, Fri Jun 13 08:36:16 2014 MDT "
       GROUP "A" {
          ATTRIBUTE "OCTAVE_NEW_FORMAT" {
             DATATYPE  H5T_STD_U8LE
             DATASPACE  SCALAR
             DATA {
             (0): 1
             }
          }
          DATASET "type" {
             DATATYPE  H5T_STRING {
                STRSIZE 7;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
             DATASPACE  SCALAR
             DATA {
             (0): "matrix"
             }
          }
          DATASET "value" {
             DATATYPE  H5T_IEEE_F64LE
             DATASPACE  SIMPLE { ( 3, 2 ) / ( 3, 2 ) }
             DATA {
             (0,0): 1, 4,
             (1,0): 2, 5,
             (2,0): 3, 6
             }
          }
       }
    }
    }
    

    Notice that the 2 x 3 Octave matrix appears in the file as a 3 x 2 dataset: it has been transposed so that it is stored in row-major format.
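
One way to see where the 3 x 2 shape comes from is to lay the matrix out in column-major order by hand and then read the resulting buffer as a row-major array with the dimensions reversed. The sketch below assumes, as stated in this answer, that Octave holds the matrix column-major; `to_col_major` is a hypothetical helper, not an HDF5 or Octave function.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a rows x cols matrix given in row-major order into `out`
 * in column-major order. `out` must hold rows * cols elements. */
void to_col_major(const double *row_major, size_t rows, size_t cols,
                  double *out)
{
        assert(row_major != NULL && out != NULL);
        for (size_t i = 0; i < rows; ++i) {
                for (size_t j = 0; j < cols; ++j) {
                        out[j * rows + i] = row_major[i * cols + j];
                }
        }
}
```

Calling it on the 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` fills `out` with `{1, 4, 2, 5, 3, 6}`. Read as a 3 x 2 row-major array, that buffer has the rows `1 4`, `2 5` and `3 6`, which is the order shown by h5dump.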

    Then an example of reading it in C:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <hdf5.h>
    
    #define FILE "test.h5"
    #define DS   "A/value"
    
    int
    main(int argc, char **argv)
    {
            int i = 0;
            int j = 0;
            int n = 0;
            int x = 0;
            int rank = 0;
            hid_t file_id;
            hid_t space_id;
            hid_t dset_id;
            herr_t stat;
            hsize_t *dims = NULL;
            int *data = NULL;
    
            file_id  = H5Fopen(FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
            dset_id  = H5Dopen(file_id, DS, H5P_DEFAULT);
    
            space_id = H5Dget_space(dset_id);
            n    = H5Sget_simple_extent_npoints(space_id);
            rank = H5Sget_simple_extent_ndims(space_id);
    
            dims = malloc(rank*sizeof(hsize_t));
            stat = H5Sget_simple_extent_dims(space_id, dims, NULL);
    
            printf("rank: %d\t dimensions: ", rank);
            for (i = 0; i < rank; ++i) {
                    if (i == 0) {
                            printf("(");
                    }
                    printf("%llu", (unsigned long long)dims[i]);
                    if (i == (rank -1)) {
                            printf(")\n");
                    } else {
                            printf(" x ");
                    }
            }
            data = malloc(n*sizeof(int));
            memset(data, 0, n*sizeof(int));
            stat  = H5Dread(dset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                             data);
    
    
            printf("%s:\n", DS);
            for (i = 0; i < dims[0]; ++i) {
                    printf(" [ ");
                    for (j = 0; j < dims[1]; ++j) {
                            x = i * dims[1] + j;
                            printf("%d ", data[x]);
                    }
                    printf("]\n");
            }
    
            stat  = H5Sclose(space_id);
            stat  = H5Dclose(dset_id);
            stat  = H5Fclose(file_id);
    
    
            return(EXIT_SUCCESS);
    }
    

    When compiled and run, this gives:

    ~$ h5cc -o rmat rmat.c
    ~$ ./rmat
    rank: 2  dimensions: (3 x 2)
    A/value:
     [ 1 4 ]
     [ 2 5 ]
     [ 3 6 ]
    

    This is convenient because each language gets the matrix in the memory layout that is natural for it. It does mean, though, that you have to change how you do your calculations: with row-major data you need to pre-multiply, while with column-major data you should post-multiply. Here is an example, which hopefully explains it a bit more clearly.
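
One possible reading of the pre-multiplication remark, sketched under my own assumptions: the buffer read in C is the transpose of the Octave matrix A, so the product A x can be obtained as x^T A^T, that is, by multiplying the row vector x into the buffer from the left, without transposing the data back. The function name `vec_times_mat` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Compute y = x^T * M, where M is an m x n matrix in row-major order,
 * x has m elements and y has n elements. */
void vec_times_mat(const double *x, const double *m_row_major,
                   size_t m, size_t n, double *y)
{
        assert(x != NULL && m_row_major != NULL && y != NULL);
        for (size_t j = 0; j < n; ++j) {
                y[j] = 0.0;
        }
        for (size_t i = 0; i < m; ++i) {
                for (size_t j = 0; j < n; ++j) {
                        y[j] += x[i] * m_row_major[i * n + j];
                }
        }
}
```

With the 3 x 2 buffer `{1, 4, 2, 5, 3, 6}` from the file and `x = {1, 1, 1}`, the result is `{6, 15}`, the same as A x for the original 2 x 3 matrix A.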

    Does this help?
