HDF5 rowmajor or colmajor

你的背包 2021-01-12 20:50

Is it possible to tell whether a matrix stored in HDF5 format is row-major or column-major? For example, when I save matrices from Octave, which stores them internally in column-major order,

2 answers
  • 2021-01-12 20:57

    HDF5 stores data in row major order:

    HDF5 uses C storage conventions, assuming that the last listed dimension is the fastest-changing dimension and the first-listed dimension is the slowest changing.

    from the HDF5 User's Guide.

    However, if you're using Octave's built-in HDF5 interface, it will automatically transpose the arrays for you. In general, how the data is actually written in the HDF5 file should be completely opaque to the end-user, and the interface should deal with differences in array ordering, etc.
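The storage convention quoted above can be written as a small offset calculation. The following is a minimal sketch in plain C (no HDF5 calls are involved); the function name `row_major_offset` is made up for illustration and is not part of the HDF5 API.

```c
#include <stddef.h>

/* Linear offset of an element in a row-major (C-order) array.
 * dims[0] is the slowest-changing dimension and dims[rank - 1] the
 * fastest-changing one, as in the HDF5 User's Guide quote.
 * (Hypothetical helper, for illustration only.) */
size_t row_major_offset(const size_t *dims, const size_t *idx, size_t rank)
{
    size_t offset = 0;
    for (size_t k = 0; k < rank; ++k) {
        /* Each step scales the running offset by the next dimension. */
        offset = offset * dims[k] + idx[k];
    }
    return offset;
}
```

For a dataset with dimensions (3, 2), the element at index (1, 0) lands at offset 1 * 2 + 0 = 2, so consecutive values along the last dimension sit next to each other.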

  • 2021-01-12 21:06

    As @Yossarian pointed out, HDF5 always stores data in row-major order (the C convention). Octave, like Fortran, stores data internally in column-major order.

    When you write a matrix from Octave, the HDF5 interface handles the ordering difference for you, so the file always follows the row-major convention no matter which language wrote it. This is what makes the file portable.

    There is a very good example in the HDF5 User's Guide section 7.3.2.5, as mentioned by @Yossarian. Here's the example (almost) reproduced using Octave:

    octave:1> A = [ 1:3; 4:6 ]
    A =
    
       1   2   3
       4   5   6
    
    octave:2> save("-hdf5", "test.h5", "A")
    octave:3> quit
    
    ~$ h5dump test.h5
    HDF5 "test.h5" {
    GROUP "/" {
       COMMENT "# Created by Octave 3.6.4, Fri Jun 13 08:36:16 2014 MDT <user@localhost>"
       GROUP "A" {
          ATTRIBUTE "OCTAVE_NEW_FORMAT" {
             DATATYPE  H5T_STD_U8LE
             DATASPACE  SCALAR
             DATA {
             (0): 1
             }
          }
          DATASET "type" {
             DATATYPE  H5T_STRING {
                STRSIZE 7;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
             DATASPACE  SCALAR
             DATA {
             (0): "matrix"
             }
          }
          DATASET "value" {
             DATATYPE  H5T_IEEE_F64LE
             DATASPACE  SIMPLE { ( 3, 2 ) / ( 3, 2 ) }
             DATA {
             (0,0): 1, 4,
             (1,0): 2, 5,
             (2,0): 3, 6
             }
          }
       }
    }
    }
    

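    Notice that the dataset has dimensions (3, 2) rather than (2, 3): read as a row-major array, it is the transpose of A. The values appear in the order 1, 4, 2, 5, 3, 6, which is also the order of A's elements in column-major memory.

    Here is an example of reading it in C: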
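The relationship between the two layouts can be sketched in plain C. The helper below (the name `to_column_major` is hypothetical, and nothing here calls HDF5 or Octave) lays out a matrix in column-major order, the way Octave holds it in memory.

```c
#include <stddef.h>

/* Copy a rows x cols matrix, given row by row in src, into dst in
 * column-major order (elements of one column are contiguous).
 * (Hypothetical helper, for illustration only.) */
void to_column_major(const int *src, size_t rows, size_t cols, int *dst)
{
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

Applied to the 2 x 3 matrix A = [1 2 3; 4 5 6], this produces the buffer 1, 4, 2, 5, 3, 6, which is the same sequence h5dump lists for the 3 x 2 dataset. A column-major 2 x 3 buffer and a row-major 3 x 2 buffer of the transpose hold identical bytes.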

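    #include <stdio.h>
    #include <stdlib.h>
    #include <hdf5.h>
    
    /* Named FILENAME rather than FILE to avoid clashing with the stdio FILE type. */
    #define FILENAME "test.h5"
    #define DS       "A/value"
    
    int
    main(void)
    {
            int i = 0;
            int rank = 0;
            hsize_t r = 0;
            hsize_t c = 0;
            hssize_t n = 0;
            hid_t file_id;
            hid_t space_id;
            hid_t dset_id;
            herr_t stat;
            hsize_t *dims = NULL;
            int *data = NULL;
    
            file_id = H5Fopen(FILENAME, H5F_ACC_RDONLY, H5P_DEFAULT);
            if (file_id < 0) {
                    fprintf(stderr, "cannot open %s\n", FILENAME);
                    return(EXIT_FAILURE);
            }
            /* The third argument is a dataset access property list. */
            dset_id = H5Dopen(file_id, DS, H5P_DEFAULT);
            if (dset_id < 0) {
                    fprintf(stderr, "cannot open dataset %s\n", DS);
                    return(EXIT_FAILURE);
            }
    
            /* Query the shape of the dataset as stored in the file. */
            space_id = H5Dget_space(dset_id);
            n    = H5Sget_simple_extent_npoints(space_id);
            rank = H5Sget_simple_extent_ndims(space_id);
            if (rank != 2 || n <= 0) {
                    fprintf(stderr, "expected a non-empty 2-D dataset\n");
                    return(EXIT_FAILURE);
            }
    
            /* dims holds hsize_t values, so size the buffer accordingly. */
            dims = malloc(rank * sizeof(*dims));
            data = calloc((size_t)n, sizeof(*data));
            if (dims == NULL || data == NULL) {
                    fprintf(stderr, "out of memory\n");
                    return(EXIT_FAILURE);
            }
            H5Sget_simple_extent_dims(space_id, dims, NULL);
    
            printf("rank: %d\t dimensions: ", rank);
            for (i = 0; i < rank; ++i) {
                    if (i == 0) {
                            printf("(");
                    }
                    printf("%llu", (unsigned long long)dims[i]);
                    if (i == (rank - 1)) {
                            printf(")\n");
                    } else {
                            printf(" x ");
                    }
            }
    
            /* HDF5 converts the stored doubles to native ints on read. */
            stat = H5Dread(dset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                           data);
            if (stat < 0) {
                    fprintf(stderr, "cannot read dataset %s\n", DS);
                    return(EXIT_FAILURE);
            }
    
            /* Print in row-major order: the last index changes fastest. */
            printf("%s:\n", DS);
            for (r = 0; r < dims[0]; ++r) {
                    printf(" [ ");
                    for (c = 0; c < dims[1]; ++c) {
                            printf("%d ", data[r * dims[1] + c]);
                    }
                    printf("]\n");
            }
    
            free(data);
            free(dims);
            H5Sclose(space_id);
            H5Dclose(dset_id);
            H5Fclose(file_id);
    
            return(EXIT_SUCCESS);
    }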
    

    Compiling and running it gives:

    ~$ h5cc -o rmat rmat.c
    ~$ ./rmat
    rank: 2  dimensions: (3 x 2)
    A/value:
     [ 1 4 ]
     [ 2 5 ]
     [ 3 6 ]
    

    This is convenient because the data can be used directly in each language's native layout. It does mean, though, that the C program sees the transpose of the Octave matrix, so you have to change how you do your calculations. Since (A x)^T = x^T A^T, where the column-major side multiplies a vector from the right (post-multiplication), the row-major side multiplies it from the left (pre-multiplication).
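One way to adapt a calculation is to swap the indices when reading from the buffer instead of transposing the data. The sketch below computes y = A x for the Octave matrix A using the buffer exactly as the C program read it; the function name `matvec_from_transposed` and its parameters are assumptions made for illustration.

```c
#include <stddef.h>

/* Compute y = A * x, where A is the rows x cols matrix as seen in Octave
 * and data is the buffer read in C (a cols x rows row-major array, i.e.
 * the transpose of A). Element A(i, j) is found at data[j * rows + i].
 * (Hypothetical helper, for illustration only.) */
void matvec_from_transposed(const int *data, size_t rows, size_t cols,
                            const int *x, int *y)
{
    for (size_t i = 0; i < rows; ++i) {
        y[i] = 0;
        for (size_t j = 0; j < cols; ++j) {
            y[i] += data[j * rows + i] * x[j];
        }
    }
}
```

With the buffer 1, 4, 2, 5, 3, 6 from the example (rows = 2, cols = 3) and x = (1, 1, 1), the result is the row sums of A, namely 6 and 15.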

    Does this help?
