Is it possible to know if a matrix stored in HDF5 format is in RowMajor or ColMajor? For example, when I save matrices from Octave, which stores them internally as ColMajor,
HDF5 stores data in row-major order. From the HDF5 User's Guide:
"HDF5 uses C storage conventions, assuming that the last listed dimension is the fastest-changing dimension and the first-listed dimension is the slowest changing."
However, if you're using Octave's built-in HDF5 interface, it will automatically transpose the arrays for you. In general, how the data is actually written in the HDF5 file should be completely opaque to the end-user, and the interface should deal with differences in array ordering, etc.
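To make the two conventions concrete, here is a minimal C sketch of how a 2D index maps to a linear offset under each ordering. The function names are made up for illustration; they are not part of the HDF5 or Octave APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a rows x cols matrix stored row-major
 * (C / HDF5 convention: the last index changes fastest). */
size_t row_major_offset(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return i * cols + j;
}

/* Offset of the same element when the matrix is stored column-major
 * (Fortran / Octave convention: the first index changes fastest). */
size_t col_major_offset(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return j * rows + i;
}
```

For the 2 x 3 matrix [1 2 3; 4 5 6], the element 2 at (0, 1) sits at offset 1 in a row-major buffer but at offset 2 in a column-major buffer, which is why the column-major buffer reads 1, 4, 2, 5, 3, 6.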
As @Yossarian pointed out, HDF5 always stores data as row-major (C convention). Octave, like Fortran, stores data internally as column-major.
When writing a matrix from Octave, the HDF5 layer does the transpose for you, so the data is always written as row-major no matter what language you use. This is what makes the file portable.
There is a very good example in the HDF5 User's Guide section 7.3.2.5, as mentioned by @Yossarian. Here's the example (almost) reproduced using Octave:
octave:1> A = [ 1:3; 4:6 ]
A =
1 2 3
4 5 6
octave:2> save("-hdf5", "test.h5", "A")
octave:3> quit
~$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   COMMENT "# Created by Octave 3.6.4, Fri Jun 13 08:36:16 2014 MDT <user@localhost>"
   GROUP "A" {
      ATTRIBUTE "OCTAVE_NEW_FORMAT" {
         DATATYPE H5T_STD_U8LE
         DATASPACE SCALAR
         DATA {
         (0): 1
         }
      }
      DATASET "type" {
         DATATYPE H5T_STRING {
            STRSIZE 7;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE SCALAR
         DATA {
         (0): "matrix"
         }
      }
      DATASET "value" {
         DATATYPE H5T_IEEE_F64LE
         DATASPACE SIMPLE { ( 3, 2 ) / ( 3, 2 ) }
         DATA {
         (0,0): 1, 4,
         (1,0): 2, 5,
         (2,0): 3, 6
         }
      }
   }
}
}
Notice that the dataspace is ( 3, 2 ) although A is 2 x 3 in Octave: the HDF5 layer has transposed the matrix to make sure it is stored in row-major format.
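If a C program needs the matrix in the 2 x 3 orientation that Octave displayed, one option is to transpose the 3 x 2 data once it has been read into a buffer. A minimal sketch, assuming the values sit in a plain row-major array of doubles; transpose_row_major is a hypothetical helper, not an HDF5 call:

```c
#include <assert.h>
#include <stddef.h>

/* Transpose a rows x cols row-major matrix `src` into `dst`,
 * which is then a cols x rows row-major matrix.
 * The two buffers must not overlap. */
void transpose_row_major(const double *src, double *dst,
                         size_t rows, size_t cols)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

Applied to the 3 x 2 dataset from the dump (1, 4, 2, 5, 3, 6), this yields 1, 2, 3, 4, 5, 6, that is, the rows [1 2 3] and [4 5 6] as Octave printed them.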
Then an example of reading it in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <hdf5.h>

/* Named FILENAME rather than FILE to avoid shadowing the stdio FILE type. */
#define FILENAME "test.h5"
#define DS "A/value"

int
main(int argc, char **argv)
{
    int i = 0;
    int j = 0;
    int n = 0;
    int x = 0;
    int rank = 0;
    hid_t file_id;
    hid_t space_id;
    hid_t dset_id;
    herr_t stat;
    hsize_t *dims = NULL;
    int *data = NULL;

    (void)argc;
    (void)argv;

    /* Open the file and the dataset that Octave wrote for variable A. */
    file_id = H5Fopen(FILENAME, H5F_ACC_RDONLY, H5P_DEFAULT);
    dset_id = H5Dopen(file_id, DS, H5P_DEFAULT);
    if (file_id < 0 || dset_id < 0) {
        fprintf(stderr, "could not open %s in %s\n", DS, FILENAME);
        return(EXIT_FAILURE);
    }

    /* Query the dataspace: element count, rank and dimensions. */
    space_id = H5Dget_space(dset_id);
    n = (int)H5Sget_simple_extent_npoints(space_id);
    rank = H5Sget_simple_extent_ndims(space_id);
    dims = malloc(rank * sizeof(hsize_t));
    stat = H5Sget_simple_extent_dims(space_id, dims, NULL);

    printf("rank: %d\t dimensions: ", rank);
    for (i = 0; i < rank; ++i) {
        if (i == 0) {
            printf("(");
        }
        printf("%llu", (unsigned long long)dims[i]);
        if (i == (rank - 1)) {
            printf(")\n");
        } else {
            printf(" x ");
        }
    }

    /* The dataset is stored as doubles; HDF5 converts to int on read. */
    data = malloc(n * sizeof(int));
    memset(data, 0, n * sizeof(int));
    stat = H5Dread(dset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                   data);

    /* Print the buffer as a row-major dims[0] x dims[1] matrix. */
    printf("%s:\n", DS);
    for (i = 0; i < (int)dims[0]; ++i) {
        printf(" [ ");
        for (j = 0; j < (int)dims[1]; ++j) {
            x = i * (int)dims[1] + j;
            printf("%d ", data[x]);
        }
        printf("]\n");
    }

    free(data);
    free(dims);
    stat = H5Sclose(space_id);
    stat = H5Dclose(dset_id);
    stat = H5Fclose(file_id);
    (void)stat;

    return(EXIT_SUCCESS);
}
When compiled and run, this gives:
~$ h5cc -o rmat rmat.c
~$ ./rmat
rank: 2 dimensions: (3 x 2)
A/value:
[ 1 4 ]
[ 2 5 ]
[ 3 6 ]
This is convenient, as it means the data reaches C already in the layout C expects. What it does mean, though, is that the C program sees the 3 x 2 transpose of the 2 x 3 matrix Octave displayed, so you have to change how you do your calculations. For row-major you need to do pre-multiplication, while for column-major you should be doing post-multiplication. Here is an example that hopefully explains it a bit more clearly.
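One way to read the multiplication remark, as a sketch under stated assumptions: C sees B = A^T (3 x 2), so the product A*x that Octave would compute can be obtained as x^T * B, with the vector on the left, and no explicit transpose is needed. The helper name vec_times_matrix is made up for illustration and the buffers are assumed to be plain arrays of doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Compute y = x^T * B, where B is an n x m row-major matrix,
 * x has n entries and y receives m entries. */
void vec_times_matrix(const double *x, const double *b, double *y,
                      size_t n, size_t m)
{
    assert(x != NULL && b != NULL && y != NULL);
    for (size_t j = 0; j < m; ++j) {
        y[j] = 0.0;
    }
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            y[j] += x[i] * b[i * m + j];
        }
    }
}
```

With B holding the values from the file (1, 4, 2, 5, 3, 6) and x = (1, 0, 2), this gives y = (7, 16), the same as A*x for A = [1 2 3; 4 5 6] in Octave.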
Does this help?