BLAS LDB using DGEMM

Submitted by 亡梦爱人 on 2020-06-29 05:54:19

Question


I want to compute the matrix product D*W', where W' is the transpose of W.

While using DGEMM, I figured out with the help of @IanBush that the LDB in this case should be the number of rows of matrix W rather than the number of columns. The code for this case is

  Call dgemm('n', 't', N1, M1, N1, 1.0_wp, D, N1, W, M1, 0.0_wp, c, n1)

where n1 and m1 are the dimensions of my matrices.

Dimensions of the matrices:
W = M1*N1
D = N1*N1

The official documentation says

      LDB is INTEGER
       On entry, LDB specifies the first dimension of B as declared
       in the calling (sub) program. When  TRANSB = 'N' or 'n' then
       LDB must be at least  max( 1, k ), otherwise  LDB must be at
       least  max( 1, n ).

Why does this happen?


Answer 1:


The call to dgemm contains two sets of integers. The first set defines the dimensions of the matrices as described by the maths. Looking at the documentation for BLAS at http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html we can see the arguments to the routine defined as (formatted in a way that helps me remember how it works):

         Subroutine dgemm (TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, &
                                                           B, LDB, & 
                                                     BETA, C, LDC)

The set of integers that defines the mathematical dimensions is M, N, K. In this set:

  • M is the first mathematical dimension of the result matrix, C
  • N is the second mathematical dimension of the result matrix C
  • K is the length of the dot product required to produce an individual element of the result matrix C

So M, N and K are purely to do with the maths of the problem, nothing else. But we are on a computer, so there is a second issue: how is each of the two-dimensional matrices laid out in the computer's memory, which is an inherently one-dimensional object?

Fortran stores its arrays in column-major order. This means that in memory the location after element A( i, j ) holds A( i + 1, j ) if we are not at the end of a column, or A( 1, j + 1 ) if we are. So the matrix is laid out with the 'first index moving fastest'. To describe this layout of a two-dimensional object, all the program needs to know is how many elements of A you have to jump over to get from A( i, j ) to A( i, j + 1 ) - and it is this number which is the 'leading dimension': LDA, LDB and LDC in the documentation. It is therefore simply the number of rows in the matrix as declared or allocated.
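That layout rule can be sketched in a few lines of Python (an illustrative sketch only; the helper name flat_index is mine, and indices are 0-based here where Fortran's are 1-based):

```python
# Column-major ("Fortran order") storage: moving down a column is one step
# in memory; moving across a row jumps `lda` elements.

def flat_index(i, j, lda):
    """Flat offset of element (i, j) in a column-major array with leading dimension lda."""
    return i + j * lda

# A 3x2 matrix stored column by column: [[1, 4], [2, 5], [3, 6]]
storage = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lda = 3  # the declared number of rows

assert storage[flat_index(1, 0, lda)] == 2.0  # element (2,1) in 1-based Fortran terms
assert storage[flat_index(0, 1, lda)] == 4.0  # element (1,2)
```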

Now let's look at the documentation you quote

  LDB is INTEGER
   On entry, LDB specifies the first dimension of B as declared
   in the calling (sub) program. When  TRANSB = 'N' or 'n' then
   LDB must be at least  max( 1, k ), otherwise  LDB must be at
   least  max( 1, n ).

LDB is the number of rows in the matrix B as it is declared. Thus

  1. If B is NOT transposed, each column of B will be involved in a dot product with a row or column of A, depending on the transpose option for A. The mathematical dimensions say the dot product is k long. Thus the number of rows in B must be at least k for the maths to work, and so LDB must be at least k
  2. If B IS transposed, the mathematical first dimension of B is also the mathematical second dimension of the result matrix, which is given by N. So LDB in this case must be at least N
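Both cases can be checked with a tiny pure-Python sketch of DGEMM's indexing rules (this is not the real BLAS; ref_dgemm and the toy sizes are made up for illustration). Note that B is read as b[l + j*ldb] or b[j + l*ldb], but ldb is the declared number of rows of B either way:

```python
# Minimal sketch of DGEMM's index arithmetic on column-major flat lists,
# 0-based. Mirrors C(m x n) := alpha * op(A) * op(B) + beta * C.

def ref_dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    # op(A)(i,l): either A(i,l) or, if transposed, A(l,i) - lda is the
    # declared rows of A in both cases, and likewise ldb for B.
    ea = (lambda i, l: a[i + l * lda]) if transa == 'n' else (lambda i, l: a[l + i * lda])
    eb = (lambda l, j: b[l + j * ldb]) if transb == 'n' else (lambda l, j: b[j + l * ldb])
    for j in range(n):
        for i in range(m):
            s = sum(ea(i, l) * eb(l, j) for l in range(k))
            c[i + j * ldc] = alpha * s + beta * c[i + j * ldc]

# The question's case: D is n1 x n1, W is m1 x n1, C = D * W' is n1 x m1.
n1, m1 = 2, 3
d = [1.0, 0.0, 0.0, 2.0]            # diag(1, 2), column-major
w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # 3x2: [[1,4],[2,5],[3,6]], column-major
c = [0.0] * (n1 * m1)
ref_dgemm('n', 't', n1, m1, n1, 1.0, d, n1, w, m1, 0.0, c, n1)
# C = diag(1,2) * W' = [[1,2,3],[8,10,12]], stored column-major
assert c == [1.0, 8.0, 2.0, 10.0, 3.0, 12.0]
```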

So that answers most of it, and if you are always multiplying all of one matrix by all of a second matrix, the leading dimension will simply be the first dimension of each matrix as declared. But the 'at least' means you can multiply parts of larger arrays together. Consider the following, and by carefully separating the integers which define the maths from those which define the memory layout, can you work out why it does what it does? Note that A( 2, 2 ) in the argument list means we "start" the matrix at this element - now think carefully about what the leading dimension tells you and you should be able to work out how this works.

ian@eris:~/work/stack$ cat matmul2.f90
Program sirt

  Integer, Parameter :: wp = Kind( 1.0d0 )

  Real( wp ), Dimension( 1:5, 1:5 ) :: A
  Real( wp ), Dimension( 1:3, 1:3 ) :: B
  Real( wp ), Dimension( 1:4, 1:4 ) :: C

  Integer :: i

  A = 0.0_wp
  Do i = 1, 5
     A( i, i ) = Real( i, wp )
  End Do
  Write( *, * ) 'A = '
  Do i = 1, Size( A, Dim = 1 )
     Write( *, '( 100( f4.1, 1x ) )' ) A( i, : )
  End Do

  B = 0.0_wp
  B( 1, 1 ) = 1.0_wp
  B( 3, 2 ) = 1.0_wp
  B( 2, 3 ) = 1.0_wp
  Write( *, * ) 'B = '
  Do i = 1, Size( B, Dim = 1 )
     Write( *, '( 100( f4.1, 1x ) )' ) B( i, : )
  End Do

  ! Lazy - should really just initialise only the bits of C that are NOT touched
  ! by the dgemm
  C = 0.0_wp

  Call dgemm('N', 'N', 3, 3, 3, 1.0_wp, A( 2, 2 ), Size( A, Dim = 1 ), &
                                        B        , Size( B, Dim = 1 ), &
                                0.0_wp, C        , Size( C, Dim = 1 ) )

  Write( *, * ) 'C after dgemm'
  Write( *, * ) 'C = '
  Do i = 1, Size( C, Dim = 1 )
     Write( *, '( 100( f4.1, 1x ) )' ) C( i, : )
  End Do


End Program sirt
ian@eris:~/work/stack$ gfortran-8  -std=f2008 -fcheck=all -Wall -Wextra -pedantic -O matmul2.f90 -lblas
/usr/bin/ld: warning: libgfortran.so.4, needed by //usr/lib/x86_64-linux-gnu/libopenblas.so.0, may conflict with libgfortran.so.5
ian@eris:~/work/stack$ ./a.out
 A = 
 1.0  0.0  0.0  0.0  0.0
 0.0  2.0  0.0  0.0  0.0
 0.0  0.0  3.0  0.0  0.0
 0.0  0.0  0.0  4.0  0.0
 0.0  0.0  0.0  0.0  5.0
 B = 
 1.0  0.0  0.0
 0.0  0.0  1.0
 0.0  1.0  0.0
 C after dgemm
 C = 
 2.0  0.0  0.0  0.0
 0.0  0.0  3.0  0.0
 0.0  4.0  0.0  0.0
 0.0  0.0  0.0  0.0

This use for multiplying parts of two larger matrices together is surprisingly common, especially in blocked linear algebra algorithms, and the separation of the mathematical and layout dimensions is what allows you to do it.
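The same submatrix trick can be reproduced with plain flat-index arithmetic in Python (an illustrative sketch, not the real BLAS), which may help show what passing A( 2, 2 ) together with LDA = 5 actually addresses:

```python
# Re-create the Fortran example: A is 5x5 diag(1..5), B is 3x3, C is 4x4,
# all column-major flat lists, 0-based indices.

lda, ldb, ldc, m = 5, 3, 4, 3
a = [0.0] * 25
for i in range(5):
    a[i + i * lda] = float(i + 1)   # diag(1..5)
b = [0.0] * 9
b[0 + 0 * ldb] = 1.0                # B(1,1) in 1-based terms
b[2 + 1 * ldb] = 1.0                # B(3,2)
b[1 + 2 * ldb] = 1.0                # B(2,3)
c = [0.0] * 16

# Passing A(2,2) hands BLAS the flat offset of that element; LDA = 5 then
# makes every column step jump 5 elements, exactly as in the full matrix.
off = 1 + 1 * lda                   # flat offset of A(2,2), 0-based
for j in range(m):
    for i in range(m):
        c[i + j * ldc] = sum(a[off + i + l * lda] * b[l + j * ldb] for l in range(m))

# The 3x3 block starting at A(2,2) is diag(2,3,4); multiplying it by B
# reproduces the printed C above: C(1,1)=2, C(3,2)=4, C(2,3)=3, rest zero.
assert c == [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```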




Answer 2:


Imagine you have a large matrix B that looks like

b b b b B B B
b b b b B B B
b b b b B B B
B B B B B B B
B B B B B B B
B B B B B B B

B is a 6x7 matrix. However, in your program you don't use B as a matrix but just as a storage location. The real matrix you are interested in is b, which occupies the top-left part of B. Since b is stored inside the two-dimensional array B, there is a memory-layout question here: b is not contiguous in memory, even though B is. In memory, B looks like:

b b b B B B b b b B B B b b b B B B b b b B B B B B B B B B B B B B B B B B B B B B 
  col 1    |  col 2    |  col 3    |  col 4    |  col 5    |  col 6    |  col 7

If you want to pass the matrix b to LAPACK or BLAS, you need to inform it about this memory layout. This is done using the sizes M and N together with the leading dimension LDA, which is the total number of rows of the larger matrix b is stored in. So here you would pass M=3, N=4 and LDA=6. With this, BLAS and LAPACK know which parts of the memory block to skip.
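A short Python sketch of that layout (illustrative only; the fill values are made up) shows how M, N and LDA locate every element of b inside B's storage:

```python
# b (3x4) stored inside a larger column-major B (6x7); the leading
# dimension LDA = 6 is what skips the unused rows of each column.

rows_b, cols_b, lda = 3, 4, 6
big = [0.0] * (6 * 7)
for j in range(cols_b):          # fill b(i,j) = 10*i + j, 0-based
    for i in range(rows_b):
        big[i + j * lda] = 10.0 * i + j

# Element b(i,j) lives at flat offset i + j*lda, so M=3, N=4 and LDA=6
# are all BLAS/LAPACK need to find b inside B's memory block.
assert big[2 + 3 * lda] == 23.0  # b(3,4) in 1-based terms
assert big[3 + 0 * lda] == 0.0   # a skipped "B-only" row, never filled
```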



Source: https://stackoverflow.com/questions/60946566/blas-ldb-using-dgemm
