Non-Square Matrix Multiplication in CUDA
Question:

The code I use for matrix multiplication in CUDA lets me multiply both square and non-square matrices; however, both Width and Height MUST be multiples of blocksize.

So, for example, I can multiply [3][6] * [6][3] (using blocksize=3), but I can't multiply [3][2] * [2][3]. Does anyone know a way to do that?

This is my kernel:

    #include <stdio.h>
    #include <limits.h>
    #include <stdlib.h>

    #define blocksize 3
    #define HM (1*blocksize)
    #define WM (2*blocksize)
    #define WN (1*blocksize)
    #define HN WM
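One common way to lift the multiple-of-blocksize restriction is to round the grid size up and guard every load and store with a bounds check, so that threads falling outside the matrices read zeros and write nothing. The sketch below is not the asker's kernel (which is not shown in full here); it is a minimal illustration under assumptions: row-major `float` matrices, a tile size `TILE` that plays the role of `blocksize`, and the hypothetical names `matMulGuarded` and `maxAbsErrorVsCpu`. CUDA error checking is omitted for brevity.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define TILE 3  // assumed tile size; plays the role of `blocksize` in the question

// C[hm x wn] = A[hm x wm] * B[wm x wn], row-major; sizes need not be multiples of TILE.
__global__ void matMulGuarded(const float *A, const float *B, float *C,
                              int hm, int wm, int wn)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C handled by this thread
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C handled by this thread
    float acc = 0.0f;

    // Walk the shared dimension in TILE-wide steps, rounding the count up.
    int numTiles = (wm + TILE - 1) / TILE;
    for (int t = 0; t < numTiles; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as 0 so they contribute nothing to the sum.
        As[threadIdx.y][threadIdx.x] =
            (row < hm && aCol < wm) ? A[row * wm + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < wm && col < wn) ? B[bRow * wn + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    // Only threads that map to a real element of C write a result.
    if (row < hm && col < wn)
        C[row * wn + col] = acc;
}

// Hypothetical host helper: runs the kernel on small integer-valued test data
// and returns the largest absolute difference from a plain CPU reference.
float maxAbsErrorVsCpu(int hm, int wm, int wn)
{
    std::vector<float> A(hm * wm), B(wm * wn), C(hm * wn, 0.0f);
    for (size_t i = 0; i < A.size(); ++i) A[i] = float(i % 7) - 3.0f;
    for (size_t i = 0; i < B.size(); ++i) B[i] = float(i % 5) - 2.0f;

    float *dA, *dB, *dC;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Round the grid up so partial tiles at the right and bottom edges are covered.
    dim3 block(TILE, TILE);
    dim3 grid((wn + TILE - 1) / TILE, (hm + TILE - 1) / TILE);
    matMulGuarded<<<grid, block>>>(dA, dB, dC, hm, wm, wn);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    float maxErr = 0.0f;
    for (int i = 0; i < hm; ++i)
        for (int j = 0; j < wn; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < wm; ++k)
                ref += A[i * wm + k] * B[k * wn + j];
            maxErr = fmaxf(maxErr, fabsf(ref - C[i * wn + j]));
        }
    return maxErr;
}
```

Calling `maxAbsErrorVsCpu(3, 2, 3)` exercises the [3][2] * [2][3] case from the question with `TILE` set to 3. An alternative with the same effect is to zero-pad the matrices on the host up to the next multiple of the block size and keep the original kernel unchanged; the in-kernel guard avoids the extra copies at the cost of a few comparisons per load.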