matrix-multiplication | 易学教程

Improving Performance of Multiplication of Scipy Sparse Matrices

Question: Given a SciPy CSC sparse matrix "sm" with dimensions (170k x 170k) and 440 million non-zero entries, and a sparse CSC vector "v" (170k x 1) with a few non-zero entries, is there anything that can be done to improve the performance of the operation resul = sm.dot(v)? Currently it takes roughly 1 second. Initializing the matrices as CSR increased the time to 3 seconds, so CSC performed better. SM is a matrix of similarities between products and V is the vector that represents which
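
One way to picture the operation sm.dot(v) for a column-compressed matrix and a sparse vector: only the columns matching non-zero entries of v contribute to the result. The C sketch below illustrates that idea under stated assumptions. It is not SciPy's implementation, and the function name `csc_times_sparse_vec` and its parameter layout are hypothetical.

```c
#include <stddef.h>

/* Accumulate y += A * v, where
   - A is stored in CSC form: col_ptr has ncols + 1 entries, and the
     non-zeros of column j sit at positions col_ptr[j] .. col_ptr[j + 1] - 1
     of row_idx / val;
   - v is sparse: v_idx[t] is the index of its t-th non-zero, v_val[t] its value;
   - y is a dense result vector that the caller has already zeroed.
   Only the columns selected by the non-zeros of v are visited. */
void csc_times_sparse_vec(const size_t *col_ptr, const size_t *row_idx,
                          const double *val,
                          const size_t *v_idx, const double *v_val,
                          size_t v_nnz, double *y)
{
    for (size_t t = 0; t < v_nnz; t++) {
        size_t j = v_idx[t];
        for (size_t p = col_ptr[j]; p < col_ptr[j + 1]; p++)
            y[row_idx[p]] += val[p] * v_val[t];
    }
}
```

The work done is proportional to the number of stored entries in the selected columns rather than to all non-zeros of the matrix. Whether this accounts for the CSC versus CSR timings quoted in the question cannot be determined from the excerpt.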

Store triangular matrix efficiently

I need to store a lower triangular matrix efficiently, without keeping all the zeroes in memory. My idea: first I allocate memory for the rows, then for each row i I allocate i+1 elements, so I never have to worry about the zeroes. But something goes wrong at the first allocation. What am I doing wrong? This is my code; the program exits at line 8, just after reading the dimension of the matrix. #include <stdio.h> #include <stdlib.h> int main () { int i, j, **mat1, dim; scanf("%d",&dim); *mat1 = (int**)calloc(dim, sizeof(int*)); for(i = 0; i<dim; i++)
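
A minimal C sketch of the row-by-row scheme described in the question, assuming the intent is a table of `dim` row pointers in which row `i` holds `i + 1` ints. The helper names `tri_alloc` and `tri_free` are hypothetical and do not come from the question. The pointer table is assigned to the pointer variable itself; writing through `*mat1` before `mat1` points anywhere dereferences an uninitialized pointer.

```c
#include <stdlib.h>

/* Allocate a lower triangular matrix: row i holds i + 1 zero-initialized ints.
   Returns NULL if dim is not positive or an allocation fails.
   (Hypothetical helper, for illustration only.) */
int **tri_alloc(int dim)
{
    if (dim <= 0)
        return NULL;

    /* Assign the table of row pointers to mat, not to *mat. */
    int **mat = calloc((size_t)dim, sizeof *mat);
    if (mat == NULL)
        return NULL;

    for (int i = 0; i < dim; i++) {
        mat[i] = calloc((size_t)i + 1, sizeof **mat);
        if (mat[i] == NULL) {
            /* Release the rows allocated so far, then the table. */
            while (i-- > 0)
                free(mat[i]);
            free(mat);
            return NULL;
        }
    }
    return mat;
}

/* Free a matrix obtained from tri_alloc. */
void tri_free(int **mat, int dim)
{
    if (mat == NULL)
        return;
    for (int i = 0; i < dim; i++)
        free(mat[i]);
    free(mat);
}
```

Element (i, j) with j <= i is then accessed as `mat[i][j]`; entries above the diagonal are never stored.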

How to perform Vector-Matrix Multiplication with BLAS?

BLAS defines the GEMV (matrix-vector multiplication) level-2 operation. How can a BLAS library be used to perform vector-matrix multiplication? It's probably obvious, but I don't see how to use a BLAS operation for this multiplication. I would have expected a GEVM operation. The matrix-vector multiplication of an (M x N) matrix with an (N x 1) vector results in an (M x 1) vector. In short: a*A(MxN)*X(Nx1) + b*Y(Mx1) -> Y(Mx1). Of course you can use INCX and INCY when your vector is included in a matrix. In order to define a vector-matrix multiplication, the vector should be transposed, i.e. a*X(1xM)*A
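
The vector-matrix product can be written as a transposed matrix-vector product, because x^T A = (A^T x)^T. GEMV covers this case through its TRANS argument, which is why no separate GEVM routine is needed. The sketch below is a plain C reference loop, not a call into a BLAS library; the name `gemv_ref`, its parameter order, and the row-major storage are assumptions made for illustration.

```c
#include <stddef.h>

/* Reference GEMV for a row-major M x N matrix A (hypothetical helper).
   trans == 0: y (length M) = alpha * A   * x (length N) + beta * y
   trans != 0: y (length N) = alpha * A^T * x (length M) + beta * y
   The transposed case computes the vector-matrix product x^T * A. */
void gemv_ref(int trans, size_t M, size_t N, double alpha,
              const double *A, const double *x, double beta, double *y)
{
    size_t ylen = trans ? N : M;

    /* Scale the existing y; with beta == 0 the old contents are ignored. */
    for (size_t i = 0; i < ylen; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    for (size_t r = 0; r < M; r++) {
        for (size_t c = 0; c < N; c++) {
            if (trans)
                y[c] += alpha * A[r * N + c] * x[r];
            else
                y[r] += alpha * A[r * N + c] * x[c];
        }
    }
}
```

With a CBLAS implementation, the transposed case would typically be requested by passing `CblasTrans` to `cblas_dgemv`.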

Matrix multiplication: Strassen vs. Standard

I tried to implement the Strassen algorithm for matrix multiplication in C++, but the result isn't what I expected. As you can see, Strassen always takes more time than the standard implementation, and only when the dimension is a power of 2 is it as fast as the standard implementation. What went wrong? matrix mult_strassen(matrix a, matrix b) { if (a.dim() <= cut) return mult_std(a, b); matrix a11 = get_part(0, 0, a); matrix a12 = get_part(0, 1, a); matrix a21 = get_part(1, 0, a); matrix a22 = get_part(1, 1, a); matrix b11 = get_part(0, 0, b); matrix b12 = get_part(0, 1, b); matrix b21 = get_part

Fastest way to calculate minimum euclidean distance between two matrices containing high dimensional vectors

I started a similar question in another thread, but there I was focusing on how to use OpenCV. Having failed to achieve what I originally wanted, I will ask here for exactly what I want. I have two matrices. Matrix a is 2782x128 and matrix b is 4000x128, both holding unsigned char values. The values are stored in a single array. For each vector in a, I need the index of the vector in b with the smallest Euclidean distance. OK, now my code to achieve this: #include <windows.h> #include <stdlib.h> #include <stdio.h> #include <cstdio> #include <math.h> #include <time.h> #include <sys/timeb.h> #include
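
A minimal brute-force sketch of the search described in the question, assuming both matrices are stored row-major in flat unsigned char arrays. The function name `nearest_indices` and its parameters are hypothetical. It compares squared distances, which yields the same nearest index as the Euclidean distance while avoiding the square root.

```c
#include <stddef.h>
#include <limits.h>

/* For each of the na vectors in a, store in out[i] the index of the closest
   of the nb vectors in b. Both arrays are row-major with dim values per
   vector. Ties resolve to the lowest index. (Hypothetical helper.) */
void nearest_indices(const unsigned char *a, size_t na,
                     const unsigned char *b, size_t nb,
                     size_t dim, size_t *out)
{
    for (size_t i = 0; i < na; i++) {
        unsigned long best = ULONG_MAX;
        size_t best_j = 0;
        for (size_t j = 0; j < nb; j++) {
            unsigned long d2 = 0;   /* squared distance between a[i] and b[j] */
            for (size_t k = 0; k < dim; k++) {
                int diff = (int)a[i * dim + k] - (int)b[j * dim + k];
                d2 += (unsigned long)(diff * diff);
            }
            if (d2 < best) {
                best = d2;
                best_j = j;
            }
        }
        out[i] = best_j;
    }
}
```

For the sizes in the question this would be called with na = 2782, nb = 4000 and dim = 128. It is a baseline to measure faster variants against, not an optimized solution.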

OpenMP parallelizing matrix multiplication by a triple for loop (performance issue)

I'm writing a program for matrix multiplication with OpenMP that, for better cache efficiency, computes A x B(transpose), rows x rows, instead of the classic A x B, rows x columns. Doing this I ran into something that seems illogical to me: if I parallelize the outer loop, the program is slower than if I put the OpenMP directives on the innermost loop. On my computer the times are 10.9 vs 8.1 seconds. //A and B are double* allocated with malloc, Nu is the length of the matrices //which are square //#pragma omp parallel for for (i=0; i
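
A sketch of the rows-by-rows product with the directive on the outer loop, assuming square Nu x Nu matrices in flat row-major arrays and a second operand `Bt` that already holds the transpose of B. The function name `matmul_bt` is hypothetical, and this is not the asker's code, which is cut off in the excerpt. The loop indices and the accumulator are declared inside the loops so that each thread gets its own copy; index variables declared outside the parallel region and shared by accident are a common source of slow or incorrect outer-loop parallelization. Whether that is the cause here cannot be told from the excerpt.

```c
#include <stddef.h>

/* C = A * B, where Bt holds B transposed; all matrices are Nu x Nu, row-major.
   Build with OpenMP enabled (for example -fopenmp with GCC) to run the outer
   loop in parallel; without it the pragma is ignored and the loop is serial. */
void matmul_bt(const double *A, const double *Bt, double *C, int Nu)
{
    #pragma omp parallel for
    for (int i = 0; i < Nu; i++) {
        for (int j = 0; j < Nu; j++) {
            double sum = 0.0;   /* private: declared inside the parallel loop */
            for (int k = 0; k < Nu; k++)
                sum += A[(size_t)i * Nu + k] * Bt[(size_t)j * Nu + k];
            C[(size_t)i * Nu + j] = sum;
        }
    }
}
```

Each thread works on whole rows of C, so no two threads write to the same element, and both inner reads walk memory contiguously.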

matrix multiplication algorithm time complexity

I came up with this algorithm for matrix multiplication. I read somewhere that matrix multiplication has a time complexity of O(n^2), but I think this algorithm is O(n^3). I don't know how to calculate the time complexity of nested loops, so please correct me. for i=1 to n for j=1 to n c[i][j]=0 for k=1 to n c[i][j] = c[i][j]+a[i][k]*b[k][j] Answer 1: The naive algorithm, which is what you've got once you correct it as noted in comments, is O(n^3). There do exist algorithms that reduce this somewhat, but you're not likely to find an O(n^2) implementation. I believe the question of the most
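
To see where the n^3 comes from, count how often the innermost statement runs: each of the three nested loops runs n times, so the multiply-add executes n * n * n times. A small C sketch that counts this directly, assuming square n x n matrices in flat row-major arrays; the function name `matmul_count` is hypothetical.

```c
#include <stddef.h>

/* Naive product c = a * b for n x n row-major matrices.
   Returns how many times the innermost multiply-add was executed. */
unsigned long matmul_count(int n, const double *a, const double *b, double *c)
{
    unsigned long count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
                count++;            /* one multiply-add per (i, j, k) triple */
            }
            c[(size_t)i * n + j] = sum;
        }
    }
    return count;
}
```

Doubling n multiplies the returned count by eight, which is the cubic growth the answer refers to.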