matrix-multiplication

Efficient implementation of a sequence of matrix-vector products / specific “tensor”-matrix product

你。 submitted on 2019-12-05 17:42:50
I have a special algorithm where, as one of the last steps, I need to carry out a multiplication of a 3-D array with a 2-D array such that each matrix slice of the 3-D array is multiplied with the corresponding column of the 2-D array. In other words, if, say, A is an N x N x N array and B is an N x N matrix, I need to compute a matrix C of size N x N where C(:,i) = A(:,:,i)*B(:,i); . The naive way to implement this is a loop, i.e., C = zeros(N,N); for i = 1:N C(:,i) = A(:,:,i)*B(:,i); end However, loops aren't the fastest in Matlab and should be avoided. I'm looking for faster ways of doing this. Right now …
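The per-slice loop is a single tensor contraction, so it can be written without an explicit loop. Here is a minimal numpy sketch of the same index pattern (Python stands in for the asker's MATLAB; shapes and names are illustrative), assuming the slice index is the last axis of A:

```python
import numpy as np

N = 4
A = np.random.rand(N, N, N)   # A[:, :, i] is the i-th matrix slice
B = np.random.rand(N, N)

# One contraction replaces the loop: C[m, i] = sum_n A[m, n, i] * B[n, i]
C = np.einsum('mni,ni->mi', A, B)

# Reference loop, mirroring the MATLAB version
C_ref = np.empty((N, N))
for i in range(N):
    C_ref[:, i] = A[:, :, i] @ B[:, i]
assert np.allclose(C, C_ref)
```

In MATLAB itself, newer releases offer page-wise products (e.g. pagemtimes) for the same purpose; the einsum line above just makes the index pattern explicit.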

Matrix multiplication using multiple threads?

一个人想着一个人 submitted on 2019-12-05 14:29:30
I am supposed to multiply 2 matrices using threads. Two things: I keep getting 0's when I run the program, and I also get error messages (for each, it says "warning: passing argument 1 of 'printMatrix' from incompatible pointer type") on the bolded lines (where I try to print the output). Also note: the first bolded block was my attempt at solving the problem. I think I am close, but I may not be. Can anyone help? Thanks :) The output looks like this: A= 1 4 2 5 3 6 B= 8 7 6 5 4 3 A*B= 0 0 0 0 0 0 0 0 0 #include <pthread.h> #include <stdio.h> #include <stdlib.h> #define M 3 #define K 2 …
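For orientation, here is a hedged Python sketch of the usual row-partitioning scheme (illustrative names, not the asker's C code): each worker fills a disjoint set of rows of the result, and the result is printed only after every worker has been joined, so it cannot stay all zeros:

```python
import threading
import numpy as np

M, K, N = 3, 2, 3
A = np.array([[1, 4], [2, 5], [3, 6]])   # M x K, as in the question's output
B = np.array([[8, 7, 6], [5, 4, 3]])     # K x N
C = np.zeros((M, N), dtype=int)          # shared result

def worker(row):
    # Each thread fills one row of C; rows don't overlap, so no lock is needed.
    for j in range(N):
        C[row, j] = sum(A[row, k] * B[k, j] for k in range(K))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(M)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # print only after every worker has finished
print(C)                     # [[28 23 18] [41 34 27] [54 45 36]]
```

Common causes of the all-zeros symptom in pthread code are printing before joining the threads, or reusing one argument struct that every worker overwrites.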

OpenCL matrix multiplication should be faster?

安稳与你 submitted on 2019-12-05 14:14:17
I'm trying to learn how to write GPU-optimized OpenCL kernels, so I took the example of matrix multiplication using square tiles in local memory. However, in the best case I got only a ~10x speedup (~50 GFLOPS) in comparison to numpy.dot() (~5 GFLOPS; it uses BLAS). I found studies where they got speedups of >200x (>1000 GFLOPS): ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/2012/2012-002.pdf I don't know what I'm doing wrong, or if it is just because of my GPU (an nvidia GTX 275), or if it is because of some PyOpenCL overhead. But I also measured how long it takes just to copy the result from the GPU …
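As a sanity check on the baseline, the CPU side's FLOP rate can be measured directly. A minimal sketch (sizes illustrative), using the fact that a dense n x n matmul costs about 2·n^3 floating-point operations:

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

t0 = time.perf_counter()
c = a @ b                      # BLAS-backed, like numpy.dot
dt = time.perf_counter() - t0

gflops = 2.0 * n**3 / dt / 1e9
print(f"{gflops:.1f} GFLOPS")  # compare against the OpenCL kernel's rate
```

Timing the kernel alone versus kernel plus host-device transfer, as the asker started to do, usually explains a large part of such gaps.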

Why is the time complexity of square matrix multiplication defined as O(n^3)?

∥☆過路亽.° submitted on 2019-12-05 10:58:36
I have come across this in multiple sources (online and books): the running time of square matrix multiplication is O(n^3) for matrices of size n x n (example: matrix multiplication algorithm time complexity ). This statement indicates that the upper bound on the running time of this multiplication process is C·n^3, where C is some constant and n > n0, where n0 is some input size beyond which this upper bound holds true. ( http://en.wikipedia.org/wiki/Big_O_notation and What is the difference between Θ(n) and O(n)? ) The problem is, I cannot seem to derive the values of the constants C and n0. My questions: Can …
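The bound comes straight out of the naive algorithm: the innermost multiply-add executes exactly n^3 times, so if one iteration costs at most c machine operations, T(n) ≤ c·n^3 already holds with n0 = 1. C depends on the machine and the cost model, which is why no single value can be derived from the asymptotic statement alone. A small counting sketch (illustrative Python, not from the question):

```python
def naive_matmul_opcount(n):
    # Count multiply-add steps in the textbook n x n matrix multiply.
    A = [[1.0] * n for _ in range(n)]
    B = [[1.0] * n for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                ops += 1
    return ops

for n in (2, 4, 8):
    print(n, naive_matmul_opcount(n), n ** 3)   # the two counts match exactly
```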

Numpy Vectorization of sliding-window operation

佐手、 submitted on 2019-12-05 09:49:54
I have the following numpy arrays: arr_1 = [[1,2],[3,4],[5,6]] # 3 x 2 arr_2 = [[0.5,0.6],[0.7,0.8],[0.9,1.0],[1.1,1.2],[1.3,1.4]] # 5 x 2 arr_1 is clearly a 3 x 2 array, whereas arr_2 is a 5 x 2 array. Now, without looping, I want to element-wise multiply arr_1 and arr_2 so that I apply a sliding-window technique (window size 3) to arr_2. Example: Multiplication 1: np.multiply(arr_1,arr_2[:3,:]) Multiplication 2: np.multiply(arr_1,arr_2[1:4,:]) Multiplication 3: np.multiply(arr_1,arr_2[2:5,:]) I want to do this in some sort of matrix-multiplication form to make it faster than my current …
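One loop-free route is to materialize all windows as views and let broadcasting do the elementwise products in one shot. A hedged sketch, assuming numpy >= 1.20 for sliding_window_view:

```python
import numpy as np

arr_1 = np.array([[1, 2], [3, 4], [5, 6]])                    # 3 x 2
arr_2 = np.array([[0.5, 0.6], [0.7, 0.8], [0.9, 1.0],
                  [1.1, 1.2], [1.3, 1.4]])                    # 5 x 2

# All windows of 3 consecutive rows of arr_2, as views (no copy).
windows = np.lib.stride_tricks.sliding_window_view(arr_2, (3, 2))[:, 0]
# windows.shape == (3, 3, 2) and windows[w] == arr_2[w:w+3, :]

result = windows * arr_1       # broadcast (3, 3, 2) * (3, 2) -> (3, 3, 2)
assert np.allclose(result[1], np.multiply(arr_1, arr_2[1:4, :]))
```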

What is the best way to multiply a large sparse matrix with its transpose?

蓝咒 submitted on 2019-12-05 07:12:24
I currently want to multiply a large sparse matrix (~1M x 200k) with its transpose. The values of the resulting matrix would be floats. I tried loading the matrix into scipy's sparse matrix format and multiplying each row of the first matrix with the second matrix. The multiplication took ~2 hrs to complete. What is an efficient way to achieve this multiplication? I see a pattern in the computation: the matrix is large and sparse, and it is multiplied by its own transpose, so the resulting matrix will be symmetric. I would like to know what libraries can achieve the computation …
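For scale, a hedged scipy sketch of the direct route (dimensions scaled down; names illustrative): hand the whole product to the sparse matmul instead of looping over rows, and exploit that the transpose of a CSR matrix is already CSC:

```python
import numpy as np
import scipy.sparse as sp

rows, cols, density = 10_000, 2_000, 1e-3
A = sp.random(rows, cols, density=density, format='csr', dtype=np.float32)

# CSR times CSC multiplies efficiently, so A @ A.T needs no conversion.
G = (A @ A.T).tocsr()          # sparse Gram matrix, symmetric by construction

# Spot-check symmetry on a small corner
assert np.allclose(G[:50, :50].toarray(), G[:50, :50].T.toarray())
```

Because the result is symmetric, only one triangle really needs to be computed; libraries that expose a dedicated A·A^T kernel can roughly halve the work.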

Matrix-vector-multiplication in AVX not proportionately faster than in SSE

一世执手 submitted on 2019-12-05 04:46:04
I was writing a matrix-vector multiplication in both SSE and AVX using the following: for(size_t i=0;i<M;i++) { size_t index = i*N; __m128 a, x, r1; __m128 sum = _mm_setzero_ps(); for(size_t j=0;j<N;j+=4,index+=4) { a = _mm_load_ps(&A[index]); x = _mm_load_ps(&X[j]); r1 = _mm_mul_ps(a,x); sum = _mm_add_ps(r1,sum); } sum = _mm_hadd_ps(sum,sum); sum = _mm_hadd_ps(sum,sum); _mm_store_ss(&C[i],sum); } I used a similar method for AVX; however, at the end, since AVX doesn't have an equivalent instruction to _mm_store_ss(), I used: _mm_store_ss(&C[i],_mm256_castps256_ps128(sum)); The SSE code gives …
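A hedged numpy reference (not the asker's code) is useful for validating both kernels, and it also frames the performance question: a matrix-vector product does only about 2 flops per loaded float, so it is typically memory-bandwidth bound, and widening the vectors from SSE to AVX cannot by itself double throughput:

```python
import numpy as np

M, N = 1024, 1024
A = np.random.rand(M, N).astype(np.float32)   # same float32 layout as the intrinsics
X = np.random.rand(N).astype(np.float32)

C_ref = A @ X          # compare the SSE/AVX outputs against this
print(C_ref[:4])
```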

Parallelizing matrix multiplication through threading and SIMD

删除回忆录丶 submitted on 2019-12-05 02:16:53
I am trying to speed up matrix multiplication on a multicore architecture. To this end, I try to use threads and SIMD at the same time, but my results are not good. I test the speedup over sequential matrix multiplication: void sequentialMatMul(void* params) { cout << "SequentialMatMul started."; int i, j, k; for (i = 0; i < N; i++) { for (k = 0; k < N; k++) { for (j = 0; j < N; j++) { X[i][j] += A[i][k] * B[k][j]; } } } cout << "\nSequentialMatMul finished."; } I tried to add threading and SIMD to the matrix multiplication as follows: void threadedSIMDMatMul(void* params) { bounds *args = (bounds* …
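For comparison, a hedged Python analogue of the same decomposition (illustrative names; numpy's @ stands in for the hand-written SIMD kernel): split the rows of the output across threads, and let each thread run a vectorized kernel on its disjoint row band:

```python
import threading
import numpy as np

N = 1024
A = np.random.rand(N, N).astype(np.float32)
B = np.random.rand(N, N).astype(np.float32)
X = np.zeros((N, N), dtype=np.float32)

def block_matmul(lo, hi):
    # Each thread owns a disjoint row band of X, so no synchronization
    # is needed; the BLAS call inside @ releases the GIL.
    X[lo:hi] = A[lo:hi] @ B

n_threads = 4
step = N // n_threads
threads = [threading.Thread(target=block_matmul, args=(t * step, (t + 1) * step))
           for t in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert np.allclose(X, A @ B, atol=1e-2)
```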

Why is the performance of these matrix multiplications so different?

女生的网名这么多〃 submitted on 2019-12-05 02:06:49
I wrote two matrix classes in Java just to compare the performance of their matrix multiplications. One class (Mat1) stores a double[][] A member where row i of the matrix is A[i]. The other class (Mat2) stores A and T, where T is the transpose of A. Let's say we have a square matrix M and we want the product M.mult(M). Call the product P. When M is a Mat1 instance, the algorithm used was the straightforward one: P[i][j] += M.A[i][k] * M.A[k][j] for k in range(0, M.A.length) In the case where M is a Mat2, I used: P[i][j] += M.A[i][k] * M.T[j][k] which is the same algorithm, because T[j][k]= …
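The locality effect behind Mat2 can be reproduced in a few lines. A hedged numpy sketch (sizes illustrative): reading B column-by-column is strided, while reading a pre-transposed copy row-by-row is contiguous:

```python
import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)
Bt = np.ascontiguousarray(B.T)   # the Mat2 idea: store the transpose once

i = 0   # time one output row of the product
t0 = time.perf_counter()
strided = [A[i] @ B[:, j] for j in range(n)]    # column reads: stride of n doubles
t1 = time.perf_counter()
contig = [A[i] @ Bt[j] for j in range(n)]       # row reads: unit stride
t2 = time.perf_counter()

print(f"strided {t1 - t0:.3f}s  contiguous {t2 - t1:.3f}s")
assert np.allclose(strided, contig)
```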

Why did matrix multiplication using python's numpy become so slow after upgrading ubuntu from 12.04 to 14.04?

百般思念 submitted on 2019-12-05 00:53:05
Problem: I used to have Ubuntu 12.04 and recently did a fresh installation of Ubuntu 14.04. The stuff I'm working on involves multiplications of big matrices (~2000 x 2000), for which I'm using numpy. The problem I'm having is that the calculations now take 10-15 times longer. Going from Ubuntu 12.04 to 14.04 meant going from Python 2.7.3 to 2.7.6 and from numpy 1.6.1 to 1.8.1. However, I think the issue might have to do with the linear algebra libraries that numpy is linked to. Instead …
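The usual first check in this situation is which BLAS/LAPACK numpy ended up linked against after the reinstall. A hedged sketch:

```python
import time
import numpy as np

# Show the BLAS/LAPACK numpy was built against. After a distro upgrade,
# numpy can silently fall back to the slow reference BLAS instead of
# OpenBLAS/ATLAS/MKL.
np.show_config()

# Quick probe at the matrix size described above (~2000 x 2000).
n = 2000
a = np.random.rand(n, n)
t0 = time.perf_counter()
a @ a
print(f"matmul took {time.perf_counter() - t0:.2f} s")
```

On an optimized BLAS this product takes a fraction of a second on typical hardware; multi-second times point at the reference implementation.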