matrix-multiplication

Efficient implementation of a sequence of matrix-vector products / specific “tensor”-matrix product

你。 submitted on 2019-12-05 17:42:50
I have a special algorithm where, as one of the last steps, I need to carry out a multiplication of a 3-D array with a 2-D array such that each matrix slice of the 3-D array is multiplied with the corresponding column of the 2-D array. In other words, if, say, A is an N x N x N array and B is an N x N matrix, I need to compute a matrix C of size N x N where C(:,i) = A(:,:,i)*B(:,i); . The naive way to implement this is a loop, i.e., C = zeros(N,N); for i = 1:N C(:,i) = A(:,:,i)*B(:,i); end However, loops aren't the fastest in Matlab and should be avoided. I'm looking for faster ways of doing this. Right now …
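The per-slice loop is a single tensor contraction, so it can be written without an explicit loop. Here is a minimal numpy sketch of the same index pattern (Python stands in for the asker's MATLAB; shapes and names are illustrative), assuming the slice index is the last axis of A:

```python
import numpy as np

N = 4
A = np.random.rand(N, N, N)   # A[:, :, i] is the i-th matrix slice
B = np.random.rand(N, N)

# One contraction replaces the loop: C[m, i] = sum_n A[m, n, i] * B[n, i]
C = np.einsum('mni,ni->mi', A, B)

# Reference loop, mirroring the MATLAB version
C_ref = np.empty((N, N))
for i in range(N):
    C_ref[:, i] = A[:, :, i] @ B[:, i]
assert np.allclose(C, C_ref)
```

In MATLAB itself, newer releases offer page-wise products (e.g. pagemtimes) for the same purpose; the einsum line above just makes the index pattern explicit.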

Matrix multiplication using multiple threads?

一个人想着一个人 submitted on 2019-12-05 14:29:30
I am supposed to multiply 2 matrices using threads. Two things: I keep getting 0's when I run the program, and I also get error messages (for each, it says "warning: passing argument 1 of 'printMatrix' from incompatible pointer type") on the bolded lines (where I try to print the output). Also note: the first bolded block was my attempt at solving the problem. I think I am close, but I may not be. Can anyone help? Thanks :) The output looks like this: A= 1 4 2 5 3 6 B= 8 7 6 5 4 3 A*B= 0 0 0 0 0 0 0 0 0 #include <pthread.h> #include <stdio.h> #include <stdlib.h> #define M 3 #define K 2 …
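For orientation, here is a hedged Python sketch of the usual row-partitioning scheme (illustrative names, not the asker's C code): each worker fills a disjoint set of rows of the result, and the result is printed only after every worker has been joined, so it cannot stay all zeros:

```python
import threading
import numpy as np

M, K, N = 3, 2, 3
A = np.array([[1, 4], [2, 5], [3, 6]])   # M x K, as in the question's output
B = np.array([[8, 7, 6], [5, 4, 3]])     # K x N
C = np.zeros((M, N), dtype=int)          # shared result

def worker(row):
    # Each thread fills one row of C; rows don't overlap, so no lock is needed.
    for j in range(N):
        C[row, j] = sum(A[row, k] * B[k, j] for k in range(K))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(M)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # print only after every worker has finished
print(C)                     # [[28 23 18] [41 34 27] [54 45 36]]
```

Common causes of the all-zeros symptom in pthread code are printing before joining the threads, or reusing one argument struct that every worker overwrites.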

OpenCL matrix multiplication should be faster?

安稳与你 submitted on 2019-12-05 14:14:17
I'm trying to learn how to write GPU-optimized OpenCL kernels, so I took the example of matrix multiplication using square tiles in local memory. However, in the best case I got only a ~10x speedup (~50 GFLOPS) in comparison to numpy.dot() (~5 GFLOPS; it uses BLAS). I found studies where they got speedups of >200x (>1000 GFLOPS): ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/2012/2012-002.pdf I don't know what I'm doing wrong, or if it is just because of my GPU (an nvidia GTX 275), or if it is because of some PyOpenCL overhead. But I also measured how long it takes just to copy the result from the GPU …
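As a sanity check on the baseline, the CPU side's FLOP rate can be measured directly. A minimal sketch (sizes illustrative), using the fact that a dense n x n matmul costs about 2·n^3 floating-point operations:

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

t0 = time.perf_counter()
c = a @ b                      # BLAS-backed, like numpy.dot
dt = time.perf_counter() - t0

gflops = 2.0 * n**3 / dt / 1e9
print(f"{gflops:.1f} GFLOPS")  # compare against the OpenCL kernel's rate
```

Timing the kernel alone versus kernel plus host-device transfer, as the asker started to do, usually explains a large part of such gaps.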

Why is the time complexity of square matrix multiplication defined as O(n^3)?

∥☆過路亽.° submitted on 2019-12-05 10:58:36
I have come across this in multiple sources (online and books): the running time of square matrix multiplication is O(n^3) for matrices of size n x n (example: matrix multiplication algorithm time complexity ). This statement indicates that the upper bound on the running time of this multiplication process is C·n^3, where C is some constant and n > n0, where n0 is some input size beyond which this upper bound holds true. ( http://en.wikipedia.org/wiki/Big_O_notation and What is the difference between Θ(n) and O(n)? ) The problem is, I cannot seem to derive the values of the constants C and n0. My questions: Can …
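The bound comes straight out of the naive algorithm: the innermost multiply-add executes exactly n^3 times, so if one iteration costs at most c machine operations, T(n) ≤ c·n^3 already holds with n0 = 1. C depends on the machine and the cost model, which is why no single value can be derived from the asymptotic statement alone. A small counting sketch (illustrative Python, not from the question):

```python
def naive_matmul_opcount(n):
    # Count multiply-add steps in the textbook n x n matrix multiply.
    A = [[1.0] * n for _ in range(n)]
    B = [[1.0] * n for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                ops += 1
    return ops

for n in (2, 4, 8):
    print(n, naive_matmul_opcount(n), n ** 3)   # the two counts match exactly
```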

Numpy Vectorization of sliding-window operation

佐手、 submitted on 2019-12-05 09:49:54
I have the following numpy arrays: arr_1 = [[1,2],[3,4],[5,6]] # 3 x 2 arr_2 = [[0.5,0.6],[0.7,0.8],[0.9,1.0],[1.1,1.2],[1.3,1.4]] # 5 x 2 arr_1 is clearly a 3 x 2 array, whereas arr_2 is a 5 x 2 array. Now, without looping, I want to element-wise multiply arr_1 and arr_2 so that I apply a sliding-window technique (window size 3) to arr_2. Example: Multiplication 1: np.multiply(arr_1,arr_2[:3,:]) Multiplication 2: np.multiply(arr_1,arr_2[1:4,:]) Multiplication 3: np.multiply(arr_1,arr_2[2:5,:]) I want to do this in some sort of matrix-multiplication form to make it faster than my current …
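One loop-free route is to materialize all windows as views and let broadcasting do the elementwise products in one shot. A hedged sketch, assuming numpy >= 1.20 for sliding_window_view:

```python
import numpy as np

arr_1 = np.array([[1, 2], [3, 4], [5, 6]])                    # 3 x 2
arr_2 = np.array([[0.5, 0.6], [0.7, 0.8], [0.9, 1.0],
                  [1.1, 1.2], [1.3, 1.4]])                    # 5 x 2

# All windows of 3 consecutive rows of arr_2, as views (no copy).
windows = np.lib.stride_tricks.sliding_window_view(arr_2, (3, 2))[:, 0]
# windows.shape == (3, 3, 2) and windows[w] == arr_2[w:w+3, :]

result = windows * arr_1       # broadcast (3, 3, 2) * (3, 2) -> (3, 3, 2)
assert np.allclose(result[1], np.multiply(arr_1, arr_2[1:4, :]))
```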

What is the best way to multiply a large sparse matrix with its transpose?

蓝咒 submitted on 2019-12-05 07:12:24
I currently want to multiply a large sparse matrix (~1M x 200k) with its transpose. The values of the resulting matrix would be floats. I tried loading the matrix into scipy's sparse matrix format and multiplying each row of the first matrix with the second matrix. The multiplication took ~2 hrs to complete. What is an efficient way to achieve this multiplication? I see a pattern in the computation: the matrix is large and sparse, and it is multiplied by its own transpose, so the resulting matrix will be symmetric. I would like to know what libraries can achieve the computation …
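For scale, a hedged scipy sketch of the direct route (dimensions scaled down; names illustrative): hand the whole product to the sparse matmul instead of looping over rows, and exploit that the transpose of a CSR matrix is already CSC:

```python
import numpy as np
import scipy.sparse as sp

rows, cols, density = 10_000, 2_000, 1e-3
A = sp.random(rows, cols, density=density, format='csr', dtype=np.float32)

# CSR times CSC multiplies efficiently, so A @ A.T needs no conversion.
G = (A @ A.T).tocsr()          # sparse Gram matrix, symmetric by construction

# Spot-check symmetry on a small corner
assert np.allclose(G[:50, :50].toarray(), G[:50, :50].T.toarray())
```

Because the result is symmetric, only one triangle really needs to be computed; libraries that expose a dedicated A·A^T kernel can roughly halve the work.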

Matrix-vector-multiplication in AVX not proportionately faster than in SSE

一世执手 submitted on 2019-12-05 04:46:04
I was writing a matrix-vector multiplication in both SSE and AVX using the following: for(size_t i=0;i<M;i++) { size_t index = i*N; __m128 a, x, r1; __m128 sum = _mm_setzero_ps(); for(size_t j=0;j<N;j+=4,index+=4) { a = _mm_load_ps(&A[index]); x = _mm_load_ps(&X[j]); r1 = _mm_mul_ps(a,x); sum = _mm_add_ps(r1,sum); } sum = _mm_hadd_ps(sum,sum); sum = _mm_hadd_ps(sum,sum); _mm_store_ss(&C[i],sum); } I used a similar method for AVX; however, at the end, since AVX doesn't have an equivalent instruction to _mm_store_ss(), I used: _mm_store_ss(&C[i],_mm256_castps256_ps128(sum)); The SSE code gives …
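A hedged numpy reference (not the asker's code) is useful for validating both kernels, and it also frames the performance question: a matrix-vector product does only about 2 flops per loaded float, so it is typically memory-bandwidth bound, and widening the vectors from SSE to AVX cannot by itself double throughput:

```python
import numpy as np

M, N = 1024, 1024
A = np.random.rand(M, N).astype(np.float32)   # same float32 layout as the intrinsics
X = np.random.rand(N).astype(np.float32)

C_ref = A @ X          # compare the SSE/AVX outputs against this
print(C_ref[:4])
```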

Parallelizing matrix multiplication through threading and SIMD

删除回忆录丶 submitted on 2019-12-05 02:16:53
I am trying to speed up matrix multiplication on a multicore architecture. To this end, I try to use threads and SIMD at the same time, but my results are not good. I test the speedup over sequential matrix multiplication: void sequentialMatMul(void* params) { cout << "SequentialMatMul started."; int i, j, k; for (i = 0; i < N; i++) { for (k = 0; k < N; k++) { for (j = 0; j < N; j++) { X[i][j] += A[i][k] * B[k][j]; } } } cout << "\nSequentialMatMul finished."; } I tried to add threading and SIMD to the matrix multiplication as follows: void threadedSIMDMatMul(void* params) { bounds *args = (bounds* …
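For comparison, a hedged Python analogue of the same decomposition (illustrative names; numpy's @ stands in for the hand-written SIMD kernel): split the rows of the output across threads, and let each thread run a vectorized kernel on its disjoint row band:

```python
import threading
import numpy as np

N = 1024
A = np.random.rand(N, N).astype(np.float32)
B = np.random.rand(N, N).astype(np.float32)
X = np.zeros((N, N), dtype=np.float32)

def block_matmul(lo, hi):
    # Each thread owns a disjoint row band of X, so no synchronization
    # is needed; the BLAS call inside @ releases the GIL.
    X[lo:hi] = A[lo:hi] @ B

n_threads = 4
step = N // n_threads
threads = [threading.Thread(target=block_matmul, args=(t * step, (t + 1) * step))
           for t in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert np.allclose(X, A @ B, atol=1e-2)
```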

Why is the performance of these matrix multiplications so different?

女生的网名这么多〃 submitted on 2019-12-05 02:06:49
I wrote two matrix classes in Java just to compare the performance of their matrix multiplications. One class (Mat1) stores a double[][] A member where row i of the matrix is A[i]. The other class (Mat2) stores A and T, where T is the transpose of A. Let's say we have a square matrix M and we want the product M.mult(M). Call the product P. When M is a Mat1 instance, the algorithm used was the straightforward one: P[i][j] += M.A[i][k] * M.A[k][j] for k in range(0, M.A.length) In the case where M is a Mat2, I used: P[i][j] += M.A[i][k] * M.T[j][k] which is the same algorithm, because T[j][k]= …
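The locality effect behind Mat2 can be reproduced in a few lines. A hedged numpy sketch (sizes illustrative): reading B column-by-column is strided, while reading a pre-transposed copy row-by-row is contiguous:

```python
import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)
Bt = np.ascontiguousarray(B.T)   # the Mat2 idea: store the transpose once

i = 0   # time one output row of the product
t0 = time.perf_counter()
strided = [A[i] @ B[:, j] for j in range(n)]    # column reads: stride of n doubles
t1 = time.perf_counter()
contig = [A[i] @ Bt[j] for j in range(n)]       # row reads: unit stride
t2 = time.perf_counter()

print(f"strided {t1 - t0:.3f}s  contiguous {t2 - t1:.3f}s")
assert np.allclose(strided, contig)
```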

Why did matrix multiplication using python's numpy become so slow after upgrading ubuntu from 12.04 to 14.04?

百般思念 submitted on 2019-12-05 00:53:05
Problem: I used to have Ubuntu 12.04 and recently did a fresh installation of Ubuntu 14.04. The stuff I'm working on involves multiplications of big matrices (~2000 x 2000), for which I'm using numpy. The problem I'm having is that the calculations now take 10-15 times longer. Going from Ubuntu 12.04 to 14.04 meant going from Python 2.7.3 to 2.7.6 and from numpy 1.6.1 to 1.8.1. However, I think the issue might have to do with the linear algebra libraries that numpy is linked to. Instead …
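The usual first check in this situation is which BLAS/LAPACK numpy ended up linked against after the reinstall. A hedged sketch:

```python
import time
import numpy as np

# Show the BLAS/LAPACK numpy was built against. After a distro upgrade,
# numpy can silently fall back to the slow reference BLAS instead of
# OpenBLAS/ATLAS/MKL.
np.show_config()

# Quick probe at the matrix size described above (~2000 x 2000).
n = 2000
a = np.random.rand(n, n)
t0 = time.perf_counter()
a @ a
print(f"matmul took {time.perf_counter() - t0:.2f} s")
```

On an optimized BLAS this product takes a fraction of a second on typical hardware; multi-second times point at the reference implementation.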