matrix-multiplication

How to get faster code than numpy.dot for matrix multiplication?

时间秒杀一切 submitted on 2019-11-27 12:28:15
Here (Matrix multiplication using hdf5) I use hdf5 (pytables) for big matrix multiplication, but I was surprised because using hdf5 it works even faster than using plain numpy.dot and storing the matrices in RAM. What is the reason for this behavior? And maybe there is some faster function for matrix multiplication in Python, because I still use numpy.dot for the small block matrix multiplications. Here is some code: Assume the matrices can fit in RAM: test on matrices of size 10*1000 x 1000. Using default numpy (I think no BLAS lib). Plain numpy arrays are in RAM: time 9.48. If A, B in RAM, C on disk: time 1.48. If A, B, C on
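
As a rough illustration of the blocked, out-of-core approach this question is about, here is a minimal sketch that keeps A and B in RAM, computes C in row blocks with plain numpy.dot, and streams each block to an HDF5 dataset. It uses h5py only for brevity (the question itself uses pytables), and the file and dataset names are placeholders.

```python
import numpy as np
import h5py

def blocked_matmul_to_hdf5(a, b, out_path="C.h5", block=1000):
    """Compute C = A @ B one row block at a time, writing C straight to disk."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    with h5py.File(out_path, "w") as f:
        c = f.create_dataset("C", shape=(n, m), dtype=a.dtype)
        for i in range(0, n, block):
            # each row block of C is an ordinary in-RAM numpy.dot
            c[i:i + block, :] = np.dot(a[i:i + block, :], b)
    return out_path
```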

Why is matrix multiplication faster with numpy than with ctypes in Python?

五迷三道 submitted on 2019-11-27 11:32:59
I was trying to figure out the fastest way to do matrix multiplication and tried 3 different ways: a pure Python implementation (no surprises here), a numpy implementation using numpy.dot(a, b), and interfacing with C using the ctypes module in Python. This is the C code that is compiled into a shared library: #include <stdio.h> #include <stdlib.h> void matmult(float* a, float* b, float* c, int n) { int i = 0; int j = 0; int k = 0; /*float* c = malloc(nay * sizeof(float));*/ for (i = 0; i < n; i++) { for (j = 0; j < n; j++) { int sub = 0; for (k = 0; k < n; k++) { sub = sub + a[i * n + k] * b[k * n + j];
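
For context, a minimal sketch of the Python side of the ctypes approach, assuming the C snippet above has been compiled to a shared library named libmatmult.so (the library path and test sizes are assumptions for illustration):

```python
import ctypes
import numpy as np

# Assumed: the C code above was compiled to ./libmatmult.so exporting matmult().
lib = ctypes.CDLL("./libmatmult.so")
fptr = np.ctypeslib.ndpointer(dtype=np.float32, flags="C_CONTIGUOUS")
lib.matmult.argtypes = [fptr, fptr, fptr, ctypes.c_int]
lib.matmult.restype = None

n = 256
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)
c = np.zeros((n, n), dtype=np.float32)

lib.matmult(a, b, c, n)                          # naive triple loop in C
np.testing.assert_allclose(c, a @ b, rtol=1e-3)  # BLAS-backed numpy as reference
```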

Efficient 4x4 matrix multiplication (C vs assembly)

寵の児 submitted on 2019-11-27 11:30:18
I'm looking for a faster and trickier way to multiply two 4x4 matrices in C. My current research is focused on x86-64 assembly with SIMD extensions. So far, I've created a function which is about 6x faster than a naive C implementation, which has exceeded my expectations for the performance improvement. Unfortunately, this stays true only when no optimization flags are used for compilation (GCC 4.7). With -O2, C becomes faster and my effort becomes meaningless. I know that modern compilers make use of complex optimization techniques to achieve almost perfect code, usually faster than an

CUDA determining threads per block, blocks per grid

空扰寡人 submitted on 2019-11-27 10:33:49
I'm new to the CUDA paradigm. My question is about determining the number of threads per block and blocks per grid. Does a bit of art and trial and error play into this? What I've found is that many examples have seemingly arbitrary numbers chosen for these things. I'm considering a problem where I would be able to pass matrices - of any size - to a method for multiplication, so that each element of C (as in C = A * B) would be calculated by a single thread. How would you determine the threads/block and blocks/grid in this case? In general you want to size your blocks/grid to match your data and
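
A minimal sketch of the usual launch-size arithmetic for the one-thread-per-element-of-C scheme described here: pick a fixed block shape and ceil-divide the output matrix by it, accepting that edge blocks contain some idle threads. The 16x16 block shape is just an illustrative choice, not a recommendation from the answer.

```python
import math

def launch_config(rows, cols, block_dim=(16, 16)):
    """Grid dimensions so that every element of C gets exactly one thread."""
    grid_dim = (math.ceil(cols / block_dim[0]),   # blocks across the columns of C
                math.ceil(rows / block_dim[1]))   # blocks down the rows of C
    return block_dim, grid_dim

block, grid = launch_config(1000, 1000)
print(block, grid)   # (16, 16) (63, 63) -> 63*16 = 1008 >= 1000, edge blocks partly idle
```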

Vectorized way of calculating row-wise dot product of two matrices with Scipy

ε祈祈猫儿з submitted on 2019-11-27 07:48:40
I want to calculate the row-wise dot product of two matrices of the same dimension as fast as possible. This is the way I am doing it: import numpy as np a = np.array([[1,2,3], [3,4,5]]) b = np.array([[1,2,3], [1,2,3]]) result = np.array([]) for row1, row2 in a, b: result = np.append(result, np.dot(row1, row2)) print result and of course the output is: [ 26. 14.] Check out numpy.einsum for another method: In [52]: a Out[52]: array([[1, 2, 3], [3, 4, 5]]) In [53]: b Out[53]: array([[1, 2, 3], [1, 2, 3]]) In [54]: einsum('ij,ij->i', a, b) Out[54]: array([14, 26]) Looks like einsum is a bit
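
For completeness, a small self-contained sketch comparing the einsum form shown above with an equivalent elementwise-multiply-and-sum form (the latter is added here only for comparison):

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5]])
b = np.array([[1, 2, 3], [1, 2, 3]])

r1 = np.einsum('ij,ij->i', a, b)   # row-wise dot products -> array([14, 26])
r2 = (a * b).sum(axis=1)           # elementwise product, then sum along each row
assert (r1 == r2).all()
print(r1)
```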

How to get element-wise matrix multiplication (Hadamard product) in numpy?

会有一股神秘感。 submitted on 2019-11-27 07:43:42
I have two matrices a = np.matrix([[1,2], [3,4]]) b = np.matrix([[5,6], [7,8]]) and I want to get the element-wise product, [[1*5,2*6], [3*7,4*8]], equaling [[5,12], [21,32]]. I have tried print(np.dot(a,b)) and print(a*b), but both give the result [[19 22], [43 50]], which is the matrix product, not the element-wise product. How can I get the element-wise product (aka Hadamard product) using built-in functions? Rahul K P For elementwise multiplication of matrix objects, you can use numpy.multiply : import numpy as np a = np.array([[1,2],[3,4]]) b = np.array([[5,6],[7,8]]) np.multiply(a,b)
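
A short sketch contrasting the operators involved; note that * is elementwise for ndarray but matrix multiplication for np.matrix, which is the source of the confusion in the question:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.multiply(a, b))   # Hadamard product: [[ 5 12] [21 32]]
print(a * b)               # same result for ndarrays (but NOT for np.matrix objects)
print(a @ b)               # matrix product:   [[19 22] [43 50]]
```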

How to optimize matrix multiplication (matmul) code to run fast on a single processor core

南楼画角 submitted on 2019-11-27 05:11:56
I am working on parallel programming concepts and trying to optimize a matrix multiplication example on a single core. The fastest implementation I have come up with so far is the following: /* This routine performs a dgemm operation * C := C + A * B * where A, B, and C are lda-by-lda matrices stored in column-major format. * On exit, A and B maintain their input values. */ void square_dgemm (int n, double* A, double* B, double* C) { /* For each row i of A */ for (int i = 0; i < n; ++i) /* For each column j of B */ for (int j = 0; j < n; ++j) { /* Compute C(i,j) */ double cij = C[i+j*n]; for( int k = 0; k <

Large (0,1) matrix multiplication using bitwise AND and popcount instead of actual int or float multiplies?

一曲冷凌霜 submitted on 2019-11-27 04:42:33
Question: For multiplying large binary matrices (10Kx20K), what I usually do is convert the matrices to float ones and perform float matrix multiplication, as integer matrix multiplication is pretty slow (have a look here). This time, though, I'd need to perform over a hundred thousand of these multiplications, and even a millisecond performance improvement on average matters to me. I want an int or float matrix as a result, because the product may have elements that aren't 0 or 1. The input
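
As a rough illustration of the bitwise AND + popcount idea, here is a minimal numpy sketch: pack the rows of A and the columns of B into uint8 words, AND them, and popcount via a 256-entry lookup table. This is only meant to show the technique, not a tuned implementation; a real speedup at 10Kx20K would need compiled/SIMD code, and the sizes below are placeholders.

```python
import numpy as np

POPCOUNT = np.array([bin(x).count("1") for x in range(256)], dtype=np.int32)

def binary_matmul_popcount(A, B):
    """C[i, j] = sum_k A[i, k] & B[k, j] for 0/1 matrices A (n x k) and B (k x m)."""
    Ap = np.packbits(A.astype(np.uint8), axis=1)      # bit-packed rows of A
    Bp = np.packbits(B.T.astype(np.uint8), axis=1)    # bit-packed columns of B
    C = np.empty((A.shape[0], B.shape[1]), dtype=np.int32)
    for i in range(A.shape[0]):
        # AND one packed row of A against every packed column of B, then popcount
        C[i] = POPCOUNT[np.bitwise_and(Ap[i], Bp)].sum(axis=1)
    return C

A = np.random.randint(0, 2, (64, 100))
B = np.random.randint(0, 2, (100, 48))
assert np.array_equal(binary_matmul_popcount(A, B), A @ B)
```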

Multiply 2 matrices in Javascript

左心房为你撑大大i submitted on 2019-11-27 04:30:59
Question: I'm writing a function that multiplies 2 matrices. The matrices will always have the same number of rows and columns (2x2, 5x5, 23x23, ...). When I print the result, it doesn't work. Why? For example, if I create two 2x2 matrices: matrixA: [1][2] [3][4] matrixB: [5][6] [7][8] The result should be: [19][22] [43][50] (http://ncalculators.com/matrix/2x2-matrix-multiplication-calculator.htm) But I get: [19][undefined] [22][undefined] function multiplyMatrix(matrixA, matrixB) { var result = new Array();/

MATLAB: How to vector-multiply two arrays of matrices?

戏子无情 submitted on 2019-11-27 03:26:59
Question: I have two 3-dimensional arrays, the first two dimensions of which represent matrices and the last one counts through a parameter space; as a simple example take A = repmat([1,2; 3,4], [1 1 4]); (but assume A(:,:,j) is different for each j). How can one easily perform a per-j matrix multiplication of two such matrix arrays A and B? C = A; % pre-allocate, nan(size(A,1), size(B,2)) would be better but slower for jj = 1:size(A, 3) C(:,:,jj) = A(:,:,jj) * B(:,:,jj); end certainly does the job,