matrix-multiplication

Dot product of two 4D NumPy arrays

Submitted by 老子叫甜甜 on 2021-02-18 17:51:07
Question: I have a 4D NumPy array of shape (15, 2, 320, 320). In other words, each position of the [320 x 320] grid holds a matrix of size [15 x 2]. Now, I would like to compute the dot product at each position of the [320 x 320] grid, then extract the diagonal array. Currently I am using two "for" loops, and the code works (see the snippet), but the calculation is too slow when I process large data. Can anyone show me how to vectorize the computation without the loops? A = np.random
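
A minimal vectorized sketch using np.einsum, assuming the intended per-position operation is M.T @ M for each 15x2 matrix M (the snippet above is truncated, so the exact product is a guess):

import numpy as np

A = np.random.rand(15, 2, 320, 320)

# Contract the length-15 axis for every (i, j) position at once,
# producing a (2, 2, 320, 320) stack of 2x2 products.
prod = np.einsum('kaij,kbij->abij', A, A)

# Extract the per-position diagonal -> shape (2, 320, 320).
diag = np.einsum('aaij->aij', prod)

A single einsum call replaces both loops, so the whole computation runs in vectorized C instead of Python.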

Sparse matrix-vector multiplication in CUDA

Submitted by 十年热恋 on 2021-02-17 20:26:04
Question: I'm trying to implement a matrix-vector multiplication on the GPU (using CUDA). In my C++ code (CPU), I load the matrix as a dense matrix, and then I perform the matrix-vector multiplication using CUDA. I'm also using shared memory to improve the performance. How can I load the matrix in an efficient way, knowing that my matrix is a sparse matrix? Below is my C++ function to load the matrix: int readMatrix( char* filename, float* &matrix, unsigned int *dim = NULL, int majority = ROW_MAJOR ) {
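
The usual approach is to store the matrix in a compressed sparse format such as CSR rather than densely. As a language-neutral sketch (the question targets CUDA, but these are exactly the three arrays a CSR SpMV kernel would consume), here is what the layout looks like in Python with SciPy:

import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0., 2., 0.],
                  [1., 0., 3.],
                  [0., 0., 4.]])
M = csr_matrix(dense)

print(M.data)     # nonzero values:         [2. 1. 3. 4.]
print(M.indices)  # column index per value: [1 0 2 2]
print(M.indptr)   # row start offsets:      [0 1 3 4]

print(M @ np.ones(3))  # CPU reference SpMV: [2. 4. 4.]

On the GPU side, data, indices, and indptr are the buffers you would copy over; one thread (or warp) per row then walks the entries from indptr[row] to indptr[row+1].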

Optimizing Tensorflow for many small matrix-vector multiplications

Submitted by 本小妞迷上赌 on 2021-02-08 08:52:09
Question: To build up a capsule network training script, I need to compute many small matrix-vector multiplications. The size of each weight matrix is at most 20 by 20, and there are more than 900 weight matrices. I'm curious whether tf.matmul or tf.linalg.matvec is the best option for this. Could anybody give me a hint to optimize the training script? Answer 1: EDIT: Looking at the notebook that you are referring to, it seems you have the following parameters: batch_size = 50 caps1_n_caps = 1152 caps1_n
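
Both ops handle this case well if the matrices are stacked along a leading batch dimension, so the 900+ products collapse into a single kernel launch instead of 900 small ones. A minimal sketch (the 900 and 20x20 sizes are taken from the question; the other names are illustrative):

import tensorflow as tf

W = tf.random.normal([900, 20, 20])   # stack of small weight matrices
x = tf.random.normal([900, 20])       # one vector per matrix

# tf.linalg.matvec broadcasts over the leading batch dimension.
y = tf.linalg.matvec(W, x)            # shape: (900, 20)

# Equivalent with tf.matmul by adding a trailing axis of size 1.
y2 = tf.squeeze(tf.matmul(W, x[..., tf.newaxis]), axis=-1)

The two are equivalent here; what matters for speed is the batching, not the choice between them.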

Lower triangular matrix-vector product

Submitted by 末鹿安然 on 2021-02-08 06:59:24
Question: For a programming exercise, I was given the lower triangular elements of a symmetric 3x3 matrix saved as an array:

|1 * *|
|2 4 *|  =>  [1,2,3,4,5,6]
|3 5 6|

I need to compute the product C(i) = C(i) + M(i,j) * V(j), where M is the symmetric matrix and V is a vector, V => [A,B,C]:

C(1) = 1*A + 2*B + 3*C
C(2) = 2*A + 4*B + 5*C
C(3) = 3*A + 5*B + 6*C

I am trying to write an efficient algorithm to perform this product. I can easily generate all the products I need for C(3). However, I have a problem when I try to
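
A minimal sketch of the standard trick, assuming the column-major packing shown above: walk the packed array once, and let each stored element m = M(i,j) contribute to both C(i) and, when off-diagonal, to its mirror C(j):

import numpy as np

packed = np.array([1., 2., 3., 4., 5., 6.])  # lower triangle, column by column
V = np.array([1., 1., 1.])                   # placeholder for [A, B, C]
n = 3

C = np.zeros(n)
k = 0
for j in range(n):            # column of the lower triangle
    for i in range(j, n):     # rows on and below the diagonal
        m = packed[k]; k += 1
        C[i] += m * V[j]      # lower-triangle contribution
        if i != j:
            C[j] += m * V[i]  # mirrored upper-triangle contribution

Each packed element is read exactly once, so the full symmetric matrix is never materialized.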

Parallelized Matrix Multiplication

Submitted by 安稳与你 on 2021-02-08 06:19:15
Question: I am trying to parallelize the multiplication of two matrices A and B. Unfortunately the serial implementation is still faster than the parallel one, or the speedup is too low (with matrix dimension = 512 the speedup is about 1.3). Probably something is fundamentally wrong. Can someone out there give me a tip? double[][] matParallel2(final double[][] matrixA, final double[][] matrixB, final boolean parallel) { int rows = matrixA.length; int columnsA = matrixA[0].length; int columnsB = matrixB[0]
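
A common cause of poor speedup is creating one task per element (or per cell), so that scheduling overhead and cache-line sharing swamp the arithmetic. A minimal sketch of the usual remedy, splitting the output into one contiguous row block per worker (Python stands in for the Java code here; the worker count of 4 is an illustrative assumption, and for NumPy specifically a plain a @ b already uses multithreaded BLAS):

import numpy as np
from concurrent.futures import ProcessPoolExecutor

def row_block(args):
    a_block, b = args
    return a_block @ b                      # each worker computes a slab of C

def parallel_matmul(a, b, workers=4):
    blocks = np.array_split(a, workers, axis=0)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(row_block, [(blk, b) for blk in blocks])
    return np.vstack(list(parts))

if __name__ == '__main__':
    a = np.random.rand(512, 512)
    b = np.random.rand(512, 512)
    assert np.allclose(parallel_matmul(a, b), a @ b)

The same decomposition applies to the Java version: give each thread a contiguous band of rows of the result, not individual elements.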

How can I find out if A * B is a Hadamard or Dot Product in Numpy?

Submitted by 天涯浪子 on 2021-02-07 10:08:33
Question: If I see the following line in Python code where numpy is imported: c = a * b What is the easiest and most practical way to determine whether this operation is executed as a Hadamard (elementwise) product or as a dot (matrix) product? Is it right that for a Hadamard product the row and column sizes of A and B must be the same, while for a dot product only the column size of A must match the row size of B? So can I look up the shapes of both and find out which operation is used? Answer 1:
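
A minimal demonstration: for NumPy ndarrays, * is always the Hadamard (elementwise) product with broadcasting, and @ (or np.dot) is the matrix product, so the operator, not the shapes, tells you which one runs:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a * b)         # elementwise: [[ 5 12] [21 32]]
print(a @ b)         # matrix product: [[19 22] [43 50]]
print(np.dot(a, b))  # same as @ for 2-D arrays

One caveat: for the legacy np.matrix type, * does mean matrix multiplication, so checking type(a) matters as well as the shapes.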

Matrix-Vector and Matrix-Matrix multiplication using SSE

Submitted by 懵懂的女人 on 2021-02-07 04:28:19
Question: I need to write matrix-vector and matrix-matrix multiplication functions but I cannot wrap my head around SSE intrinsics. The dimensions of the matrices and vectors are always multiples of 4. I managed to write a vector-vector multiplication function that looks like this: void vector_multiplication_SSE(float* m, float* n, float* result, unsigned const int size) { int i; __declspec(align(16))__m128 *p_m = (__m128*)m; __declspec(align(16))__m128 *p_n = (__m128*)n; __declspec(align(16))__m128 *p
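
Before (or while) writing the intrinsics, it helps to have a scalar reference that mirrors the 4-wide structure. A minimal NumPy sketch, assuming row-major storage and a size that is a multiple of 4, where each 4-element slice plays the role of one __m128 (_mm_mul_ps / _mm_add_ps, then a horizontal add):

import numpy as np

def matvec_reference(m, v, size):
    out = np.zeros(size, dtype=np.float32)
    for i in range(size):
        acc = np.zeros(4, dtype=np.float32)   # stands in for an __m128 accumulator
        row = m[i * size:(i + 1) * size]
        for j in range(0, size, 4):
            acc += row[j:j + 4] * v[j:j + 4]  # _mm_add_ps(_mm_mul_ps(...))
        out[i] = acc.sum()                    # horizontal reduction of the lanes
    return out

Comparing the SSE kernel's output against a reference like this catches lane-ordering and alignment mistakes early.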

Wondering why scipy.spatial.distance.sqeuclidean is twice as slow as numpy.sum((y1-y2)**2)

Submitted by 倖福魔咒の on 2021-02-05 08:26:28
Question: Here is my code:

import numpy as np
import time
from scipy.spatial import distance

y1 = np.array([0,0,0,0,1,0,0,0,0,0])
y2 = np.array([0., 0.1, 0., 0., 0.7, 0.2, 0., 0., 0., 0.])

start_time = time.time()
for i in range(1000000):
    distance.sqeuclidean(y1, y2)
print("--- %s seconds ---" % (time.time() - start_time))
--- 15.212640523910522 seconds ---

start_time = time.time()
for i in range(1000000):
    np.sum((y1-y2)**2)
print("--- %s seconds ---" % (time.time() - start_time))
--- 8.381187438964844 --
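
The gap is per-call overhead, not arithmetic: on 10-element arrays the subtraction and sum take microseconds, so the extra Python-level work inside distance.sqeuclidean (argument validation and wrapper dispatch; recent SciPy versions essentially compute d = u - v followed by np.dot(d, d)) roughly doubles the cost. A minimal illustration of the underlying arithmetic:

import numpy as np

y1 = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
y2 = np.array([0., 0.1, 0., 0., 0.7, 0.2, 0., 0., 0., 0.])

d = y1 - y2
print(np.sum(d**2))  # 0.14 -- the plain expression
print(np.dot(d, d))  # same value, often faster: no temporary for d**2

On large arrays the fixed overhead is amortized and the two approaches converge.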
