matrix-multiplication

Tensorflow, how to multiply a 2D tensor (matrix) by corresponding elements in a 1D vector

≯℡__Kan透↙ submitted on 2019-12-18 17:56:49
Question: I have a 2D matrix M of shape [batch x dim] and a vector V of shape [batch]. How can I multiply each row of the matrix by the corresponding element of V? I know an inefficient NumPy implementation would look like this:

```python
import numpy as np

M = np.random.uniform(size=(4, 10))
V = np.random.uniform(size=(4,))  # the original np.random.randint(4) yields a scalar, not a shape-[batch] vector

def tst(M, V):
    rows = []
    for i in range(len(M)):
        col = []
        for j in range(len(M[i])):
            col.append(M[i][j] * V[i])
        rows.append(col)
    return np.array(rows)
```

In TensorFlow,
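The question above is cut off, but the loop it shows can be replaced by broadcasting: give V a trailing axis so it aligns with the batch axis of M. A minimal NumPy sketch (the TensorFlow equivalent would be `tf.expand_dims(V, 1) * M`, mentioned here only as a comment since the original excerpt never reaches the TensorFlow part):

```python
import numpy as np

M = np.random.uniform(size=(4, 10))   # [batch x dim]
V = np.random.uniform(size=(4,))      # [batch]

# Scale row i of M by V[i]: the added axis makes V broadcast over dim.
result = M * V[:, None]               # shape (4, 10)

# Reference double-loop implementation for comparison.
expected = np.array([[M[i][j] * V[i] for j in range(M.shape[1])]
                     for i in range(M.shape[0])])
assert np.allclose(result, expected)
# In TensorFlow the same idea is tf.expand_dims(V, 1) * M (not run here).
```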

Parallel and distributed algorithms for matrix multiplication

假装没事ソ submitted on 2019-12-18 17:24:53
Question: The problem came up when I looked at the Wikipedia page on matrix multiplication algorithms. It says: "This algorithm has a critical path length of Θ((log n)²) steps, meaning it takes that much time on an ideal machine with an infinite number of processors; therefore, it has a maximum possible speedup of Θ(n³/(log n)²) on any real computer." (The quote is from the section "Parallel and distributed algorithms / Shared-memory parallelism.") Since we assume there are infinitely many processors, the operation of

Equivalent of cudaGetErrorString for cuBLAS?

大憨熊 submitted on 2019-12-18 16:38:34
Question: The CUDA runtime has a convenience function cudaGetErrorString(cudaError_t error) that translates an error enum into a readable string. cudaGetErrorString is used in the CUDA_SAFE_CALL(someCudaFunction()) macro that many people use for CUDA error handling. I'm now familiarizing myself with cuBLAS, and I'd like to create a macro similar to CUDA_SAFE_CALL for cuBLAS. To make my macro's printouts useful, I'd like to have something analogous to cudaGetErrorString in cuBLAS. Is there an equivalent of

Why does the order of loops in a matrix multiply algorithm affect performance? [duplicate]

梦想与她 submitted on 2019-12-18 11:26:14
Question: This question already has answers here: Why does the order of the loops affect performance when iterating over a 2D array? (7 answers) Closed 5 years ago. I am given two functions for finding the product of two matrices:

```cpp
void MultiplyMatrices_1(int **a, int **b, int **c, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}

void MultiplyMatrices_2(int **a, int **b, int **c, int n) {
    for (int i = 0; i < n; i++)
        for
```
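The performance difference in questions like this comes from memory-access order: with row-major storage, an i-j-k loop strides down a column of `b` in its innermost loop, while an i-k-j loop reads `b` row by row, which is cache-friendly. A Python sketch of the two orders (illustrative only; the cache effect itself is only measurable in compiled code):

```python
import numpy as np

def multiply_ijk(a, b, c, n):
    # Innermost loop reads b[k][j] down a column of b: strided access.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]

def multiply_ikj(a, b, c, n):
    # Innermost loop reads b[k][j] along a row of b: contiguous access
    # for row-major storage, hence better cache behaviour in C/C++.
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]

n = 8
a, b = np.random.rand(n, n), np.random.rand(n, n)
c1, c2 = np.zeros((n, n)), np.zeros((n, n))
multiply_ijk(a, b, c1, n)
multiply_ikj(a, b, c2, n)
assert np.allclose(c1, a @ b) and np.allclose(c2, a @ b)
```

Both orders compute the same product; only the traversal of `b` differs.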

How to write a matrix matrix product that can compete with Eigen?

佐手、 submitted on 2019-12-18 10:57:07
Question: Below is a C++ implementation comparing the time taken by Eigen and by a hand-written for loop to perform matrix-matrix products. The for loop has been optimised to minimise cache misses. It is faster than Eigen initially but eventually becomes slower (by up to a factor of 2 for 500-by-500 matrices). What else should I do to compete with Eigen? Is blocking the reason for Eigen's better performance? If so, how should I go about adding blocking to the for loop? #include<iostream> #include<Eigen
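Blocking (tiling) is indeed a large part of what optimised libraries do: the product is computed tile by tile so each tile stays in cache while it is reused. The question's code is C++, but the idea can be sketched compactly in NumPy (the block size 64 here is an arbitrary illustration, not a tuned value):

```python
import numpy as np

def blocked_matmul(A, B, bs=64):
    # Tile the i/k/j loops so each (bs x bs) tile of A and B is reused
    # many times while it is still resident in cache.
    n, m = A.shape
    m2, p = B.shape
    assert m == m2
    C = np.zeros((n, p))
    for i0 in range(0, n, bs):
        for k0 in range(0, m, bs):
            for j0 in range(0, p, bs):
                C[i0:i0+bs, j0:j0+bs] += (
                    A[i0:i0+bs, k0:k0+bs] @ B[k0:k0+bs, j0:j0+bs]
                )
    return C

A = np.random.rand(200, 150)
B = np.random.rand(150, 120)
assert np.allclose(blocked_matmul(A, B), A @ B)
```

In C++ the same three-level tiling is applied to the scalar loops; tuning the tile size to the L1/L2 cache is what closes much of the gap to Eigen.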

numpy - matrix multiple 3x3 and 100x100x3 arrays?

回眸只為那壹抹淺笑 submitted on 2019-12-18 09:28:59
Question: I have the following:

```python
import numpy as np

XYZ_to_sRGB_mat_D50 = np.asarray([
    [3.1338561, -1.6168667, -0.4906146],
    [-0.9787684, 1.9161415, 0.0334540],
    [0.0719453, -0.2289914, 1.4052427],
])

XYZ_1 = np.asarray([0.25, 0.4, 0.1])
XYZ_2 = np.random.rand(100, 100, 3)

np.matmul(XYZ_to_sRGB_mat_D50, XYZ_1)  # valid operation
np.matmul(XYZ_to_sRGB_mat_D50, XYZ_2)  # makes no sense mathematically
```

How do I perform the same operation on XYZ_2 that I perform on XYZ_1? Do I somehow reshape the array first?

Answer 1:
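The original answer is cut off above. One standard way to apply the 3×3 matrix to every pixel of the (100, 100, 3) array is `einsum`, or equivalently a matmul against the transposed matrix (a sketch, not necessarily the approach the original answer took):

```python
import numpy as np

M = np.asarray([
    [3.1338561, -1.6168667, -0.4906146],
    [-0.9787684, 1.9161415, 0.0334540],
    [0.0719453, -0.2289914, 1.4052427],
])
XYZ_2 = np.random.rand(100, 100, 3)

# Treat the last axis as the vector: out[h, w, i] = sum_j M[i, j] * XYZ_2[h, w, j]
out_einsum = np.einsum('ij,hwj->hwi', M, XYZ_2)

# Equivalent: @ broadcasts over the leading (100, 100) axes when we
# right-multiply each length-3 vector by M.T.
out_matmul = XYZ_2 @ M.T

assert out_einsum.shape == (100, 100, 3)
assert np.allclose(out_einsum, out_matmul)
# Each pixel matches the single-vector case:
assert np.allclose(out_einsum[0, 0], M @ XYZ_2[0, 0])
```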

Efficient way of computing matrix product AXA'?

為{幸葍}努か submitted on 2019-12-18 07:52:54
Question: I'm currently using the BLAS function DSYMM to compute Y = AX and then DGEMM for YAᵀ, but I'm wondering: is there a more efficient way of computing the matrix product AXAᵀ, where A is an arbitrary n×n matrix and X is a symmetric n×n matrix? Source: https://stackoverflow.com/questions/11139933/efficient-way-of-computing-matrix-product-axa
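One property worth noting about this product: when X is symmetric, AXAᵀ is symmetric too, so roughly half of the second multiply is redundant and a library routine could compute only one triangle of the result. A NumPy sketch of the two-step computation the question describes, checking that symmetry:

```python
import numpy as np

n = 50
A = np.random.rand(n, n)
S = np.random.rand(n, n)
X = (S + S.T) / 2          # symmetric X

Y = A @ X                  # step 1 (the DSYMM call in the question)
P = Y @ A.T                # step 2 (the DGEMM call): P = A X A^T

# (A X A^T)^T = A X^T A^T = A X A^T since X = X^T, so P is symmetric
# and only one triangle actually needs to be computed/stored.
assert np.allclose(P, P.T)
assert np.allclose(P, A @ X @ A.T)
```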

Speed up matrix multiplication by SSE (C++)

↘锁芯ラ submitted on 2019-12-17 23:42:48
Question: I need to run a matrix-vector multiplication 240,000 times per second. The matrix is 5×5 and is always the same, whereas the vector changes at each iteration. The data type is float. I was thinking of using SSE (or similar) instructions.

1) I am concerned that the number of arithmetic operations is too small compared to the number of memory operations involved. Do you think I can get a tangible (e.g. > 20%) improvement?
2) Do I need the Intel compiler to do it?
3) Can you point out

Why is this naive matrix multiplication faster than base R's?

浪尽此生 submitted on 2019-12-17 23:33:48
Question: In R, matrix multiplication is very optimised, i.e. it is really just a call to BLAS/LAPACK. However, I'm surprised that this very naive C++ code for matrix-vector multiplication seems reliably 30% faster.

```r
library(Rcpp)

# Simple C++ code for matrix-vector multiplication
mm_code = "NumericVector my_mm(NumericMatrix m, NumericVector v){
  int nRow = m.rows();
  int nCol = m.cols();
  NumericVector ans(nRow);
  double v_j;
  for(int j = 0; j < nCol; j++){
    v_j = v[j];
    for(int i = 0; i < nRow; i++){
      ans[i] += m(i,j) * v_j
```