matrix-multiplication

Can I stably invert a Vandermonde matrix with many small values in R?

Submitted by て烟熏妆下的殇ゞ on 2019-12-02 13:25:56
Update on this question: I have closed this question and will post a new question focused on the R package Rmpfr. To conclude this question and to help others, I am posting my code for the inverse of a Vandermonde matrix, computed from its explicit inverse formula. The generating terms are the x's in [here][1]. I am not a skilled programmer, so I don't expect my code to be the most efficient; I post it because it is better than nothing.

    library(gtools)
    # input is the generation vector of terms of the Vandermonde matrix
    FMinv <- function(base){
      n = length(base)
      inv = matrix(nrow = n, ncol = n)
      for …
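The R function above is cut off; as a complement, here is a minimal C++ sketch (double precision, not Rmpfr, and not the poster's code) of the same explicit-inverse idea, assuming the Vandermonde matrix is built as V[r][c] = x_r^c: column i of the inverse holds the coefficients of the i-th Lagrange basis polynomial. For the badly conditioned cases in the question you would swap the double type for a multiple-precision one.

    #include <vector>
    #include <cstdio>

    // Explicit inverse of the Vandermonde matrix V[r][c] = x[r]^c.
    // Entry (j, i) of the inverse is the coefficient of t^j in the Lagrange
    // basis polynomial L_i(t) = prod_{k != i} (t - x[k]) / (x[i] - x[k]).
    std::vector<std::vector<double>> vandermonde_inverse(const std::vector<double>& x) {
        const int n = static_cast<int>(x.size());
        std::vector<std::vector<double>> inv(n, std::vector<double>(n, 0.0));
        for (int i = 0; i < n; ++i) {
            std::vector<double> coeff(1, 1.0);   // coefficients of prod_{k != i} (t - x[k])
            double denom = 1.0;
            for (int k = 0; k < n; ++k) {
                if (k == i) continue;
                std::vector<double> next(coeff.size() + 1, 0.0);
                for (std::size_t d = 0; d < coeff.size(); ++d) {
                    next[d + 1] += coeff[d];          // multiply the term by t
                    next[d]     -= x[k] * coeff[d];   // multiply the term by -x[k]
                }
                coeff = next;
                denom *= (x[i] - x[k]);
            }
            for (int j = 0; j < n; ++j)
                inv[j][i] = coeff[j] / denom;
        }
        return inv;
    }

    int main() {
        std::vector<double> x = {0.1, 0.2, 0.3, 0.4};   // hypothetical small generating terms
        for (const auto& row : vandermonde_inverse(x)) {
            for (double v : row) std::printf("% .6e ", v);
            std::printf("\n");
        }
    }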

Not sure how to explain some of the performance results of my parallelized matrix multiplication code

Submitted by 自古美人都是妖i on 2019-12-02 09:56:12
Question: I'm running this OpenMP code for matrix multiplication and measured its results:

    #pragma omp for schedule(static)
    for (int j = 0; j < COLUMNS; j++)
        for (int k = 0; k < COLUMNS; k++)
            for (int i = 0; i < ROWS; i++)
                matrix_r[i][j] += matrix_a[i][k] * matrix_b[k][j];

There are different versions of the code, depending on where I put the #pragma omp directive: before the j loop, the k loop, or the i loop. Also, for each of those variants I ran different versions with default static scheduling, …
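The excerpt is cut off, but as background, here is a minimal self-contained sketch (not the question's benchmark harness) of the variant that is usually fastest: parallelizing the outer i loop with an i-k-j ordering, so each thread writes only its own rows of matrix_r and the innermost loop walks matrix_b and matrix_r contiguously. The sizes and the flat-vector storage are assumptions for illustration.

    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int ROWS = 1000, COLUMNS = 1000;
        std::vector<double> matrix_a(ROWS * COLUMNS, 1.0);      // ROWS x COLUMNS
        std::vector<double> matrix_b(COLUMNS * COLUMNS, 2.0);   // COLUMNS x COLUMNS
        std::vector<double> matrix_r(ROWS * COLUMNS, 0.0);      // ROWS x COLUMNS

        // i-k-j ordering: no two threads ever write the same element of matrix_r,
        // and the innermost j loop streams through contiguous memory.
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < ROWS; i++)
            for (int k = 0; k < COLUMNS; k++)
                for (int j = 0; j < COLUMNS; j++)
                    matrix_r[i * COLUMNS + j] += matrix_a[i * COLUMNS + k] * matrix_b[k * COLUMNS + j];

        std::printf("%f\n", matrix_r[0]);   // keep the result observable
        return 0;
    }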

Unable to execute device kernel in CUDA

Submitted by 十年热恋 on 2019-12-02 09:53:06
I am trying to call a device kernel within a global kernel. My global kernel performs a matrix multiplication and my device kernel finds the maximum value and its index in each column of the product matrix. Here is the code:

    __device__ void MaxFunction(float* Pd, float* max)
    {
        int x = (threadIdx.x + blockIdx.x * blockDim.x);
        int y = (threadIdx.y + blockIdx.y * blockDim.y);
        int k = 0;
        int temp = 0;
        int temp_idx = 0;
        for (k = 0; k < wB; ++k) {
            if (Pd[x*wB + y] > temp) {
                temp = Pd[x*wB + y];
                temp_idx = x*wB + y;
            }
            max[y*2 + 0] = temp;
            max[y*2 + 1] = temp_idx;
        }
    }

    __global__ void …
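For reference, a small host-side C++ helper (hypothetical, not part of the question's CUDA code) that computes the per-column maximum and its flat index that the device function is meant to produce; comparing the kernel's output against something like this is an easy way to check it.

    #include <utility>
    #include <vector>

    // For each column of a (rows x cols) row-major matrix, return the maximum
    // value and the flat index at which it occurs.
    std::vector<std::pair<float, int>> column_max(const std::vector<float>& Pd,
                                                  int rows, int cols) {
        std::vector<std::pair<float, int>> result(cols);
        for (int c = 0; c < cols; ++c) {
            float best = Pd[c];     // element (0, c)
            int best_idx = c;
            for (int r = 1; r < rows; ++r) {
                float v = Pd[r * cols + c];
                if (v > best) { best = v; best_idx = r * cols + c; }
            }
            result[c] = {best, best_idx};
        }
        return result;
    }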

MPI Matrix Multiplication with scatter gather

Submitted by 淺唱寂寞╮ on 2019-12-02 07:39:15
I'm trying to do matrix multiplication using MPI in C, and we have to write both a sequential version and a parallel version. My parallel version is not giving the correct answers and I'm not sure why. I think I'm not sending the right data to the processes, but I can't be sure. The professor just went over the different send/receive/gather etc. calls, but didn't really go into much detail... I've seen a lot of different examples, but none complete and none using scatter/gather. If anyone can take a look at my code and tell me if anything pops out at them, I'd appreciate it. I'm pretty …
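Since the excerpt stops before the code, here is a minimal sketch of the usual scatter/gather layout, assuming square N×N matrices with N divisible by the number of processes: rank 0 scatters row blocks of A, broadcasts all of B, each rank multiplies its block, and rank 0 gathers the row blocks of C. It is not the asker's program, just the communication pattern.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int N = 8;                       // assumes N % size == 0
        const int rows_per_proc = N / size;

        std::vector<double> A, C;              // full matrices live only on rank 0
        std::vector<double> B(N * N, 0.0);
        if (rank == 0) {
            A.assign(N * N, 1.0);
            C.assign(N * N, 0.0);
            B.assign(N * N, 2.0);
        }

        std::vector<double> A_local(rows_per_proc * N);
        std::vector<double> C_local(rows_per_proc * N, 0.0);

        MPI_Bcast(B.data(), N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);   // every rank needs all of B
        MPI_Scatter(rank == 0 ? A.data() : nullptr, rows_per_proc * N, MPI_DOUBLE,
                    A_local.data(), rows_per_proc * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        // Local block of the product: C_local = A_local * B.
        for (int i = 0; i < rows_per_proc; ++i)
            for (int k = 0; k < N; ++k)
                for (int j = 0; j < N; ++j)
                    C_local[i * N + j] += A_local[i * N + k] * B[k * N + j];

        MPI_Gather(C_local.data(), rows_per_proc * N, MPI_DOUBLE,
                   rank == 0 ? C.data() : nullptr, rows_per_proc * N, MPI_DOUBLE,
                   0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }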

Perform matrix multiplication between two arrays and get result only on masked places

Submitted by 五迷三道 on 2019-12-02 07:06:32
Question: I have two dense matrices, A [200000, 10] and B [10, 100000]. I need to multiply them to get a matrix C. I can't do that directly, since the resulting matrix won't fit into memory. Moreover, I only need a few elements of the result, about 1-2% of the total number of elements. I have a third matrix W [200000, 100000], which is sparse and has non-zero elements at exactly those positions of C that interest me. Is there a way to use W as a "mask" so that the resulting …
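A minimal C++ sketch of the masking idea (the original question is presumably about NumPy/SciPy, where a sparse W in COO form plays the same role): iterate over the non-zero coordinates of W and compute only those dot products, so the full 200000×100000 product is never materialized. The names and the flat row-major storage are assumptions for illustration.

    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Triplet { int row, col; double value; };   // one stored entry of the sparse result

    // A is n x k, B is k x m, both flat row-major; mask holds the (row, col)
    // coordinates of the non-zeros of W. Only those entries of C = A * B are computed.
    std::vector<Triplet> masked_matmul(const std::vector<double>& A,
                                       const std::vector<double>& B,
                                       const std::vector<std::pair<int, int>>& mask,
                                       int k, int m) {
        std::vector<Triplet> C;
        C.reserve(mask.size());
        for (const auto& [i, j] : mask) {
            double dot = 0.0;
            for (int p = 0; p < k; ++p)
                dot += A[static_cast<std::size_t>(i) * k + p] *
                       B[static_cast<std::size_t>(p) * m + j];
            C.push_back({i, j, dot});
        }
        return C;
    }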

Parallel Matrix Multiplication in MATLAB

Submitted by a 夏天 on 2019-12-02 06:30:59
Question: Is there a relatively easy-to-implement or transparent way to multiply two large matrices in parallel in MATLAB? Ideally, I would like to perform this parallel multiplication with at most a few lines of code, perhaps something like:

    C_1 = A*B         % normal
    C_2 = pmult(A,B)  % parallel
    % C_1 and C_2 have the same entries

If there is a way to do this parallel multiplication easily, can someone please point me to the code? If not, does anyone have any ideas on what they feel is the best way to …

Multiplying 3D matrix with 2D matrix

Submitted by ε祈祈猫儿з on 2019-12-02 06:04:41
I have two matrices to multiply. One is a weight matrix W of size 900x2x2. The other is an input matrix I of size 2x2. Now I want to perform a summation over c = WI, which should be a 900x1 matrix, but when I perform the operation it multiplies and gives me a 900x2x2 matrix again.

Q2) (related) So I made both of them 2D and multiplied 900x4 * 4x1, but that gives me an error:

    ValueError: operands could not be broadcast together with shapes (900,4) (4,1)

It seems you are trying to lose the last two axes of the first array against the two axes of the second array with that matrix …
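To make the intended operation concrete, here is a small C++ sketch of the reduction the question is after, c[n] = sum over (i, j) of W[n][i][j] * I[i][j] (in NumPy terms a 'nij,ij->n' contraction); the flattened 900x4 times 4x1 product is the same computation once both operands are reshaped consistently. The flat row-major layout is an assumption for illustration.

    #include <vector>

    // c[n] = sum over (i, j) of W[n][i][j] * I[i][j]
    // W is N x R x C stored flat in row-major order; I is R x C.
    std::vector<double> contract(const std::vector<double>& W,
                                 const std::vector<double>& I,
                                 int N, int R, int C) {
        std::vector<double> c(N, 0.0);
        for (int n = 0; n < N; ++n)
            for (int i = 0; i < R; ++i)
                for (int j = 0; j < C; ++j)
                    c[n] += W[(n * R + i) * C + j] * I[i * C + j];
        return c;   // length N, i.e. the desired 900x1 result for N = 900
    }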

Opencv Matrix multiplication

Submitted by 岁酱吖の on 2019-12-02 04:44:32
Question: I need to multiply a matrix by its transpose, but I get the following error:

    OpenCV Error: Assertion failed (type == B.type() && (type == CV_32FC1 || type == CV_64FC1 || type == CV_32FC2 || type == CV_64FC2)) in unknown function, file .. ....\src\opencv\modules\core\src\matmul.cpp, line 711

Here is the code:

    int dA[] = {
        1, 2, 3,
        4, 5, 6,
        6, 5, 4,
    };
    Mat A = Mat(3, 3, CV_32S, dA);
    Mat C = A.t() * A;

Answer 1: OpenCV only supports matrix multiplication for matrices of floating-point real or …
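Following the answer's point, a minimal sketch of the usual fix: convert the integer matrix to a floating-point type before multiplying. cv::Mat::convertTo is a standard OpenCV call; the data mirrors the question's snippet.

    #include <opencv2/core.hpp>
    #include <iostream>

    int main() {
        int dA[] = { 1, 2, 3,
                     4, 5, 6,
                     6, 5, 4 };
        cv::Mat A(3, 3, CV_32S, dA);

        // Matrix multiplication in OpenCV requires CV_32F or CV_64F data,
        // so convert the CV_32S matrix first.
        cv::Mat Af;
        A.convertTo(Af, CV_32F);
        cv::Mat C = Af.t() * Af;

        std::cout << C << std::endl;
        return 0;
    }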

Not sure how to explain some of the performance results of my parallelized matrix multiplication code

Submitted by ≡放荡痞女 on 2019-12-02 04:26:37
I'm running this OpenMP code for matrix multiplication and measured its results:

    #pragma omp for schedule(static)
    for (int j = 0; j < COLUMNS; j++)
        for (int k = 0; k < COLUMNS; k++)
            for (int i = 0; i < ROWS; i++)
                matrix_r[i][j] += matrix_a[i][k] * matrix_b[k][j];

There are different versions of the code, depending on where I put the #pragma omp directive: before the j loop, the k loop, or the i loop. Also, for each of those variants I ran different versions with default static scheduling, static scheduling with chunks of 1 and 10, and dynamic scheduling with the same chunk sizes. I also measured the …

Non Square Matrix Multiplication in CUDA

Submitted by 。_饼干妹妹 on 2019-12-02 02:26:26
The code I use for matrix multiplication in CUDA lets me multiply both square and non-square matrices; however, both Width and Height MUST be multiples of blocksize. So, for example, I can multiply [3][6] * [6][3] (using blocksize = 3), but I can't multiply [3][2] * [2][3]. Does anyone know a way to do that? This is my kernel:

    #include <stdio.h>
    #include <limits.h>
    #include <stdlib.h>
    #define blocksize 3
    #define HM (1*blocksize)
    #define WM (2*blocksize)
    #define WN (1*blocksize)
    #define HN WM
    #define WP WN
    #define HP HM
    #define PTH WM
    #define PTW HM

    __global__ void nonsquare(float* M, float* N, …