cusolver

undefined reference to cusolverDn

北城余情 submitted on 2021-02-19 08:15:06
Question: I am trying to run the cuSolver library available in CUDA 7.0. I have an issue with using the cuSolver library that must be very simple to fix, but I am asking here for some help. I have looked at quite a few examples posted online, and in particular I chose this one from JackOLantern: "Parallel implementation for multiple SVDs using CUDA". I have reduced it to kernel_0.cu:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include<iostream>
#include<iomanip>
#include<stdlib

CUDA - CUBLAS: issues solving many (3x3) dense linear systems

元气小坏坏 submitted on 2021-02-10 15:45:08
Question: I am trying to solve about 1200000 linear systems (3x3, Ax=B) with CUDA 10.1, in particular using the CUBLAS library. I took a cue from this (useful!) post and rewrote the suggested code as a Unified Memory version. The algorithm first performs an LU factorization using cublas<t>getrfBatched(), followed by two consecutive invocations of cublas<t>trsm(), which solve the upper and lower triangular linear systems. The code is attached below. It works correctly up to about 10000 matrices and, in this case,

CUDA - CUBLAS: issues solving many (3x3) dense linear systems

[亡魂溺海] submitted on 2021-02-10 15:44:28
Question: I am trying to solve about 1200000 linear systems (3x3, Ax=B) with CUDA 10.1, in particular using the CUBLAS library. I took a cue from this (useful!) post and rewrote the suggested code as a Unified Memory version. The algorithm first performs an LU factorization using cublas<t>getrfBatched(), followed by two consecutive invocations of cublas<t>trsm(), which solve the upper and lower triangular linear systems. The code is attached below. It works correctly up to about 10000 matrices and, in this case,

Best way of solving sparse linear systems in C++ - GPU Possible?

谁都会走 submitted on 2020-01-02 06:24:06
Question: I am currently working on a project where we need to minimize ||Ax - b||^2. In this case, A is a very sparse matrix, and A'A has at most 5 nonzero elements in each row. We are working with images, and the dimension of A'A is NxN, where N is the number of pixels; here N = 76800. We plan to move to RGB, and then the dimension will be 3Nx3N. In MATLAB, solving (A'A)\(A'b) takes about 0.15 s using doubles. I have now done some experimenting with Eigen's sparse solvers. I have tried: SimplicialLLT

Cholesky decomposition with CUDA

让人想犯罪 __ submitted on 2019-12-23 19:32:16
Question: I am trying to implement Cholesky decomposition using the cuSOLVER library. I am a beginner CUDA programmer and have always specified block sizes and grid sizes, but I cannot find out how the programmer can set these explicitly with cuSOLVER functions. Here is the documentation: http://docs.nvidia.com/cuda/cusolver/index.html#introduction The QR decomposition is implemented using the cuSOLVER library (see the example here: http://docs.nvidia.com/cuda/cusolver/index.html#ormqr

Singular values calculation only with CUDA

放肆的年华 submitted on 2019-12-10 21:06:39
Question: I'm trying to use the new cusolverDnSgesvd routine of CUDA 7.0 to calculate the singular values. The full code is reported below:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include<iostream>
#include<stdlib.h>
#include<stdio.h>
#include <cusolverDn.h>
#include <cuda_runtime_api.h>

/***********************/
/* CUDA ERROR CHECKING */
/***********************/
void gpuAssert(cudaError_t code, char *file, int line, bool abort=true)
{
    if (code !=

Best way of solving sparse linear systems in C++ - GPU Possible?

╄→гoц情女王★ submitted on 2019-12-05 15:06:11
I am currently working on a project where we need to minimize ||Ax - b||^2. In this case, A is a very sparse matrix, and A'A has at most 5 nonzero elements in each row. We are working with images, and the dimension of A'A is NxN, where N is the number of pixels; here N = 76800. We plan to move to RGB, and then the dimension will be 3Nx3N. In MATLAB, solving (A'A)\(A'b) takes about 0.15 s using doubles. I have now done some experimenting with Eigen's sparse solvers. I have tried SimplicialLLT, SimplicialLDLT, SparseQR, ConjugateGradient, and some different orderings. By far the best so far is

getrs function of cuSolver over pycuda doesn't work properly

只愿长相守 submitted on 2019-12-01 14:43:11
I'm trying to write a PyCUDA wrapper, inspired by the scikits-cuda library, for some operations provided in NVIDIA's new cuSolver library. I want to solve a linear system of the form AX=B by LU factorization. To do that, I first use the cublasSgetrfBatched method from scikits-cuda, which gives me the LU factorization; then, with that factorization, I want to solve the system using cusolverDnSgetrs from cuSolver, which is the function I want to wrap. When I perform the computation, it returns status 3 and the matrices that are supposed to contain the answer don't change, BUT *devInfo is zero. Looking in the cuSolver's

getrs function of cuSolver over pycuda doesn't work properly

懵懂的女人 submitted on 2019-12-01 12:47:16
Question: I'm trying to write a PyCUDA wrapper, inspired by the scikits-cuda library, for some operations provided in NVIDIA's new cuSolver library. I want to solve a linear system of the form AX=B by LU factorization. To do that, I first use the cublasSgetrfBatched method from scikits-cuda, which gives me the LU factorization; then, with that factorization, I want to solve the system using cusolverDnSgetrs from cuSolver, which is the function I want to wrap. When I perform the computation, it returns status 3, and the matrices that