Writing a fast linear system solver in OpenCL C
I'm writing an OpenCL kernel that will involve solving a linear system. Currently my kernel is simply too slow, and improving the performance of the linear-system portion seemed like a good place to start. I should also note that I'm not trying to make my linear solver parallel; the problem I'm working on is already embarrassingly parallel at a macroscopic level. The following is C code I wrote for solving Ax=b using Gaussian elimination with partial pivoting:

#include <stdio.h>
#include <math.h>
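/* Editor's sketch, NOT the asker's code: a minimal, hypothetical helper
 * (the name gauss_solve_pp is an assumption) showing Gaussian elimination
 * with partial pivoting for A x = b. Assumptions: A is a dense row-major
 * n x n float array and b has length n; both are overwritten. It uses no
 * malloc, recursion, or library calls, so the same body could plausibly be
 * adapted to OpenCL C, where the pointer parameters would likely need an
 * address-space qualifier such as __global or __private.
 * Returns 0 on success, -1 if the largest available pivot is exactly zero. */
static int gauss_solve_pp(float *A, float *b, float *x, int n)
{
    for (int k = 0; k < n; ++k) {
        /* Partial pivoting: choose the row at or below k whose entry in
         * column k has the largest magnitude. */
        int p = k;
        float best = A[k * n + k] < 0.0f ? -A[k * n + k] : A[k * n + k];
        for (int i = k + 1; i < n; ++i) {
            float v = A[i * n + k] < 0.0f ? -A[i * n + k] : A[i * n + k];
            if (v > best) {
                best = v;
                p = i;
            }
        }
        if (best == 0.0f)
            return -1; /* singular (or numerically zero) pivot */

        /* Swap rows k and p; columns left of k are already eliminated. */
        if (p != k) {
            for (int j = k; j < n; ++j) {
                float t = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = t;
            }
            float t = b[k];
            b[k] = b[p];
            b[p] = t;
        }

        /* Eliminate column k from every row below the pivot row. */
        for (int i = k + 1; i < n; ++i) {
            float m = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; ++j)
                A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }

    /* Back substitution on the resulting upper-triangular system. */
    for (int i = n - 1; i >= 0; --i) {
        float s = b[i];
        for (int j = i + 1; j < n; ++j)
            s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
    return 0;
}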