gpu-programming | 易学教程

Linking with 3rd party CUDA libraries slows down cudaMalloc

It is not a secret that on CUDA 4.x the first call to cudaMalloc can be ridiculously slow (this has been reported several times), seemingly because of a bug in the CUDA drivers. Recently, I noticed weird behaviour: the running time of cudaMalloc directly depends on how many 3rd-party CUDA libraries I link into my program (note that I do NOT use these libraries; I just link my program with them). I ran some tests using the following program:

    #include <cuda_runtime.h>

    int main() {
        cudaSetDevice(0);
        unsigned int *ptr = 0;
        // The allocation whose running time varies with the linked libraries
        cudaMalloc((void **)&ptr, 2000000 * sizeof(unsigned int));
        cudaFree(ptr);
        return 0;
    }

The results are as follows: Linked…
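
A common way to separate context creation from the allocation itself is to force the runtime to initialize first with cudaFree(0), which frees nothing, and then time the two phases separately. The sketch below does that. The names InitTimings and time_init_and_malloc are hypothetical, the sketch assumes device 0, and the timings it reports will vary with the driver, the toolkit, and whatever is linked into the binary.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <chrono>

// Hypothetical result type: wall-clock milliseconds for each phase.
struct InitTimings {
    double init_ms;    // cudaSetDevice + cudaFree(0): forces context creation
    double malloc_ms;  // one cudaMalloc on an already-initialized context
};

// Milliseconds elapsed since `start`.
static double elapsed_ms(std::chrono::steady_clock::time_point start) {
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now() - start).count();
}

// Hypothetical helper: times initialization and one allocation separately.
// Returns false if any CUDA call fails.
bool time_init_and_malloc(size_t bytes, InitTimings* out) {
    assert(out != nullptr);
    auto t0 = std::chrono::steady_clock::now();
    if (cudaSetDevice(0) != cudaSuccess) return false;
    // Freeing a null pointer is a no-op, but it makes the runtime finish
    // initializing the context here instead of inside cudaMalloc.
    if (cudaFree(0) != cudaSuccess) return false;
    out->init_ms = elapsed_ms(t0);

    void* ptr = nullptr;
    auto t1 = std::chrono::steady_clock::now();
    cudaError_t err = cudaMalloc(&ptr, bytes);
    out->malloc_ms = elapsed_ms(t1);
    if (err != cudaSuccess) return false;
    return cudaFree(ptr) == cudaSuccess;
}
```

You can call it with the question's 2000000 * sizeof(unsigned int) bytes from builds that do and do not link the extra libraries. Comparing the results should show whether the additional time lands in init_ms or in malloc_ms.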

OpenCL code 'Error MSB3721' for Intel OpenCL SDK on Visual Studio 2010

Question: I am currently using Intel's OpenCL SDK platform for heterogeneous parallel programming (OpenCL), with Visual Studio 2010 Ultimate. My system doesn't have any GPU in it. I have previously used the CUDA SDK platform for OpenCL programming; this is the first time I am using Intel's OpenCL SDK. I have tried some basic platform, device, and context identification/creation code from the 'OpenCL in Action' book, and all of it worked fine. So we can consider that Visual Studio is…

CUDA GPU selected by position, but how to set default to be something other than device 0?

Question: I've recently installed a second GPU (Tesla K40) on my machine at home, and my searches have suggested that the GPU in the first PCI slot becomes the default GPU chosen for CUDA jobs. A great link explaining this can be found here: Default GPU Assignment. My original GPU is a TITAN X, which is also CUDA-enabled, but it's really best for single-precision calculations, while the Tesla is better for double precision. My question for the group is whether there is a way to set up my default CUDA programming device to be the…
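
Inside a program, one option is to stop depending on enumeration order and choose the device explicitly. Enumerate devices with cudaGetDeviceCount and cudaGetDeviceProperties, then call cudaSetDevice on the one whose name matches. This is a minimal sketch: the helper names are hypothetical, and matching on the substring "K40" is an assumption about how the card reports its name. Outside the program, the CUDA_VISIBLE_DEVICES environment variable can restrict which devices a process sees, and in what order.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstring>

// Returns the index of the first device whose name contains `needle`,
// or -1 if none matches (or the device count cannot be queried).
int find_device_by_name(const char* needle) {
    assert(needle != nullptr);
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        if (std::strstr(prop.name, needle) != nullptr) return dev;
    }
    return -1;
}

// Makes the matching device current for this host thread.
// Leaves the current device unchanged and returns false if nothing matches.
bool select_device_by_name(const char* needle) {
    int dev = find_device_by_name(needle);
    if (dev < 0) return false;
    return cudaSetDevice(dev) == cudaSuccess;
}
```

Calling select_device_by_name("K40") once at startup, before any allocation, makes the Tesla the current device for that host thread.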

CUDA 5.0 dynamic parallelism error: ptxas fatal: unresolved extern function 'cudaLaunchDevice'

I am using a Tesla K20 with compute capability 3.5 on Linux with CUDA 5. With a simple child kernel call, it gives a compile error: Unresolved extern function cudaLaunchDevice. My command line looks like:

    nvcc --compile -G -O0 -g -gencode arch=compute_35,code=sm_35 -x cu -o fill.cu fill.o

I see cudadevrt.a in lib64. Do we need to add it, or what could be done to resolve it? Without the child kernel call, everything works fine.

Answer: You must explicitly compile with relocatable device code enabled and link the device runtime library in order to use dynamic parallelism. So your compilation command must…
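
The fix described above comes down to two build changes: compiling with -rdc=true (relocatable device code) and linking against libcudadevrt. Below is a minimal sketch of a parent kernel launching a child kernel. It assumes a compute capability 3.5 device, as in the question, and a source file named fill.cu. The kernel and helper names are hypothetical.

```cuda
// Build (assumed file name fill.cu, sm_35 device as in the question):
//   nvcc -arch=sm_35 -rdc=true fill.cu -o fill -lcudadevrt
// or as separate compile and link steps:
//   nvcc -arch=sm_35 -rdc=true -c fill.cu -o fill.o
//   nvcc -arch=sm_35 fill.o -o fill -lcudadevrt
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Child kernel: each thread writes its global index into out.
__global__ void child_fill(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Parent kernel: a single thread launches the child grid from the device.
// This device-side launch is what requires -rdc=true and -lcudadevrt.
__global__ void parent_fill(int* out, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        child_fill<<<blocks, threads>>>(out, n);
    }
}

// Host helper: runs the parent kernel and copies the result back.
bool run_fill(std::vector<int>& host_out) {
    int n = static_cast<int>(host_out.size());
    assert(n > 0);
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    parent_fill<<<1, 1>>>(d_out, n);
    // A parent grid does not complete until its child grids finish,
    // so synchronizing on the parent from the host is enough here.
    bool ok = cudaDeviceSynchronize() == cudaSuccess &&
              cudaMemcpy(host_out.data(), d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    return ok;
}
```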

How do you include standard CUDA libraries to link with NVRTC code?

Question: Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably, then, when creating the program context (i.e. the call to nvrtcCreateProgram), I have to pass in the name of the file (curand_kernel.h) and also the source code of curand_kernel.h? I feel like I shouldn't have to do that. It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I…
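
Instead of passing the text of curand_kernel.h through the headers and includeNames arguments of nvrtcCreateProgram, you can point NVRTC at the toolkit's include directory with the --include-path compile option. This is a minimal sketch with several assumptions. The directory /usr/local/cuda/include is an assumed install location. The kernel and helper names are hypothetical. Whether curand_kernel.h compiles cleanly under NVRTC may depend on the toolkit version. Link the host program with -lnvrtc.

```cuda
#include <nvrtc.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical kernel that needs the cuRAND device API.
// extern "C" keeps the entry name unmangled in the generated PTX.
static const char* kSource = R"(
#include <curand_kernel.h>
extern "C" __global__ void uniform_fill(float* out, int n, unsigned long long seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        curandState state;
        curand_init(seed, i, 0, &state);
        out[i] = curand_uniform(&state);
    }
}
)";

// Compiles kSource to PTX. `include_dir` is where curand_kernel.h lives
// (assumed to be the toolkit's include directory). *log always receives
// the compile log; *ptx is filled only on success.
bool compile_with_curand(const std::string& include_dir,
                         std::string* ptx, std::string* log) {
    assert(ptx != nullptr && log != nullptr);
    nvrtcProgram prog;
    // No in-memory headers: 0 headers and null header arrays.
    if (nvrtcCreateProgram(&prog, kSource, "uniform_fill.cu", 0, nullptr,
                           nullptr) != NVRTC_SUCCESS)
        return false;

    std::string inc = "--include-path=" + include_dir;
    const char* opts[] = {inc.c_str()};
    nvrtcResult res = nvrtcCompileProgram(prog, 1, opts);

    size_t log_size = 0;
    nvrtcGetProgramLogSize(prog, &log_size);
    std::vector<char> log_buf(log_size + 1, '\0');
    nvrtcGetProgramLog(prog, log_buf.data());
    log->assign(log_buf.data());

    if (res == NVRTC_SUCCESS) {
        size_t ptx_size = 0;
        nvrtcGetPTXSize(prog, &ptx_size);
        std::vector<char> ptx_buf(ptx_size + 1, '\0');
        nvrtcGetPTX(prog, ptx_buf.data());
        ptx->assign(ptx_buf.data());
    }
    nvrtcDestroyProgram(&prog);
    return res == NVRTC_SUCCESS;
}
```

If compilation fails, log holds NVRTC's diagnostics for the failing include or construct.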

Can I run CUDA on Intel's integrated graphics processor?

I have a very simple Toshiba laptop with an i3 processor, and I do not have any expensive graphics card. In the display settings, I see Intel(HD) Graphics as the display adapter. I am planning to learn some CUDA programming, but I am not sure if I can do that on my laptop, as it does not have an NVIDIA CUDA-enabled GPU. In fact, I doubt I even have a GPU o_o. So I would appreciate it if someone could tell me whether I can do CUDA programming with my current configuration and, if possible, also let me know what Intel(HD) Graphics means.

Answer: At the present time, Intel graphics chips do not support CUDA.
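
To check from code whether a machine has any CUDA-capable GPU, you can ask the runtime for its device count. This is a minimal sketch with hypothetical helper names. On a laptop with only Intel graphics, cudaGetDeviceCount is expected to fail, for example with cudaErrorNoDevice, or with a driver error if no NVIDIA driver is installed. The helper treats any such failure as zero devices. Compile with nvcc.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

// Returns the number of CUDA-capable devices, treating any runtime error
// (no device, missing or outdated driver, ...) as zero devices.
int cuda_device_count() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 0;
    }
    return count;
}

// Prints the name and compute capability of each CUDA device found.
void list_cuda_devices() {
    int count = cuda_device_count();
    assert(count >= 0);
    if (count == 0) {
        std::printf("No CUDA-capable device found.\n");
        return;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess)
            std::printf("Device %d: %s (compute capability %d.%d)\n",
                        dev, prop.name, prop.major, prop.minor);
    }
}
```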

nvidia-smi Volatile GPU-Utilization explanation?

I know that nvidia-smi -l 1 will report the GPU usage every second (similar to the output below). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is it the number of used SMs over the total number of SMs, or the occupancy, or something else?

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M…
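
The same counters can be read programmatically through NVML, the library that nvidia-smi is built on. NVML documents the gpu field of nvmlUtilization_t as the percent of time over the past sample period during which one or more kernels was executing on the GPU. That makes it a time-based figure, not a count of busy SMs or an occupancy measure. This is a minimal sketch. It assumes the NVML header and library shipped with the driver/toolkit are available (link with -lnvidia-ml), and the helper names are hypothetical.

```cuda
#include <nvml.h>
#include <cassert>
#include <cstdio>

// Reads the utilization counters nvidia-smi shows for one GPU.
// Returns false if NVML cannot be initialized or queried.
bool query_utilization(unsigned int index, nvmlUtilization_t* util) {
    assert(util != nullptr);
    if (nvmlInit() != NVML_SUCCESS) return false;
    nvmlDevice_t dev;
    bool ok = nvmlDeviceGetHandleByIndex(index, &dev) == NVML_SUCCESS &&
              nvmlDeviceGetUtilizationRates(dev, util) == NVML_SUCCESS;
    nvmlShutdown();
    return ok;
}

// Prints one sample; calling this once a second resembles `nvidia-smi -l 1`.
void print_utilization(unsigned int index) {
    nvmlUtilization_t util;
    if (query_utilization(index, &util))
        std::printf("GPU %u: gpu %u%%, memory %u%%\n", index, util.gpu, util.memory);
    else
        std::printf("GPU %u: NVML query failed\n", index);
}
```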

CUDA Thrust: reduce_by_key on only some values in an array, based off values in a “key” array

Question: Let's say I have two device_vector<byte> arrays, d_keys and d_data. If d_data is, for example, a flattened 2D 3x5 array (e.g. { 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3 }) and d_keys is a 1D array of size 5 (e.g. { 1, 0, 0, 1, 1 }), how can I do a reduction so that, on a per-row basis, I only add the values whose corresponding d_keys value is one (e.g. ending up with a result of { 10, 23, 14 })? The sum_rows.cu example allows me to add every value in d_data, but that's not…
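
One way to get { 10, 23, 14 } is to give reduce_by_key the row index (i / cols) as the key. As the value, use the data element, with masked-out columns replaced by 0. Both sequences can be generated on the fly with transform_iterator over a counting_iterator, so no intermediate arrays are allocated. This is a minimal sketch, assuming byte is unsigned char and the matrix is stored row-major. The functor and function names are hypothetical. Row sums are accumulated as int so they cannot wrap around at 255.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <cassert>
#include <vector>

typedef unsigned char byte;  // assumption: the question's byte type

// Maps a flat index to its row; this is the reduce_by_key key.
struct row_of {
    int cols;
    __host__ __device__ int operator()(int i) const { return i / cols; }
};

// Maps a flat index to data[i] if that column's mask entry is nonzero, else 0.
struct masked_value {
    const byte* data;
    const byte* mask;
    int cols;
    __host__ __device__ int operator()(int i) const {
        return mask[i % cols] ? static_cast<int>(data[i]) : 0;
    }
};

// Sums each row of a rows x cols matrix, counting only masked-in columns.
std::vector<int> masked_row_sums(const thrust::device_vector<byte>& d_data,
                                 const thrust::device_vector<byte>& d_keys,
                                 int rows, int cols) {
    assert(static_cast<int>(d_data.size()) == rows * cols);
    assert(static_cast<int>(d_keys.size()) == cols);
    thrust::counting_iterator<int> first(0);
    auto keys = thrust::make_transform_iterator(first, row_of{cols});
    auto vals = thrust::make_transform_iterator(
        first, masked_value{thrust::raw_pointer_cast(d_data.data()),
                            thrust::raw_pointer_cast(d_keys.data()), cols});
    thrust::device_vector<int> d_sums(rows);
    // Consecutive equal keys (one run per row) are summed; the keys
    // themselves are not needed, so they go to a discard iterator.
    thrust::reduce_by_key(thrust::device, keys, keys + rows * cols, vals,
                          thrust::make_discard_iterator(), d_sums.begin());
    thrust::host_vector<int> h_sums = d_sums;
    return std::vector<int>(h_sums.begin(), h_sums.end());
}
```

With the question's data (3 rows, 5 columns, mask { 1, 0, 0, 1, 1 }), this returns { 10, 23, 14 }.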
