gpu

Differences between cl_khr_fp64 and cl_amd_fp64?

Submitted by 微笑、不失礼 on 2019-12-22 00:02:47
Question: I just found that on my (pretty expensive) Radeon 6970, only the cl_amd_fp64 extension is supported. I am getting odd results in some parts of the code (accessing the value 0.005 actually yields 1.99916e+37?) when running with cl_amd_fp64. Using cl_khr_fp64 with the Intel SDK on the CPU works just fine (the input buffers are exactly the same). The extension page gives very little information. What exactly are the differences between the two? Answer 1: cl_khr_fp64 is the Khronos official double precision
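Both extensions expose the same double type; roughly, cl_amd_fp64 was AMD's vendor-specific subset offered before full cl_khr_fp64 support. A minimal host-side sketch for checking what the device actually reports and which pragma to enable, assuming an already-created cl_device_id named device (the buffer size is an arbitrary choice):

#include <CL/cl.h>
#include <stdio.h>
#include <string.h>

void check_fp64(cl_device_id device) {
    char ext[4096] = {0};
    /* Query the space-separated extension string for this device. */
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, sizeof(ext), ext, NULL);
    if (strstr(ext, "cl_khr_fp64"))
        printf("Use: #pragma OPENCL EXTENSION cl_khr_fp64 : enable\n");
    else if (strstr(ext, "cl_amd_fp64"))
        printf("Use: #pragma OPENCL EXTENSION cl_amd_fp64 : enable\n");
    else
        printf("No double-precision support reported.\n");
}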

How to configure OpenCL in Visual Studio 2010 for NVIDIA's GPU on Windows?

Submitted by 折月煮酒 on 2019-12-21 20:44:59
Question: I am using NVIDIA's GeForce GTX 480 GPU on the Windows 7 operating system on my ASUS laptop. I have already configured Visual Studio 2010 for CUDA 4.2. How do I configure OpenCL for NVIDIA's GPU in Visual Studio 2010? I have tried every possible way. Is it possible in any way to use the CUDA Toolkit (CUDA 4.2) and NVIDIA's GPU Computing SDK to program OpenCL? If yes, then how? If not, what is the other way? Answer 1: Yes. You should be able to use Visual Studio 2010 to program for OpenCL. It should simply
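For reference, once the CUDA toolkit's OpenCL header (CL/cl.h) and OpenCL.lib are added to the VS2010 include and library paths, a tiny host program is enough to verify the setup. This is only a sketch to confirm that the headers and library link correctly, not an NVIDIA SDK sample:

#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platform;
    cl_uint num_platforms = 0;
    /* Should report at least one platform once OpenCL.lib links correctly. */
    if (clGetPlatformIDs(1, &platform, &num_platforms) != CL_SUCCESS) {
        printf("clGetPlatformIDs failed\n");
        return 1;
    }
    char name[256] = {0};
    clGetPlatformInfo(platform, CL_PLATFORM_NAME, sizeof(name), name, NULL);
    printf("Found %u platform(s); first: %s\n", num_platforms, name);
    return 0;
}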

Precision when reading image with CLK_FILTER_LINEAR in OpenCL

Submitted by 守給你的承諾、 on 2019-12-21 20:09:58
Question: The code I used is from this question: OpenCL image3d linear sampling. I've tested it in 2D and 3D; both show a huge difference between the CPU and the GPU. Here is the result on the CPU:
coordinate:0.000000, result: 0.000000
coordinate:0.100000, result: 0.000000
coordinate:0.200000, result: 0.000000
coordinate:0.300000, result: 10.156250
coordinate:0.400000, result: 30.078125
coordinate:0.500000, result: 50.000000
coordinate:0.600000, result: 69.921875
coordinate:0.700000, result: 89.843750
coordinate:0.800000,
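For context, the OpenCL specification leaves the precision of the interpolation weights used by CLK_FILTER_LINEAR implementation-defined, and GPU texture units commonly use only a few fractional bits, so hardware filtering can legitimately disagree with a full-precision CPU reference. A sketch of the kind of sampler and read involved (not the poster's exact kernel; the image format and all names are assumed):

/* OpenCL C kernel sketch: linear filtering on a float image. */
__constant sampler_t smp = CLK_NORMALIZED_COORDS_TRUE |
                           CLK_ADDRESS_CLAMP_TO_EDGE  |
                           CLK_FILTER_LINEAR;

__kernel void sample_line(__read_only image2d_t img,
                          __global const float2 *coords,
                          __global float *out)
{
    int i = get_global_id(0);
    /* read_imagef returns the linearly filtered value at the coordinate;
       the weight precision is up to the implementation. */
    out[i] = read_imagef(img, smp, coords[i]).x;
}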

Is it possible to share a CUDA context between applications?

Submitted by 99封情书 on 2019-12-21 17:49:02
Question: I'd like to pass a CUDA context between two independent Linux processes (using POSIX message queues, which I already have set up). Using cuCtxPopCurrent() and cuCtxPushCurrent(), I can get the context pointer, but this pointer is only meaningful in the memory of the process in which I call the function, so passing it between processes is meaningless. I'm looking for other solutions. My ideas so far are: try to deep copy the CUcontext struct and then pass the copy; see if I can find a shared
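A CUcontext handle cannot be deep-copied or shared; it is only valid inside the process that created it. What can cross a process boundary (on Linux, with CUDA 4.1 and later) is device memory, via CUDA IPC handles, which are plain structs small enough to send over a message queue. A hedged sketch using the runtime API; the wrapper function names here are illustrative, not part of any library:

#include <cuda_runtime.h>

// Producer process: allocate device memory and export an IPC handle
// that can be sent over the existing POSIX message queue.
cudaIpcMemHandle_t export_buffer(void **d_ptr, size_t bytes) {
    cudaIpcMemHandle_t handle;
    cudaMalloc(d_ptr, bytes);
    cudaIpcGetMemHandle(&handle, *d_ptr);   // handle is plain data, safe to copy
    return handle;
}

// Consumer process: map the same allocation into this process's own context.
void *import_buffer(cudaIpcMemHandle_t handle) {
    void *d_ptr = nullptr;
    cudaIpcOpenMemHandle(&d_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
    return d_ptr;   // release later with cudaIpcCloseMemHandle(d_ptr)
}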

CUB (CUDA UnBound) equivalent of thrust::gather

Submitted by 别等时光非礼了梦想. on 2019-12-21 17:40:07
Question: Due to some performance issues with the Thrust libraries (see this page for more details), I am planning to refactor a CUDA application to use CUB instead of Thrust, specifically to replace the thrust::sort_by_key and thrust::inclusive_scan calls. At a particular point in my application I need to sort 3 arrays by key. This is how I did it with Thrust: thrust::sort_by_key(key_iter, key_iter + numKeys, indices); thrust::gather_wrapper(indices, indices + numKeys, thrust::make_zip
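CUB does not ship a direct gather equivalent; the usual pattern is to sort (key, index) pairs with cub::DeviceRadixSort::SortPairs and then permute the value arrays with the sorted indices. A sketch under the assumption that all pointers are valid device allocations of length numKeys:

#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Sort the index array by key with CUB's radix sort (double-call idiom).
void sort_indices_by_key(int *d_keys_in, int *d_keys_out,
                         int *d_idx_in,  int *d_idx_out, int numKeys)
{
    void  *d_temp = nullptr;
    size_t temp_bytes = 0;
    // First call only computes the temporary storage size.
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_idx_in,  d_idx_out, numKeys);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_idx_in,  d_idx_out, numKeys);
    cudaFree(d_temp);
}
// The three value arrays can then be permuted with the sorted indices,
// e.g. out[i] = in[d_idx_out[i]] in a trivial gather kernel.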

How to check for GPU on CentOS Linux

Submitted by 陌路散爱 on 2019-12-21 07:55:16
Question: It is suggested that on Linux the GPU can be found with the command lspci | grep VGA. This works fine on Ubuntu, but when I try the same on CentOS, it says the lspci command is not found. How can I check for the GPU card on CentOS? Note that I'm not the administrator of the machine and I only use it remotely from the command line. I intend to use the GPU as a GPGPU on that machine, but first I need to check whether it even has one. Answer 1: Have you tried to launch /sbin/lspci or /usr/sbin/lspci? Answer 2: This
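If the eventual goal is GPGPU work and a CUDA toolkit already happens to be installed on the machine, another non-root check is to compile and run a few lines against the runtime API, which reports whether a usable NVIDIA device is actually visible. This is an alternative sketch, not a replacement for /sbin/lspci:

#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("CUDA not usable here: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        std::printf("No CUDA-capable device found\n");
        return 1;
    }
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of the first device
    std::printf("Found %d device(s); device 0: %s\n", count, prop.name);
    return 0;
}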

Is an Intel-based graphics card compatible with TensorFlow/GPU?

Submitted by 十年热恋 on 2019-12-21 07:08:36
Question: Is this graphics card compatible with TensorFlow/GPU?
*-display
description: VGA compatible controller
product: Haswell-ULT Integrated Graphics Controller
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 09
width: 64 bits
clock: 33MHz
capabilities: msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:44 memory:c2000000-c23fffff memory:b0000000-bfffffff ioport:7000(size=64)
Answer 1: At the moment no. Only Nvidia GPUs and

Hardware-accelerated bitmap drawing in Java

Submitted by 时光毁灭记忆、已成空白 on 2019-12-21 06:57:14
Question: I want to be able to draw consecutive bitmaps (of type BufferedImage.TYPE_INT_RGB) of a video as quickly as possible in Java, and I want to know the best method of doing so. Does anyone have any advice on where I should start? From what I've read, two options are: 1) Use GDI/GDI+ routines in a JNI DLL working with JAWT (I'm on Windows). 2) Use Java3D and apply Textures to a Box's face and rotate it to the camera. I'm interested in any advice on these topics as well as any others. I have done a decent

Low GPU Usage & Performance with TensorFlow + RNNs

Submitted by 孤者浪人 on 2019-12-21 03:01:25
Question: I have implemented a network that tries to predict a word from a sentence. The network is actually pretty complex, but here’s a simple version of it: 1) take the indices of the words in a sentence and convert them to embeddings, 2) run each sentence through an LSTM, 3) give each word in the sentence a score with a linear multiplication of the LSTM output. And here’s the code:
# 40 samples with random size up to 500, vocabulary size is 10000 with 50 dimensions
def inference(inputs):
    inputs = tf.constant(inputs)
    word

Uploading data to shared memory for a convolution kernel

Submitted by 纵然是瞬间 on 2019-12-20 23:31:42
Question: I am having some difficulty understanding the batch loading that the comments refer to. In order to compute the convolution at a pixel, the mask, whose size is 5, must be centered on that specific pixel. The image is divided into tiles; after applying the convolution mask, these tiles become the final output tiles, whose size is TILE_WIDTH*TILE_WIDTH. For the pixels that belong to the border of the output tile, the mask must borrow some pixels from the neighboring tile, when this tile belong
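As a point of reference, here is a hedged 1-D version of that batch-loading idea: a block of TILE_WIDTH threads has to fill TILE_WIDTH + MASK_WIDTH - 1 shared-memory slots (the tile plus its halo), so there are more elements than threads and each thread may load more than one element. The names and sizes below are illustrative, not the referenced code:

#include <cuda_runtime.h>

#define MASK_WIDTH 5
#define TILE_WIDTH 64
#define SHARED_SIZE (TILE_WIDTH + MASK_WIDTH - 1)

__constant__ float d_mask[MASK_WIDTH];   // filled by the host with cudaMemcpyToSymbol

// Launched with blockDim.x == TILE_WIDTH.
__global__ void conv1d_tiled(const float *in, float *out, int n)
{
    __shared__ float tile[SHARED_SIZE];

    int out_start = blockIdx.x * TILE_WIDTH;   // first output index of this tile
    int halo      = MASK_WIDTH / 2;

    // Batch loading: SHARED_SIZE > TILE_WIDTH, so each thread loads
    // one or more elements (the tile plus the left/right halo).
    for (int i = threadIdx.x; i < SHARED_SIZE; i += TILE_WIDTH) {
        int g = out_start - halo + i;                // global input index
        tile[i] = (g >= 0 && g < n) ? in[g] : 0.0f;  // ghost cells are zero
    }
    __syncthreads();

    int o = out_start + threadIdx.x;
    if (o < n) {
        float acc = 0.0f;
        for (int j = 0; j < MASK_WIDTH; ++j)
            acc += tile[threadIdx.x + j] * d_mask[j];
        out[o] = acc;
    }
}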