gpu

How can I write the memory pointer in CUDA [duplicate]

Posted by 我的未来我决定 on 2019-12-13 07:41:43
Question: This question already has an answer here: Summing the rows of a matrix (stored in either row-major or column-major order) in CUDA (1 answer). Closed 2 years ago. I declared two GPU memory pointers, allocated the GPU memory, transferred the data, and launched the kernel in main: // declare GPU memory pointers char * gpuIn; char * gpuOut; // allocate GPU memory cudaMalloc(&gpuIn, ARRAY_BYTES); cudaMalloc(&gpuOut, ARRAY_BYTES); // transfer the array to the GPU cudaMemcpy(gpuIn, currIn, ARRAY …
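For reference, the full shape of the pattern the excerpt describes (allocate, copy in, launch, copy back, free) looks roughly like the sketch below. The kernel, the sizes, and everything other than gpuIn, gpuOut, currIn and ARRAY_BYTES are illustrative placeholders, not taken from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: copies each input byte to the output array.
__global__ void copyKernel(const char *in, char *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const int ARRAY_SIZE  = 1 << 20;          // assumed size
    const int ARRAY_BYTES = ARRAY_SIZE * sizeof(char);

    char *currIn  = new char[ARRAY_SIZE]();   // host input (zero-initialized)
    char *currOut = new char[ARRAY_SIZE]();   // host output

    // declare GPU memory pointers
    char *gpuIn  = nullptr;
    char *gpuOut = nullptr;

    // allocate GPU memory
    cudaMalloc(&gpuIn,  ARRAY_BYTES);
    cudaMalloc(&gpuOut, ARRAY_BYTES);

    // transfer the input array to the GPU
    cudaMemcpy(gpuIn, currIn, ARRAY_BYTES, cudaMemcpyHostToDevice);

    // launch the kernel
    copyKernel<<<(ARRAY_SIZE + 255) / 256, 256>>>(gpuIn, gpuOut, ARRAY_SIZE);

    // copy the result back to the host
    cudaMemcpy(currOut, gpuOut, ARRAY_BYTES, cudaMemcpyDeviceToHost);

    printf("first output byte: %d\n", currOut[0]);

    cudaFree(gpuIn);
    cudaFree(gpuOut);
    delete[] currIn;
    delete[] currOut;
    return 0;
}
```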

Why do we not have access to device memory on the host side?

Posted by 血红的双手。 on 2019-12-13 06:21:41
Question: I asked a question, "Memory allocated using cudaMalloc() is accessable by host or not?", and though things are much clearer to me now, I am still wondering why it is not possible to access the device pointer on the host. My understanding is that the CUDA driver takes care of memory allocation inside GPU DRAM. So this information (what the first address of the allocated memory on the device is) could be conveyed to the OS running on the host. Then it should be possible to access this device pointer, i.e. the …
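The short version is that the address returned by cudaMalloc() lives in the GPU's own address space and is not mapped into the host process's page tables, so dereferencing it on the host is undefined; data moves across with cudaMemcpy, or the allocation can be made managed (CUDA 6+ on a supported GPU) if one pointer usable on both sides is what is really wanted. A minimal sketch, not taken from the question:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int *d_value = nullptr;
    cudaMalloc(&d_value, sizeof(int));   // d_value holds a *device* address

    // *d_value = 42;                    // invalid on the host: the pointer refers to
                                         // GPU memory that the CPU cannot dereference

    int h_value = 42;
    cudaMemcpy(d_value, &h_value, sizeof(int), cudaMemcpyHostToDevice);  // the valid way

    // Managed (unified) memory is the exception: the driver migrates the pages,
    // so the same pointer works on both host and device (CUDA 6+, supported GPUs).
    int *m_value = nullptr;
    cudaMallocManaged(&m_value, sizeof(int));
    *m_value = 42;                       // legal host access to a managed allocation
    printf("managed value = %d\n", *m_value);

    cudaFree(d_value);
    cudaFree(m_value);
    return 0;
}
```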

OpenCL: Minimal configuration to work with AMD GPU

Posted by  ̄綄美尐妖づ on 2019-12-13 06:12:41
Question: Suppose we have an AMD GPU (for example a Radeon HD 7970) and a minimal Linux system without X, etc. What should be installed, what should be launched, and how should it be launched to get a proper OpenCL environment? Ideally it should be a headless environment. Requirements for the environment: the GPU is visible to OpenCL programs (clinfo, for example); it is possible to monitor the temperature and set the fan speed (for example using aticonfig). P.S. Simply installing an X server and Catalyst and running X :0 won't work …
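Separately from the driver installation itself, the "GPU visible to OpenCL programs" requirement can be verified with a few lines against the standard OpenCL C API once a vendor ICD is in place. A minimal clinfo-style sketch (build with something like g++ check_cl.cpp -lOpenCL; nothing here is AMD-specific):

```cpp
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS) {
        printf("no OpenCL platforms found (is the ICD installed?)\n");
        return 1;
    }
    printf("platforms: %u\n", num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           8, devices, &num_devices) != CL_SUCCESS)
            continue;                      // no GPU devices on this platform
        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            printf("  GPU device: %s\n", name);
        }
    }
    return 0;
}
```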

Why is cv::gpu::GaussianBlur slower than cv::GaussianBlur?

Posted by 喜你入骨 on 2019-12-13 05:27:20
Question: I'm not a pro in C++, OpenCV, or CUDA, and I don't understand why cv::gpu::warpPerspective(g_mask, g_frame, warp_matrix, g_frame.size(), cv::INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar(255,255,255)); cv::gpu::GaussianBlur(g_frame, g_frame, cv::Size(blur_radius, blur_radius), 0); g_frame.download(mask); is slower than cv::gpu::warpPerspective(g_mask, g_frame, warp_matrix, g_frame.size(), cv::INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar(255,255,255)); g_frame.download(mask); cv::GaussianBlur …
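A frequent explanation for measurements like this is that the first call into the gpu module pays for CUDA context creation and filter initialization, and that the device-to-host download is charged to the GPU path while the CPU path works on data already in system memory. A rough timing sketch against the old OpenCV 2.4 cv::gpu API that the question uses, with a warm-up call before the timed one (image size and kernel size are illustrative):

```cpp
#include <cstdio>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>   // old OpenCV 2.4.x GPU module

int main() {
    cv::Mat frame(1080, 1920, CV_8UC1, cv::Scalar(128));  // illustrative input
    cv::gpu::GpuMat g_frame, g_blurred;

    g_frame.upload(frame);
    cv::gpu::GaussianBlur(g_frame, g_blurred, cv::Size(5, 5), 0);  // warm-up: pays init cost

    int64 t0 = cv::getTickCount();
    cv::gpu::GaussianBlur(g_frame, g_blurred, cv::Size(5, 5), 0);  // timed call
    cv::Mat result;
    g_blurred.download(result);                                    // transfer cost included
    double ms = (cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();

    printf("warmed-up GPU blur + download: %.2f ms\n", ms);
    return 0;
}
```

For a single small frame the transfer can easily dominate, which is why keeping intermediate results in GpuMat and batching more work on the device is usually what makes the GPU path pay off.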

OpenGL: How to render 2 simple VAOs on Intel HD Graphics 4000 GPU?

Posted by 做~自己de王妃 on 2019-12-13 04:54:24
Question: Summary: my original question and observations are followed by updated, working OpenGL code for the Intel HD Graphics 4000 GPU. Original question: two cubes are shown on an Nvidia NVS 4200M GPU, but only one cube is shown on an Intel HD Graphics 4000 GPU. Using the OpenGL 3.2 forward profile and OpenTK to render two simple cubes on screen, it shows only the first cube, centered at (0,0,0), on the Intel HD Graphics 4000 with the latest GPU driver (7/2/2014, ver. 10.18.0010.3621). It should show two cubes. We're using a Vertex Array …

GPU vs CPU render mode in Adobe AIR

Posted by 夙愿已清 on 2019-12-13 04:34:33
Question: I asked the following question: BitmapData lock and unlock not working on Android. Now, having encountered that issue and read about render modes, I'm very confused about how such a simple script fails in GPU mode but is very fast in CPU mode. So the question is: how do GPU mode and CPU mode work in Adobe AIR, and why does most content work better in GPU mode, but not that script? Note: the base bitmap size should be bigger than 1400x1400. Answer 1: There are some limitations in GPU render mode. Adobe recommends …

Why is my PCL CUDA code running on the CPU instead of the GPU?

Posted by 人走茶凉 on 2019-12-13 03:44:01
Question: I have code where I use the pcl/gpu namespace: pcl::gpu::Octree::PointCloud clusterCloud; clusterCloud.upload(cloud_filtered->points); pcl::gpu::Octree::Ptr octree_device (new pcl::gpu::Octree); octree_device->setCloud(clusterCloud); octree_device->build(); /*tree->setCloud (clusterCloud);*/ // Create the cluster extractor object for the planar model and set all the parameters std::vector<pcl::PointIndices> cluster_indices; pcl::gpu::EuclideanClusterExtraction ec; ec.setClusterTolerance (0 …
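Before looking at PCL itself, it is worth ruling out the trivial case where no CUDA device is visible to the process at runtime (driver issues, CUDA_VISIBLE_DEVICES, and so on). A small CUDA runtime check, independent of PCL and not taken from the question:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        printf("no usable CUDA device: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```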

Keras multi_gpu_model causes system to crash

Posted by 主宰稳场 on 2019-12-13 03:33:32
Question: I am trying to train a rather large LSTM on a large dataset and have 4 GPUs to distribute the load. If I try to train on just one of them (any of them; I've tried each) it works correctly, but after adding the multi_gpu_model code it crashes my entire system when I try to run it. Here is my multi-GPU code: batch_size = 8 model = Sequential() model.add(Masking(mask_value=0., input_shape=(len(inputData[0]), len(inputData[0][0])) )) model.add(LSTM(256, return_sequences=True)) model.add …

“Peer access” failed when using pycuda and tensorflow together

Posted by て烟熏妆下的殇ゞ on 2019-12-13 03:01:48
Question: I have some Python 3 code like this: import numpy as np import pycuda.driver as cuda from pycuda.compiler import SourceModule, compile import tensorflow as tf # create device and context cudadevice=cuda.Device(gpuid1) cudacontext=cudadevice.make_context() config = tf.ConfigProto() config.gpu_options.visible_device_list="{}".format(gpuid2) sess = tf.Session(config=config) # compile from a .cu file cuda_mod = SourceModule(cudaCode, include_dirs = [dir_path], no_extern_c = True, options = ['-O0 …
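"Peer access" here is the wording CUDA uses for direct GPU-to-GPU memory access, which frameworks often try to enable for every pair of visible devices. The sketch below is only a generic runtime-API check of whether two device IDs can be peers and what error enabling access returns; it is not a fix for the pycuda/TensorFlow interaction, and the device IDs are illustrative:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) {
        printf("only %d device(s) visible, peer access does not apply\n", count);
        return 0;
    }

    const int dev0 = 0, dev1 = 1;        // illustrative device ids
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, dev0, dev1);
    cudaDeviceCanAccessPeer(&can10, dev1, dev0);
    printf("peer access %d->%d: %s, %d->%d: %s\n",
           dev0, dev1, can01 ? "yes" : "no",
           dev1, dev0, can10 ? "yes" : "no");

    if (can01) {
        cudaSetDevice(dev0);
        cudaError_t err = cudaDeviceEnablePeerAccess(dev1, 0);
        // cudaErrorPeerAccessAlreadyEnabled simply means some other component
        // in the same process (another framework, for instance) enabled it first.
        printf("enable %d->%d: %s\n", dev0, dev1, cudaGetErrorString(err));
    }
    return 0;
}
```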

Location of cudaEventRecord and overlapping ops from different streams

Posted by 有些话、适合烂在心里 on 2019-12-13 02:45:10
Question: I have two tasks. Each of them performs a copy to device (D), a kernel run (R), and a copy to host (H). I am overlapping the copy to device of task 2 (D2) with the kernel run of task 1 (R1). In addition, I am overlapping the kernel run of task 2 (R2) with the copy to host of task 1 (H1). I also record the start and stop times of the D, R, and H ops of each task using cudaEventRecord. I have a GeForce GT 555M, CUDA 4.1, and Fedora 16. I have three scenarios. Scenario 1: I use one stream for each task. I place the start/stop …
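For reference, the kind of instrumentation being described, two streams with events recorded around the D, R, and H steps of each task, looks roughly like the sketch below (the kernel and sizes are illustrative; pinned host memory is what allows the async copies to overlap at all). Where exactly the cudaEventRecord calls are placed matters, because on some devices the events themselves can serialize operations that would otherwise run concurrently, which appears to be what the scenarios are probing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel standing in for the "run kernel" (R) step.
__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    const int N = 1 << 22;
    const size_t BYTES = N * sizeof(float);

    float *h[2], *d[2];
    cudaStream_t s[2];
    cudaEvent_t dStart[2], dStop[2], rStop[2], hStop[2];

    for (int t = 0; t < 2; ++t) {
        cudaMallocHost(&h[t], BYTES);        // pinned host memory, needed for real overlap
        cudaMalloc(&d[t], BYTES);
        cudaStreamCreate(&s[t]);
        cudaEventCreate(&dStart[t]);
        cudaEventCreate(&dStop[t]);
        cudaEventCreate(&rStop[t]);
        cudaEventCreate(&hStop[t]);
    }

    // One stream per task; D2 can overlap R1, and R2 can overlap H1.
    for (int t = 0; t < 2; ++t) {
        cudaEventRecord(dStart[t], s[t]);
        cudaMemcpyAsync(d[t], h[t], BYTES, cudaMemcpyHostToDevice, s[t]);  // D
        cudaEventRecord(dStop[t], s[t]);
        work<<<(N + 255) / 256, 256, 0, s[t]>>>(d[t], N);                  // R
        cudaEventRecord(rStop[t], s[t]);
        cudaMemcpyAsync(h[t], d[t], BYTES, cudaMemcpyDeviceToHost, s[t]);  // H
        cudaEventRecord(hStop[t], s[t]);
    }
    cudaDeviceSynchronize();

    for (int t = 0; t < 2; ++t) {
        float dMs = 0, rMs = 0, hMs = 0;
        cudaEventElapsedTime(&dMs, dStart[t], dStop[t]);
        cudaEventElapsedTime(&rMs, dStop[t], rStop[t]);
        cudaEventElapsedTime(&hMs, rStop[t], hStop[t]);
        printf("task %d: D %.2f ms, R %.2f ms, H %.2f ms\n", t + 1, dMs, rMs, hMs);
    }
    return 0;
}
```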