gpu

Keras sees my GPU but doesn't use it when training a neural network

Submitted by 为君一笑 on 2020-01-14 10:45:08
Question: My GPU is not used by Keras/TensorFlow. To try to make my GPU work with TensorFlow, I installed tensorflow-gpu via pip (I am using Anaconda on Windows). I have an NVIDIA 1080 Ti. print(tf.test.is_gpu_available()) True print(tf.config.experimental.list_physical_devices()) [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] I tried physical_devices = tf.config.experimental.list_physical_devices('GPU') tf.config
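A minimal sketch (not part of the original question) of how this is usually checked, assuming a TensorFlow 2.x install: turn on op-level device logging and run a small matmul to see whether it is actually placed on GPU:0 or silently falls back to the CPU.

import tensorflow as tf

tf.debugging.set_log_device_placement(True)          # log the device chosen for every op
print(tf.config.experimental.list_physical_devices('GPU'))

a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
c = tf.matmul(a, b)                                  # should be logged on .../device:GPU:0
print(c.device)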

Run through large generator iterable on GPU

Submitted by 人走茶凉 on 2020-01-14 06:51:09
Question: I recently received help optimizing my code to use generators, to save memory while running code that needs to check many permutations. To put it in perspective, I believe the generator iterates over a list that has 2! * 2! * 4! * 2! * 2! * 8! * 4! * 10! elements. Unfortunately, while I no longer run out of memory generating the permutations, it now takes >24 hours to run my code. Is it possible to parallelize this through the GPU? Generating the iterator with all the above
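A sketch of the usual CPU-side workaround (check_candidate and the range(10) stand-in below are hypothetical placeholders): a GPU cannot consume a Python generator directly, so the permutation stream is typically split across worker processes instead, which keeps memory bounded while parallelizing the per-candidate check.

import itertools
from multiprocessing import Pool

def check_candidate(perm):
    # Placeholder for the real per-permutation test.
    return perm if sum(perm) % 7 == 0 else None

def find_matches(candidates, workers=8, chunksize=10_000):
    # imap_unordered pulls lazily from the generator, so memory stays bounded
    # while the checks run in parallel across `workers` processes.
    with Pool(workers) as pool:
        for result in pool.imap_unordered(check_candidate, candidates, chunksize):
            if result is not None:
                yield result

if __name__ == "__main__":
    candidates = itertools.permutations(range(10))   # stand-in for the real permutation product
    for match in find_matches(candidates):
        print(match)
        break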

Memory allocation Nvidia vs AMD

Submitted by 淺唱寂寞╮ on 2020-01-13 20:36:10
Question: I know there is a 128 MB limit for a single block of GPU memory on AMD GPUs. Is there a similar limit on NVIDIA GPUs? Answer 1: On a GTX 560, clGetDeviceInfo returns 256 MiB for CL_DEVICE_MAX_MEM_ALLOC_SIZE, yet I can allocate slightly less than 1 GiB; see this thread discussing the issue. On AMD, however, this limit is enforced. You can raise it by changing the GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_SIZE environment variables (see this thread). Answer 2: You can query this information at runtime using
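A short sketch of the runtime query Answer 2 refers to, assuming PyOpenCL is available; it prints CL_DEVICE_MAX_MEM_ALLOC_SIZE for every device on every platform.

import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        max_alloc = device.get_info(cl.device_info.MAX_MEM_ALLOC_SIZE)
        print(f"{device.name}: CL_DEVICE_MAX_MEM_ALLOC_SIZE = {max_alloc / 2**20:.0f} MiB")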

Best GPU algorithm for calculating lists of neighbours

Submitted by 不羁的心 on 2020-01-13 12:16:28
Question: Given a collection of thousands of points in 3D, I need to get, for each particle, the list of neighbours that fall inside some cutoff value (in terms of Euclidean distance), sorted from nearest to farthest if possible. Which is the fastest GPU algorithm for this purpose in CUDA or OpenCL? Answer 1: One of the fastest GPU MD codes I'm aware of, HALMD, uses a (highly tuned) version of the same sort of approach that is used in the CUDA SDK examples, "Particles". Both the HALMD paper
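For reference, a CPU sketch of the fixed-radius query being asked about, using SciPy's cKDTree; the GPU codes mentioned in the answer use a uniform-grid cell list instead, but produce the same neighbour lists.

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(5000, 3)                     # thousands of 3D points
cutoff = 0.05
tree = cKDTree(points)

neighbours = tree.query_ball_point(points, r=cutoff)
# Sort each list from nearest to farthest and drop the point itself.
sorted_neighbours = [
    sorted((j for j in idx if j != i),
           key=lambda j: np.linalg.norm(points[i] - points[j]))
    for i, idx in enumerate(neighbours)
]
print(len(sorted_neighbours[0]), "neighbours for point 0")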

Keras does not use GPU - how to troubleshoot?

Submitted by 心已入冬 on 2020-01-13 09:47:09
Question: I'm trying to train a Keras model on the GPU, with TensorFlow as the backend. I have set everything up according to https://www.tensorflow.org/install/install_windows. This is my setup: I'm working in a Jupyter notebook in a virtualenv environment. The current virtualenv environment has tensorflow-gpu installed. I have CUDA 9.1 and cuDNN for CUDA 9.1 installed. cuDNN64_7.dll is at a location which is accessible via the PATH variable. I have an NVIDIA GeForce GTX 780 on my computer with the
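A common first troubleshooting step for this kind of setup (TF 1.x with standalone Keras, as described above) is to ask TensorFlow itself which devices it can see; if no GPU shows up here, the problem lies in the CUDA/cuDNN/PATH install rather than in Keras.

from tensorflow.python.client import device_lib

devices = device_lib.list_local_devices()
print([d.name for d in devices])    # expect something like ['/device:CPU:0', '/device:GPU:0']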

Custom Kernel GpuMat with float

Submitted by 不羁的心 on 2020-01-11 19:51:06
Question: I'm trying to write a custom kernel using GpuMat data to find the arc cosine of an image's pixels. I can upload, download, and change values when the GpuMat holds CV_8UC1 data, but chars cannot be used to calculate arc cosines. However, when I try to convert my GpuMat to CV_32FC1 (floats) I get an illegal memory access error during the download step. Here is my code: //.cu code #include <cuda_runtime.h> #include <stdlib.h> #include <iostream> #include <stdio.h> __global__ void
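Sketched from the Python side, assuming an OpenCV build with CUDA support (cv2.cuda_GpuMat): a CV_32FC1 round-trip through GPU memory, which isolates the upload/download path from the custom kernel. In the kernel itself, a frequent cause of illegal accesses after switching to 4-byte floats is computing row offsets from the image width instead of the GpuMat's step (pitch) in bytes, since rows are padded.

import cv2
import numpy as np

img = np.random.rand(480, 640).astype(np.float32)    # CV_32FC1 data

gpu = cv2.cuda_GpuMat()
gpu.upload(img)
print(gpu.type() == cv2.CV_32FC1)                    # True: the GpuMat really holds floats

back = gpu.download()
print(np.allclose(back, img))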

matlab: using the GPU for saving an image off a figure

Submitted by 天大地大妈咪最大 on 2020-01-11 11:43:11
Question: I use MATLAB to render a complex mesh (using trimesh, material, camlight, view...) and do not need to display it to the user, just to get the rendered image. This is discussed in another question. Using any of the suggested solutions (save-as image, saving into a video object, and using the undocumented hardcopy) is very slow (~1 sec), especially compared to rendering the plot itself, which, including painting it on the screen, takes less than 0.5 sec. I believe it is caused by the hardcopy method not utilizing the

Renderscript and the GPU

Submitted by 試著忘記壹切 on 2020-01-11 08:47:06
Question: I know that Renderscript is designed to obscure which processor I'm running on, but is there any way to write the code such that on GPU-compute-capable devices (at the moment, the Nexus 10) it will run on the GPU? Is there any way to tell that a script's function is running on the GPU? www.leapconf.com/downloads/LihuaZhang-MulticoreWare.pdf suggests that if I don't use globals, don't use recursion, and don't call rsDebug anywhere in a kernel, it will be run on the GPU; is that

TensorFlow GPU: is cudnn optional? Couldn't open CUDA library libcudnn.so

Submitted by 北城余情 on 2020-01-11 08:29:10
Question: I installed the tensorflow-0.8.0 GPU version, tensorflow-0.8.0-cp27-none-linux_x86_64.whl. It says it requires CUDA toolkit 7.5 and cuDNN v4. # Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For # other versions, see "Install from sources" below. However, I accidentally forgot to install cuDNN v4, but it works OK apart from the error message "Couldn't open CUDA library libcudnn.so". It still works and says "Creating TensorFlow device (/gpu:0)". msg without CuDNN I
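A quick check for this old graph/session API (TF 0.x/1.x): run a small op with device-placement logging and see whether it lands on /gpu:0 even without cuDNN. cuDNN is generally only needed by ops that use it (such as convolutions), while a plain matmul goes through cuBLAS.

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))    # the log shows which device MatMul was placed on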