multi-gpu

How to copy memory between different gpus in cuda

孤者浪人 posted on 2019-12-20 14:45:33
Question: Currently I'm working with two GTX 650s. My program has a simple client/server structure: I distribute the worker threads across the two GPUs, and the server thread needs to gather the result vectors from the client threads, so I need to copy memory between the two GPUs. Unfortunately, the simple P2P program in the CUDA samples just doesn't work because my cards don't have TCC drivers. After spending two hours searching on Google and SO, I can't find the answer. Some sources say I should use cudaMemcpyPeer, and
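
For reference, a minimal sketch (not taken from the question) of what a cudaMemcpyPeer call looks like on GeForce cards: it does not require the TCC driver or enabled peer access, since without P2P the runtime stages the copy through host memory. Buffer size and device indices below are made up for illustration.

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const size_t bytes = (1 << 20) * sizeof(float);  /* illustrative size */
    float *src = NULL, *dst = NULL;

    cudaSetDevice(1);                  /* result vector lives on GPU 1 */
    cudaMalloc((void **)&src, bytes);

    cudaSetDevice(0);                  /* the "server" gathers on GPU 0 */
    cudaMalloc((void **)&dst, bytes);

    /* cudaMemcpyPeer(dst, dstDevice, src, srcDevice, count): works even
       when peer access is unavailable, by staging through host memory. */
    cudaError_t err = cudaMemcpyPeer(dst, 0, src, 1, bytes);
    printf("copy: %s\n", cudaGetErrorString(err));

    cudaFree(dst);
    cudaSetDevice(1);
    cudaFree(src);
    return 0;
}
```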

How to run Tensorflow Estimator on multiple GPUs with data parallelism

社会主义新天地 posted on 2019-12-20 08:59:57
Question: I have a standard TensorFlow Estimator with some model and want to run it on multiple GPUs instead of just one. How can this be done using data parallelism? I searched the TensorFlow docs but did not find an example, only sentences saying that it would be easy with Estimator. Does anybody have a good example using tf.learn.Estimator? Or a link to a tutorial? Answer 1: I think tf.contrib.estimator.replicate_model_fn is a cleaner solution. The following is from tf.contrib.estimator
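
For context, the contrib-era pattern looks roughly like the sketch below (TensorFlow 1.x only; tf.contrib was removed in TF 2.x). The model body, feature names, and learning rate are placeholders; only the wrapping of the model_fn and the optimizer follows the contrib documentation.

```python
import tensorflow as tf  # TF 1.x

def model_fn(features, labels, mode):
    # Placeholder model; any Estimator model_fn works here.
    logits = tf.layers.dense(features['x'], 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
    # TowerOptimizer lets replicate_model_fn combine gradients across GPUs.
    optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

# replicate_model_fn builds one tower per visible GPU and splits each batch
# between them (data parallelism); the rest of the Estimator API is unchanged.
estimator = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
```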

Multi-GPU model ( LSTM with Stateful ) on Keras is not working

半腔热情 posted on 2019-12-13 17:34:10
Question: I am working on a stateful LSTM model using Keras (TensorFlow backend); I cannot parallelize it on a multi-GPU platform. Here is a link to the code. I am getting the following error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [256,75,39] vs. [512,75,39] [[Node: training/cna/gradients/loss/concatenate_1_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/concatenate_1_loss/mul"], _device="/job:localhost/replica:0/task:0/gpu
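
One plausible reading of the shape mismatch (an assumption, not confirmed by the post): keras.utils.multi_gpu_model splits every incoming batch across the GPUs, so each replica of a 512-sample batch sees only 256 samples, while a stateful LSTM bakes its full batch size into batch_input_shape. The sketch below only illustrates that interaction; the layer sizes are taken from the error shapes, everything else is made up, and whether stateful LSTMs behave reliably under multi_gpu_model at all is a separate question.

```python
# Illustrative only: why the per-replica batch is half the declared stateful
# batch size on 2 GPUs (Keras >= 2.0.9 / TF 1.x-2.3 era API).
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import multi_gpu_model

n_gpus, full_batch, timesteps, features = 2, 512, 75, 39

model = Sequential([
    # A stateful layer fixes its batch size in the graph; if it were declared
    # with full_batch, each replica would still receive full_batch // n_gpus
    # samples, which is exactly a [256, 75, 39] vs [512, 75, 39] conflict.
    LSTM(128, stateful=True, return_sequences=True,
         batch_input_shape=(full_batch // n_gpus, timesteps, features)),
    TimeDistributed(Dense(features, activation='softmax')),
])

parallel_model = multi_gpu_model(model, gpus=n_gpus)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
```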

How to do multi GPU training with Keras?

只愿长相守 posted on 2019-12-13 11:58:20
Question: I want my model to run on multiple GPUs, sharing parameters but with different batches of data. Can I do something like that with model.fit()? Is there any other alternative? Answer 1: Keras now has (as of v2.0.9) built-in support for data parallelism across multiple GPUs, using keras.utils.multi_gpu_model. Currently it only supports the TensorFlow back-end. A good example is in the docs: https://keras.io/getting-started/faq/#how-can-i-run-a-keras-model-on-multiple-gpus Also covered here: https:/
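
A short sketch along the lines of the Keras FAQ example linked above; the model choice, class count, and batch size are illustrative.

```python
# Data parallelism with keras.utils.multi_gpu_model (Keras >= 2.0.9,
# TensorFlow backend): weights are shared, each GPU gets its own slice of
# every batch, and the gradients are combined transparently.
import numpy as np
from keras.applications import Xception
from keras.utils import multi_gpu_model

model = Xception(weights=None, input_shape=(299, 299, 3), classes=10)
parallel_model = multi_gpu_model(model, gpus=2)   # replicate on 2 GPUs
parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# A batch of 64 is split into two sub-batches of 32, one per GPU.
x = np.random.random((64, 299, 299, 3))
y = np.random.random((64, 10))
parallel_model.fit(x, y, epochs=1, batch_size=64)
```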

Data Parallelism for RNN in tensorflow

守給你的承諾、 posted on 2019-12-11 14:11:39
Question: Recently I have been using TensorFlow to develop an NMT system. I tried to train it on multiple GPUs using the data-parallelism method to speed it up. I followed the standard data-parallelism approach widely used in TensorFlow. For example, if we want to run it on an 8-GPU machine: first, we construct a large batch that is 8 times the size of the batch used on a single GPU; then we split this large batch equally into 8 mini-batches and train them separately on different GPUs. In the end, we collect
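
For readers unfamiliar with that pattern, here is a rough TF 1.x-style sketch of the tower setup described above; the model_fn, shapes, and optimizer are placeholders, not the poster's NMT code.

```python
import tensorflow as tf  # TF 1.x graph-mode API

def tower_loss(x, y):
    # Placeholder model; variables are shared across towers via AUTO_REUSE,
    # so every GPU computes gradients for the same weights.
    logits = tf.layers.dense(x, 10, name='logits', reuse=tf.AUTO_REUSE)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

num_gpus, per_gpu_batch = 8, 32
x_big = tf.placeholder(tf.float32, [num_gpus * per_gpu_batch, 128])
y_big = tf.placeholder(tf.int32, [num_gpus * per_gpu_batch])
x_split = tf.split(x_big, num_gpus)   # one mini-batch per GPU
y_split = tf.split(y_big, num_gpus)

optimizer = tf.train.AdamOptimizer(1e-3)
tower_grads = []
for i in range(num_gpus):
    with tf.device('/gpu:%d' % i):
        loss = tower_loss(x_split[i], y_split[i])
        tower_grads.append(optimizer.compute_gradients(loss))

# Average each variable's gradient across the towers, then apply once.
averaged = []
for grads_and_vars in zip(*tower_grads):
    grads = tf.stack([g for g, _ in grads_and_vars])
    averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
train_op = optimizer.apply_gradients(averaged)
```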

Multiple GPUs in CUDA 3.2 and issues with Cuda 4.0

给你一囗甜甜゛ posted on 2019-12-11 07:43:06
Question: I am new to multiple GPUs. I have written code for a single GPU and want to speed it up further by using multiple GPUs. I am working with two GTX 470 cards, MS VS 2008, and CUDA Toolkit 4.0. I am facing two problems. The first is that my code somehow doesn't run fine with the 4.0 build rules but works fine with the 3.2 build rules. Also, the multiGPU SDK example doesn't build on VS 2008, giving the error: error C3861: 'cudaDeviceReset': identifier not found. My second problem is, if I have to work with 3.2, then
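
As background: the C3861 error is consistent with the sample being compiled against 3.2 headers, since cudaDeviceReset first appeared in the 4.0 runtime. Below is a hedged sketch of the single-host-thread multi-GPU pattern that CUDA 4.0 introduced (under 3.2 each device instead needed its own host thread); the kernel and sizes are placeholders, not from the question.

```cuda
#include <cuda_runtime.h>

__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;        /* placeholder computation */
}

int main(void) {
    const int n = 1 << 20;
    float *buf[2];

    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);                           /* select the GPU */
        cudaMalloc((void **)&buf[dev], n * sizeof(float));
        work<<<(n + 255) / 256, 256>>>(buf[dev], n);  /* async launch */
    }
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();                      /* wait per GPU */
        cudaFree(buf[dev]);
    }
    return 0;
}
```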

OpenCL multiple command queue for Concurrent NDKernal Launch

筅森魡賤 posted on 2019-12-11 03:13:12
Question: I'm trying to run a vector-addition application where I need to launch multiple kernels concurrently. For concurrent kernel launches, someone in my last question advised me to use multiple command queues, which I'm defining with an array: context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err); for(i=0;i<num_ker;++i) { queue[i] = clCreateCommandQueue(context, device_id, 0, &err); } I'm getting an error "command terminated by signal 11" somewhere around the above code. I'm using
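
A hedged, self-contained sketch of the queue-array setup (not the poster's full code): one common cause of a signal 11 at exactly that point is writing queue[i] into an array that was never allocated with room for num_ker entries, so the sketch below allocates it explicitly before the loop.

```c
#include <CL/cl.h>
#include <stdlib.h>

int main(void) {
    cl_int err;
    cl_platform_id platform;
    cl_device_id device_id;
    int num_ker = 4;                    /* one queue per concurrent kernel */

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);

    cl_context context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err);

    /* Allocate storage for the queue handles before filling them in. */
    cl_command_queue *queue = malloc(num_ker * sizeof(cl_command_queue));
    for (int i = 0; i < num_ker; ++i)
        queue[i] = clCreateCommandQueue(context, device_id, 0, &err);

    /* ... enqueue one kernel per queue here ... */

    for (int i = 0; i < num_ker; ++i)
        clReleaseCommandQueue(queue[i]);
    free(queue);
    clReleaseContext(context);
    return 0;
}
```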

OpenGL multi-GPU support

南笙酒味 posted on 2019-12-09 18:01:27
Question: When we create an OpenGL context on a PC, is there any way to choose which physical device is used, or how many devices are used? Do the latest OpenGL (4.5) APIs support multi-GPU architectures? If I have two identical graphics cards (for example, two Nvidia GeForce cards), how do I properly program the OpenGL APIs in order to benefit from the fact that I have two cards? How do I port an OpenGL program from a single-GPU version to a multi-GPU version with minimal effort? Answer 1: OpenGL drivers
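
Since the answer is cut off above, for orientation only: core OpenGL has no portable device-selection API; which GPU a context lands on is a platform and driver matter. On Windows with NVIDIA Quadro drivers (not GeForce), the WGL_NV_gpu_affinity extension can pin a context to one GPU, roughly as in this hedged sketch (error handling and window setup omitted; a throwaway context is assumed to be current so that wglGetProcAddress works).

```c
#include <windows.h>
#include <GL/gl.h>
#include <GL/wglext.h>   /* HGPUNV and the PFNWGL...NVPROC typedefs */

HGLRC create_context_on_gpu(UINT gpu_index) {
    PFNWGLENUMGPUSNVPROC enumGpus =
        (PFNWGLENUMGPUSNVPROC)wglGetProcAddress("wglEnumGpusNV");
    PFNWGLCREATEAFFINITYDCNVPROC createAffinityDC =
        (PFNWGLCREATEAFFINITYDCNVPROC)wglGetProcAddress("wglCreateAffinityDCNV");
    if (!enumGpus || !createAffinityDC)
        return NULL;                      /* extension missing (e.g. GeForce) */

    HGPUNV gpu;
    if (!enumGpus(gpu_index, &gpu))       /* pick the gpu_index-th GPU */
        return NULL;

    HGPUNV gpu_list[2] = { gpu, NULL };   /* NULL-terminated GPU list */
    HDC dc = createAffinityDC(gpu_list);  /* DC bound to that GPU only */

    PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd), 1 };
    pfd.dwFlags = PFD_SUPPORT_OPENGL;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    SetPixelFormat(dc, ChoosePixelFormat(dc, &pfd), &pfd);

    return wglCreateContext(dc);          /* all rendering stays on that GPU */
}
```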