multi-gpu

How to copy memory between different gpus in cuda

孤者浪人 posted on 2019-12-20 14:45:33
Question: Currently I'm working with two GTX 650s. My program has a simple client/server structure: I distribute the worker threads across the two GPUs, and the server thread needs to gather the result vectors from the client threads, so I need to copy memory between the two GPUs. Unfortunately, the simple P2P program in the CUDA samples just doesn't work because my cards don't have TCC drivers. After spending two hours searching on Google and SO, I can't find the answer. Some sources say I should use cudaMemcpyPeer, and
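
For reference, a minimal sketch (not taken from the question) of what a cudaMemcpyPeer call looks like on GeForce cards: it does not require the TCC driver or enabled peer access, since without P2P the runtime stages the copy through host memory. Buffer size and device indices below are made up for illustration.

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const size_t bytes = (1 << 20) * sizeof(float);  /* illustrative size */
    float *src = NULL, *dst = NULL;

    cudaSetDevice(1);                  /* result vector lives on GPU 1 */
    cudaMalloc((void **)&src, bytes);

    cudaSetDevice(0);                  /* the "server" gathers on GPU 0 */
    cudaMalloc((void **)&dst, bytes);

    /* cudaMemcpyPeer(dst, dstDevice, src, srcDevice, count): works even
       when peer access is unavailable, by staging through host memory. */
    cudaError_t err = cudaMemcpyPeer(dst, 0, src, 1, bytes);
    printf("copy: %s\n", cudaGetErrorString(err));

    cudaFree(dst);
    cudaSetDevice(1);
    cudaFree(src);
    return 0;
}
```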

How to run Tensorflow Estimator on multiple GPUs with data parallelism

社会主义新天地 posted on 2019-12-20 08:59:57
Question: I have a standard TensorFlow Estimator with some model and want to run it on multiple GPUs instead of just one. How can this be done using data parallelism? I searched the TensorFlow docs but did not find an example, only sentences saying that it would be easy with Estimator. Does anybody have a good example using tf.learn.Estimator? Or a link to a tutorial? Answer 1: I think tf.contrib.estimator.replicate_model_fn is a cleaner solution. The following is from tf.contrib.estimator
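
For context, the contrib-era pattern looks roughly like the sketch below (TensorFlow 1.x only; tf.contrib was removed in TF 2.x). The model body, feature names, and learning rate are placeholders; only the wrapping of the model_fn and the optimizer follows the contrib documentation.

```python
import tensorflow as tf  # TF 1.x

def model_fn(features, labels, mode):
    # Placeholder model; any Estimator model_fn works here.
    logits = tf.layers.dense(features['x'], 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
    # TowerOptimizer lets replicate_model_fn combine gradients across GPUs.
    optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

# replicate_model_fn builds one tower per visible GPU and splits each batch
# between them (data parallelism); the rest of the Estimator API is unchanged.
estimator = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
```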

Multi-GPU model ( LSTM with Stateful ) on Keras is not working

半腔热情 posted on 2019-12-13 17:34:10
Question: I am working on a stateful LSTM model using Keras (TensorFlow backend); I cannot parallelize it on a multi-GPU platform. Here is a link to the code. I am getting the following error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [256,75,39] vs. [512,75,39] [[Node: training/cna/gradients/loss/concatenate_1_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/concatenate_1_loss/mul"], _device="/job:localhost/replica:0/task:0/gpu
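
One plausible reading of the shape mismatch (an assumption, not confirmed by the post): keras.utils.multi_gpu_model splits every incoming batch across the GPUs, so each replica of a 512-sample batch sees only 256 samples, while a stateful LSTM bakes its full batch size into batch_input_shape. The sketch below only illustrates that interaction; the layer sizes are taken from the error shapes, everything else is made up, and whether stateful LSTMs behave reliably under multi_gpu_model at all is a separate question.

```python
# Illustrative only: why the per-replica batch is half the declared stateful
# batch size on 2 GPUs (Keras >= 2.0.9 / TF 1.x-2.3 era API).
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import multi_gpu_model

n_gpus, full_batch, timesteps, features = 2, 512, 75, 39

model = Sequential([
    # A stateful layer fixes its batch size in the graph; if it were declared
    # with full_batch, each replica would still receive full_batch // n_gpus
    # samples, which is exactly a [256, 75, 39] vs [512, 75, 39] conflict.
    LSTM(128, stateful=True, return_sequences=True,
         batch_input_shape=(full_batch // n_gpus, timesteps, features)),
    TimeDistributed(Dense(features, activation='softmax')),
])

parallel_model = multi_gpu_model(model, gpus=n_gpus)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
```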

How to do multi GPU training with Keras?

只愿长相守 posted on 2019-12-13 11:58:20
Question: I want my model to run on multiple GPUs, sharing parameters but with different batches of data. Can I do something like that with model.fit()? Is there any other alternative? Answer 1: Keras now has (as of v2.0.9) built-in support for data parallelism across multiple GPUs, using keras.utils.multi_gpu_model. Currently it only supports the TensorFlow back-end. A good example is in the docs: https://keras.io/getting-started/faq/#how-can-i-run-a-keras-model-on-multiple-gpus Also covered here: https:/
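
A short sketch along the lines of the Keras FAQ example linked above; the model choice, class count, and batch size are illustrative.

```python
# Data parallelism with keras.utils.multi_gpu_model (Keras >= 2.0.9,
# TensorFlow backend): weights are shared, each GPU gets its own slice of
# every batch, and the gradients are combined transparently.
import numpy as np
from keras.applications import Xception
from keras.utils import multi_gpu_model

model = Xception(weights=None, input_shape=(299, 299, 3), classes=10)
parallel_model = multi_gpu_model(model, gpus=2)   # replicate on 2 GPUs
parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# A batch of 64 is split into two sub-batches of 32, one per GPU.
x = np.random.random((64, 299, 299, 3))
y = np.random.random((64, 10))
parallel_model.fit(x, y, epochs=1, batch_size=64)
```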

Data Parallelism for RNN in tensorflow

守給你的承諾、 posted on 2019-12-11 14:11:39
Question: Recently I have been using TensorFlow to develop an NMT system. I tried to train it on multiple GPUs using the data-parallelism method to speed it up. I followed the standard data-parallelism approach widely used in TensorFlow. For example, if we want to run it on an 8-GPU machine: first, we construct a large batch that is 8 times the size of the batch used on a single GPU; then we split this large batch equally into 8 mini-batches and train them separately on different GPUs. In the end, we collect
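
For readers unfamiliar with that pattern, here is a rough TF 1.x-style sketch of the tower setup described above; the model_fn, shapes, and optimizer are placeholders, not the poster's NMT code.

```python
import tensorflow as tf  # TF 1.x graph-mode API

def tower_loss(x, y):
    # Placeholder model; variables are shared across towers via AUTO_REUSE,
    # so every GPU computes gradients for the same weights.
    logits = tf.layers.dense(x, 10, name='logits', reuse=tf.AUTO_REUSE)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

num_gpus, per_gpu_batch = 8, 32
x_big = tf.placeholder(tf.float32, [num_gpus * per_gpu_batch, 128])
y_big = tf.placeholder(tf.int32, [num_gpus * per_gpu_batch])
x_split = tf.split(x_big, num_gpus)   # one mini-batch per GPU
y_split = tf.split(y_big, num_gpus)

optimizer = tf.train.AdamOptimizer(1e-3)
tower_grads = []
for i in range(num_gpus):
    with tf.device('/gpu:%d' % i):
        loss = tower_loss(x_split[i], y_split[i])
        tower_grads.append(optimizer.compute_gradients(loss))

# Average each variable's gradient across the towers, then apply once.
averaged = []
for grads_and_vars in zip(*tower_grads):
    grads = tf.stack([g for g, _ in grads_and_vars])
    averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
train_op = optimizer.apply_gradients(averaged)
```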

Multiple GPUs in CUDA 3.2 and issues with Cuda 4.0

给你一囗甜甜゛ posted on 2019-12-11 07:43:06
Question: I am new to multiple GPUs. I have written code for a single GPU and want to speed it up further by using multiple GPUs. I am working with two GTX 470 cards, MS VS 2008, and CUDA Toolkit 4.0. I am facing two problems. The first is that my code somehow doesn't run fine with the 4.0 build rules but works fine with the 3.2 build rules. Also, the multiGPU SDK example doesn't build on VS 2008, giving the error: error C3861: 'cudaDeviceReset': identifier not found. My second problem is, if I have to work with 3.2, then
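
As background: the C3861 error is consistent with the sample being compiled against 3.2 headers, since cudaDeviceReset first appeared in the 4.0 runtime. Below is a hedged sketch of the single-host-thread multi-GPU pattern that CUDA 4.0 introduced (under 3.2 each device instead needed its own host thread); the kernel and sizes are placeholders, not from the question.

```cuda
#include <cuda_runtime.h>

__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;        /* placeholder computation */
}

int main(void) {
    const int n = 1 << 20;
    float *buf[2];

    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);                           /* select the GPU */
        cudaMalloc((void **)&buf[dev], n * sizeof(float));
        work<<<(n + 255) / 256, 256>>>(buf[dev], n);  /* async launch */
    }
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();                      /* wait per GPU */
        cudaFree(buf[dev]);
    }
    return 0;
}
```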

OpenCL multiple command queue for Concurrent NDKernal Launch

筅森魡賤 posted on 2019-12-11 03:13:12
Question: I'm trying to run a vector-addition application where I need to launch multiple kernels concurrently. For concurrent kernel launches, someone in my last question advised me to use multiple command queues, which I'm defining with an array: context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err); for(i=0;i<num_ker;++i) { queue[i] = clCreateCommandQueue(context, device_id, 0, &err); } I'm getting an error "command terminated by signal 11" somewhere around the above code. I'm using
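
A hedged, self-contained sketch of the queue-array setup (not the poster's full code): one common cause of a signal 11 at exactly that point is writing queue[i] into an array that was never allocated with room for num_ker entries, so the sketch below allocates it explicitly before the loop.

```c
#include <CL/cl.h>
#include <stdlib.h>

int main(void) {
    cl_int err;
    cl_platform_id platform;
    cl_device_id device_id;
    int num_ker = 4;                    /* one queue per concurrent kernel */

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);

    cl_context context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err);

    /* Allocate storage for the queue handles before filling them in. */
    cl_command_queue *queue = malloc(num_ker * sizeof(cl_command_queue));
    for (int i = 0; i < num_ker; ++i)
        queue[i] = clCreateCommandQueue(context, device_id, 0, &err);

    /* ... enqueue one kernel per queue here ... */

    for (int i = 0; i < num_ker; ++i)
        clReleaseCommandQueue(queue[i]);
    free(queue);
    clReleaseContext(context);
    return 0;
}
```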

OpenGL multi-GPU support

南笙酒味 posted on 2019-12-09 18:01:27
Question: When we create an OpenGL context on a PC, is there any way to choose which physical device is used, or how many devices are used? Do the latest OpenGL (4.5) APIs support multi-GPU architectures? If I have two identical graphics cards (for example, two Nvidia GeForce cards), how do I properly program the OpenGL APIs in order to benefit from the fact that I have two cards? How do I port an OpenGL program from a single-GPU version to a multi-GPU version with minimal effort? Answer 1: OpenGL drivers
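
Since the answer is cut off above, for orientation only: core OpenGL has no portable device-selection API; which GPU a context lands on is a platform and driver matter. On Windows with NVIDIA Quadro drivers (not GeForce), the WGL_NV_gpu_affinity extension can pin a context to one GPU, roughly as in this hedged sketch (error handling and window setup omitted; a throwaway context is assumed to be current so that wglGetProcAddress works).

```c
#include <windows.h>
#include <GL/gl.h>
#include <GL/wglext.h>   /* HGPUNV and the PFNWGL...NVPROC typedefs */

HGLRC create_context_on_gpu(UINT gpu_index) {
    PFNWGLENUMGPUSNVPROC enumGpus =
        (PFNWGLENUMGPUSNVPROC)wglGetProcAddress("wglEnumGpusNV");
    PFNWGLCREATEAFFINITYDCNVPROC createAffinityDC =
        (PFNWGLCREATEAFFINITYDCNVPROC)wglGetProcAddress("wglCreateAffinityDCNV");
    if (!enumGpus || !createAffinityDC)
        return NULL;                      /* extension missing (e.g. GeForce) */

    HGPUNV gpu;
    if (!enumGpus(gpu_index, &gpu))       /* pick the gpu_index-th GPU */
        return NULL;

    HGPUNV gpu_list[2] = { gpu, NULL };   /* NULL-terminated GPU list */
    HDC dc = createAffinityDC(gpu_list);  /* DC bound to that GPU only */

    PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd), 1 };
    pfd.dwFlags = PFD_SUPPORT_OPENGL;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    SetPixelFormat(dc, ChoosePixelFormat(dc, &pfd), &pfd);

    return wglCreateContext(dc);          /* all rendering stays on that GPU */
}
```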