multi-gpu

tensorflow multi GPU parallel usage

Submitted by ↘锁芯ラ on 2021-02-05 04:55:53
Question: I want to use 8 GPUs in parallel, not sequentially. For example, when I execute this code:

import tensorflow as tf

with tf.device('/gpu:0'):
    for i in range(10):
        print(i)

with tf.device('/gpu:1'):
    for i in range(10, 20):
        print(i)

I tried the command CUDA_VISIBLE_DEVICE='0,1', but the result is the same. I want to see an interleaved result such as "0 10 1 11 2 3 12 ... etc.", but the actual result is sequential: "0 1 2 3 4 5 ... 10 11 12 13 ...". How can I get the result I want?

Answer 1: ** I see an edit with the question so adding this
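Note that tf.device only controls where TensorFlow operations are placed; a plain Python for loop with print() runs on the host CPU, one statement after another, no matter which device scope it sits in, so the output above will always be sequential. Below is a minimal sketch, assuming TensorFlow 2.x eager execution and at least two visible GPUs (shapes are arbitrary), of the kind of work that device placement can actually overlap:

# Sketch: tf.matmul is a TensorFlow op, so placing one matmul on each GPU lets the
# runtime dispatch them concurrently; print() itself always runs serially on the host.
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.random.uniform((4000, 4000))
    x = tf.matmul(a, a)          # dispatched to GPU 0

with tf.device('/gpu:1'):
    b = tf.random.uniform((4000, 4000))
    y = tf.matmul(b, b)          # dispatched to GPU 1, can overlap with the op above

print(float(tf.reduce_sum(x)), float(tf.reduce_sum(y)))   # forces both results

Also note that the environment variable is spelled CUDA_VISIBLE_DEVICES (plural); it only controls which GPUs TensorFlow can see and does not parallelize anything by itself. For multi-GPU training, tf.distribute.MirroredStrategy is the usual higher-level approach.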

Efficient allreduce is not supported for 2 IndexedSlices

Submitted by 佐手、 on 2021-01-26 04:13:55
Question: I am trying to run a subclassed Keras model on multiple GPUs. The code runs as expected; however, the following warning crops up during execution: "Efficient allreduce is not supported for 2 IndexedSlices". What does this mean? I followed the multi-GPU tutorial in the TensorFlow 2.0 Beta guide. I am also using the Dataset API for my input pipeline.

Source: https://stackoverflow.com/questions/56843876/efficient-allreduce-is-not-supported-for-2-indexedslices
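For context, this message usually appears when some gradients are sparse IndexedSlices rather than dense tensors (embedding lookups are the typical cause), so MirroredStrategy falls back to a slower aggregation path instead of its fast dense all-reduce; it is a performance note rather than an error. A minimal sketch that reproduces the situation (illustrative model and data only, not the asker's code, and assuming two or more GPUs):

# Sketch: the Embedding layer's gradients are IndexedSlices (only the looked-up rows),
# which MirroredStrategy cannot combine with its efficient dense all-reduce.
import numpy as np
import tensorflow as tf

class TinyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense(tf.reduce_mean(self.embed(inputs), axis=1))

strategy = tf.distribute.MirroredStrategy()   # uses all visible GPUs
with strategy.scope():
    model = TinyModel()
    model.compile(optimizer="adam", loss="mse")

ids = np.random.randint(0, 1000, size=(256, 10))
labels = np.random.rand(256, 1).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((ids, labels)).batch(64)
model.fit(dataset, epochs=1)                  # the warning is logged here on >1 GPU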

Using peered GPUs in a single stream

Submitted by 扶醉桌前 on 2020-02-02 13:17:50
Question: In my current project I use GPUs for signal processing and visualization. I'm already using streams to allow for asynchronous operation. The signal is processed in frames, and for each frame the processing steps in a stream are as follows: memcpy to device, signal conditioning, image processing, visualization. Right now the steps are happening on a single GPU, but my machine has a multi-GPU card (GeForce GTX 690) and I'd like to distribute the operation between the two devices. Basically I'd

CUDA: Memory copy to GPU 1 is slower in multi-GPU

Submitted by 删除回忆录丶 on 2020-01-23 04:02:09
Question: My company has a setup of two GTX 295 cards, so a total of 4 GPUs in a server, and we have several servers. GPU 1 specifically was slow in comparison to GPUs 0, 2 and 3, so I wrote a little speed test to help find the cause of the problem.

//#include <stdio.h>
//#include <stdlib.h>
//#include <cuda_runtime.h>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <cutil.h>

__global__ void test_kernel(float *d_data) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    for

Can not save model using model.save following multi_gpu_model in Keras

Submitted by 懵懂的女人 on 2020-01-01 05:32:31
Question: Following the upgrade to Keras 2.0.9, I have been using the multi_gpu_model utility, but I can't save my models or best weights using model.save('path'). The error I get is:

TypeError: can't pickle module objects

I suspect there is some problem gaining access to the model object. Is there a workaround for this issue?

Answer 1: Workaround. Here's a patched version that doesn't fail while saving:

from keras.layers import Lambda, concatenate
from keras import Model
import tensorflow as tf

def multi_gpu
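Separately from that patch, a workaround often used with multi_gpu_model is to keep a reference to the original single-GPU model and save that one, since the parallel wrapper shares its weights. A sketch under that assumption (the model itself is just a placeholder, and a machine with at least two GPUs is assumed):

# Sketch: train with the multi-GPU wrapper, but save the underlying template model,
# whose weights are shared with the wrapper and can be serialized normally.
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

base_model = Sequential([Dense(64, activation="relu", input_shape=(100,)),
                         Dense(1)])
parallel_model = multi_gpu_model(base_model, gpus=2)
parallel_model.compile(optimizer="adam", loss="mse")

# parallel_model.fit(x_train, y_train, epochs=10, batch_size=256)

base_model.save("model.h5")   # save the template model, not the parallel wrapper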

How to fix low volatile GPU-Util with Tensorflow-GPU and Keras?

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-31 16:53:35
Question: I have a 4-GPU machine on which I run TensorFlow (GPU) with Keras. Some of my classification problems take several hours to complete. nvidia-smi reports a Volatile GPU-Util that never exceeds 25% on any of my 4 GPUs. How can I increase GPU utilization and speed up my training?

Answer 1: If your GPU util is below 80%, this is generally the sign of an input pipeline bottleneck. What this means is that the GPU sits idle much of the time, waiting for the CPU to prepare the data. What you want is the CPU to
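In that spirit, the usual remedy is to build the input pipeline with tf.data so that batch preparation overlaps with training. The sketch below (TensorFlow 2.x) is generic rather than the answerer's code; the TFRecord path and the parser are placeholders:

# Sketch: decode and batch on the CPU in parallel, and prefetch the next batches
# while the GPU is still training on the current one, so it is never starved.
import tensorflow as tf

def parse_example(serialized):   # placeholder parser for a hypothetical record format
    spec = {"image": tf.io.FixedLenFeature([], tf.string),
            "label": tf.io.FixedLenFeature([], tf.int64)}
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.image.resize(tf.io.decode_jpeg(parsed["image"], channels=3), [224, 224])
    return image / 255.0, parsed["label"]

dataset = (tf.data.TFRecordDataset("train.tfrecord")               # placeholder path
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(10_000)
           .batch(64)
           .prefetch(tf.data.AUTOTUNE))                             # overlap CPU and GPU work

# model.fit(dataset, epochs=10)   # feed the prefetched pipeline straight to Keras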

Distributed Tensorflow device placement in Google Cloud ML engine

Submitted by 瘦欲@ on 2019-12-23 13:16:33
Question: I am running a large distributed TensorFlow model in Google Cloud ML Engine. I want to use machines with GPUs. My graph consists of two main parts: the input/data-reader function and the computation part. I wish to place variables in the PS task, the input part on the CPU, and the computation part on the GPU. The function tf.train.replica_device_setter automatically places variables on the PS server. This is what my code looks like:

with tf.device(tf.train.replica_device_setter(cluster
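One common way to express that split in the TF 1.x API the question uses is to nest explicit /cpu:0 and /gpu:0 scopes inside the replica_device_setter scope: variables still go to the parameter servers, while the remaining ops pick up the inner device. The sketch below uses a made-up cluster spec and placeholder input/model functions, so treat it as illustrative only:

# Sketch: replica_device_setter sends variables to the ps job; the nested device
# scopes pin the reader ops to the worker's CPU and the math to its GPU.
import tensorflow as tf   # TF 1.x style graph-mode API

cluster = tf.train.ClusterSpec({                       # placeholder cluster definition
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"]})

def my_input_fn():                                     # placeholder reader
    features = tf.random.uniform([32, 10])
    labels = tf.zeros([32], dtype=tf.int32)
    return features, labels

def my_model_fn(x):                                    # placeholder computation
    return tf.layers.dense(x, 5)

with tf.device(tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:0")):
    with tf.device("/cpu:0"):
        features, labels = my_input_fn()               # reader ops stay on the CPU
    with tf.device("/gpu:0"):
        logits = my_model_fn(features)                 # heavy compute on the GPU
        loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)
        train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)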
