I have two GPUs (a Titan X (Pascal) and a GTX 1080). I am trying to run a single-threaded graph computation. The graph is two separate matrix multiplication chains (each assigned to a different GPU).
Isn't it because you need to transfer data between the GPUs when you compute C? Can you try putting C on the CPU?

with tf.device('/cpu:0'):
    C = tf.matmul(B1, B2)
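For reference, a minimal sketch of that suggestion in context (assuming the same graph-mode TensorFlow API as the answer below, with small throwaway matmuls standing in for the B1/B2 chains):

import tensorflow as tf

n = 1000
with tf.device('/gpu:0'):
    B1 = tf.matmul(tf.ones([n, n]), tf.ones([n, n]))  # stand-in for chain 1
with tf.device('/gpu:1'):
    B2 = tf.matmul(tf.ones([n, n]), tf.ones([n, n]))  # stand-in for chain 2
with tf.device('/cpu:0'):
    C = tf.matmul(B1, B2)  # final matmul pinned to the CPU, as suggested above

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(C)  # log_device_placement prints where each op actually ran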
There's a significant delay the first time a kernel is launched on a GPU, possibly caused by PTXAS compilation. This delay can be on the order of seconds and accumulates when you use more than one GPU, so in your case the run is slower because the time is dominated by an extra "initial kernel launch". One way to benchmark pure computation time is to do "pre-warming": execute each CUDA operation at least once on each GPU before timing. I observed the same slowness when running your benchmark on two TitanX cards, but the delay disappeared once I "pre-warmed" the kernels.
Here's before pre-warming:
Here's after pre-warming:

Below is your code, modified to do the pre-warming and also to remove any TensorFlow<->Python transfers.
import tensorflow as tf
from tensorflow.python.ops import init_ops
from tensorflow.python.client import timeline
import logging, time
import numpy as np

def test():
    n = 5000
    with tf.device('/gpu:0'):
        A1 = tf.Variable(tf.ones_initializer(shape=[n, n]), name='A1')
        B1 = A1
        for l in xrange(10):
            B1 = tf.matmul(A1, B1, name="chain1")
    with tf.device('/gpu:1'):
        A2 = tf.Variable(tf.ones_initializer(shape=[n, n]), name='A2')
        B2 = A2
        for l in xrange(10):
            B2 = tf.matmul(A2, B2, name="chain2")
        C = tf.matmul(B1, B2)

    run_metadata = tf.RunMetadata()
    start = time.time()
    logging.info('started')
    sess = tf.InteractiveSession(config=tf.ConfigProto(allow_soft_placement=False,
                                                       log_device_placement=True))
    sess.run(tf.initialize_all_variables())

    # warm-up run: launches every kernel once on each GPU, so the traced run
    # below no longer includes the one-time launch/compilation delay
    sess.run([C.op],
             options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
             run_metadata=run_metadata)

    # traced run: fresh RunMetadata so the timeline covers only the warmed-up
    # execution; running C.op (not C) avoids copying the result back to Python
    run_metadata = tf.RunMetadata()
    sess.run([C.op],
             options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
             run_metadata=run_metadata)

    logging.info('writing trace')
    trace = timeline.Timeline(step_stats=run_metadata.step_stats)
    trace_file = open('timeline.ctf.json', 'w')
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))
    trace_file.close()
    logging.info('trace written')

    # end-to-end wall time (still includes session startup, variable init and warm-up)
    end = time.time()
    logging.info('computed')
    logging.info(end - start)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
    test()
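If you just want wall-clock numbers rather than a trace, the pre-warming pattern by itself looks roughly like this (a sketch that assumes the sess and C built in test() above):

# warm-up: the first run pays the one-time kernel launch / PTXAS cost
sess.run(C.op)
# timed run: kernels are already set up, so this measures computation only
start = time.time()
sess.run(C.op)
logging.info('pure compute time: %f seconds', time.time() - start)

The timeline.ctf.json written above can be loaded in Chrome's chrome://tracing page to inspect the per-device timings of the warmed-up run.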