Can TensorFlow run with multiple CPUs (no GPUs)?

Posted by 只谈情不闲聊 on 2020-01-01 03:12:09

Question


I'm trying to learn distributed TensorFlow. I tried out a piece of code as explained here:

with tf.device("/cpu:0"):
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

with tf.device("/cpu:1"):
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'MatMul': Operation was explicitly assigned to /device:CPU:1 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:CPU:1"](Placeholder, Variable/read)]]

This means that TensorFlow does not recognize CPU:1; the only device it lists as available is cpu:0.

I'm running on a RedHat server with 40 CPUs (cat /proc/cpuinfo | grep processor | wc -l).

Any ideas?


Answer 1:


Following the link in the comment:

It turns out the session has to be configured with a CPU device count greater than 1:

config = tf.ConfigProto(device_count={"CPU": 8})
with tf.Session(config=config) as sess:
    ...
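To see the fix applied to the graph from the question, here is a minimal sketch rather than a definitive implementation. It assumes TensorFlow 1.14+ or 2.x (so that `tensorflow.compat.v1` is importable), and it assumes `x` and `y_` are placeholders of shape `[None, 784]` and `[None, 10]`, since the question does not show their definitions. The function name `run_on_two_cpus` and the all-zeros input batch are made up for illustration.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or 2.x


def run_on_two_cpus(num_cpus=2):
    """Build the question's graph across cpu:0 and cpu:1 and evaluate the loss once."""
    graph = tf.Graph()
    with graph.as_default():
        # Assumed placeholders; the question does not show how x and y_ are defined.
        x = tf.placeholder(tf.float32, [None, 784])
        y_ = tf.placeholder(tf.float32, [None, 10])

        with tf.device("/cpu:0"):
            W = tf.Variable(tf.zeros([784, 10]))
            b = tf.Variable(tf.zeros([10]))

        with tf.device("/cpu:1"):
            y = tf.nn.softmax(tf.matmul(x, W) + b)
            loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.math.log(y), axis=1))

        init = tf.global_variables_initializer()

    # Expose num_cpus logical CPU devices to this session, as in the answer.
    config = tf.ConfigProto(device_count={"CPU": num_cpus})
    with tf.Session(graph=graph, config=config) as sess:
        sess.run(init)
        batch_x = np.zeros((4, 784), dtype=np.float32)
        batch_y = np.eye(10, dtype=np.float32)[:4]  # four one-hot labels
        loss_value = sess.run(loss, feed_dict={x: batch_x, y_: batch_y})
    return loss_value, y.device


if __name__ == "__main__":
    loss_value, device = run_on_two_cpus(2)
    print("loss:", loss_value, "softmax placed on:", device)
```

Note that these are logical devices: as far as I know, `device_count` only controls how many CPU devices TensorFlow exposes and does not bind each one to a particular physical core.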

It's kind of shocking that I missed something so basic, and that no one pointed out an error that seems so obvious.

Not sure if it's a problem with me or the TensorFlow code samples and documentation. Since it's Google, I'll have to say that it's me.




Answer 2:


First, just run it on "one CPU" (a single cpu:0 device) and see whether TensorFlow distributes threads across all of the CPUs appropriately. It will likely multithread correctly, and you won't have to do anything.
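If you want to check or tune that threading explicitly, the session config has two thread-pool settings. The following is a rough sketch, assuming TensorFlow 1.14+ or 2.x with `tensorflow.compat.v1` available; the function name `matmul_with_threads` and the matrix size are made up for illustration. A value of 0 for either setting lets TensorFlow pick a number itself.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or 2.x


def matmul_with_threads(intra_threads, inter_threads, size=256):
    """Run one matmul on cpu:0 with explicit thread-pool sizes."""
    graph = tf.Graph()
    with graph.as_default():
        with tf.device("/cpu:0"):
            a = tf.constant(np.ones((size, size), dtype=np.float32))
            product = tf.matmul(a, a)

    config = tf.ConfigProto(
        # Threads used inside a single op (e.g. one large matmul).
        intra_op_parallelism_threads=intra_threads,
        # Threads used to run independent ops concurrently.
        inter_op_parallelism_threads=inter_threads,
    )
    with tf.Session(graph=graph, config=config) as sess:
        return sess.run(product)


if __name__ == "__main__":
    result = matmul_with_threads(intra_threads=4, inter_threads=2)
    print(result.shape)
```

Watching `top` or `htop` while a larger version of this runs is one way to see whether all cores are actually busy.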

If it doesn't, you should try launching multiple TensorFlow instances with different CPU affinities and running them as a "distributed" system. TensorFlow has distributed services for multiple machines; they should work just as well with separate processes on one machine, as long as you set up your files so that the processes aren't writing to the same locations. You can get started at https://www.tensorflow.org/deploy/distributed . You might want to set the CPU affinities so that there is one process per physical CPU, à la https://askubuntu.com/questions/102258/how-to-set-cpu-affinity-to-a-process
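As a sketch of the affinity part only, the CPU ids can be split into one group per worker process and each process pinned to its group from Python. This assumes Linux, where `os.sched_setaffinity` is available; the helper names `partition_cpus` and `pin_current_process` are hypothetical. It also assumes contiguous CPU ids are a sensible grouping, which may not match the physical socket layout on your machine (check `lscpu` before relying on it).

```python
import os


def partition_cpus(cpu_ids, num_workers):
    """Split CPU ids into num_workers contiguous, near-equal groups."""
    cpu_ids = sorted(cpu_ids)
    if num_workers < 1 or num_workers > len(cpu_ids):
        raise ValueError("num_workers must be between 1 and the number of CPUs")
    base, extra = divmod(len(cpu_ids), num_workers)
    groups, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        groups.append(cpu_ids[start:start + size])
        start += size
    return groups


def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Example: split the CPUs this process may use between two workers.
    available = os.sched_getaffinity(0)
    groups = partition_cpus(available, min(2, len(available)))
    print("groups:", groups)
    print("now pinned to:", pin_current_process(groups[0]))
```

Each worker process would call `pin_current_process` with its own group before starting its TensorFlow server; the same effect is available from the shell with `taskset`, as in the linked Ask Ubuntu question.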



Source: https://stackoverflow.com/questions/45985641/can-tensorflow-run-with-multiple-cpus-no-gpus
