R: making cluster in doParallel / snowfall hangs

Posted by 感情迁移 on 2019-12-05 15:41:01

You could start by setting the "outfile" option to an empty string when creating the cluster object:

makePSOCKcluster("192.168.1.1",user="username",outfile="")

This allows you to see error messages from the workers in your terminal, which will hopefully provide a clue to the problem. If that doesn't help, I recommend using manual mode:

makePSOCKcluster("192.168.1.1",user="username",outfile="",manual=TRUE)

This bypasses ssh, and displays commands for you to execute in order to manually start each of the workers in separate terminals. This can uncover problems such as R packages that are not installed. It also allows you to debug the workers using whatever debugging tools you choose, although that takes a bit of work.

If makePSOCKcluster doesn't respond after you execute the specified command, it means that the worker wasn't able to connect to the master process. If the worker doesn't display any error message, it may indicate a networking problem, possibly due to a firewall blocking the connection. Since makePSOCKcluster uses a randomly chosen port by default in R 3.X (from the range 11000:11999, unless the R_PARALLEL_PORT environment variable is set), you should specify an explicit value for port and configure your firewall to allow connections to that port.

To test for networking or firewall problems, you could try connecting to the master process using "netcat". Execute makePSOCKcluster in manual mode, specifying the hostname of the desired worker host and the port on the local machine that should allow incoming connections:

> library(parallel)
> makePSOCKcluster("node03", port=11234, manual=TRUE)
Manually start worker on node03 with
   '/usr/lib/R/bin/Rscript' -e 'parallel:::.slaveRSOCK()' MASTER=node01 PORT=11234 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE

Now start a terminal session on "node03" and execute "nc" using the indicated values of "MASTER" and "PORT" as arguments:

node03$ nc node01 11234

The master process should immediately return with the message:

socket cluster with 1 nodes on host ‘node03’

while netcat should display no message, since it is quietly reading from the socket connection.

However, if netcat displays the message:

nc: getaddrinfo: Name or service not known

then you have a hostname resolution problem. If you can find a hostname that does work with netcat, you may be able to get makePSOCKcluster to work by specifying that name via the "master" option: makePSOCKcluster("node03", master="node01", port=11234).
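
Before retrying with netcat, it may help to check on the worker which candidate names for the master resolve at all. The following is only a sketch under assumptions: resolves and first_resolvable are hypothetical helper names, and getent is assumed to be installed on the worker (typical on glibc-based Linux). Resolution is only a first filter; a name that resolves still has to be confirmed with nc as described above.

```shell
# Hypothetical helper: succeed if the given hostname resolves on this machine.
# Assumes getent is available; if it is missing, every name is reported
# as unresolvable.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the first candidate name that resolves and
# return 0, or print nothing and return 1 if none of them do.
first_resolvable() {
    for name in "$@"; do
        if resolves "$name"; then
            echo "$name"
            return 0
        fi
    done
    return 1
}
```

For example, running first_resolvable node01 node01.example.org on node03 prints whichever of the two names resolves first (node01.example.org is a made-up name for illustration). That name is then a candidate for the "master" option.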

If netcat returns immediately, that may indicate that it wasn't able to connect to the specified port. If it returns after a minute or two, that may indicate that it wasn't able to communicate with the specified host at all. In either case, check netcat's exit status to verify that it was an error:

node03$ echo $?
1
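
The two failure cases above can be told apart by combining the exit status with how long netcat ran. This is a rough sketch, not a definitive tool: diagnose_nc and probe_master are hypothetical names, the 5-second cutoff between "immediately" and "after a minute or two" is an arbitrary assumption, and date +%s is assumed to print seconds since the epoch (true for GNU and BSD date).

```shell
# Hypothetical helper: turn netcat's exit status and run time into a hint.
# $1 is the exit status (what "echo $?" prints), $2 the seconds nc ran.
diagnose_nc() {
    status=$1
    elapsed=$2
    if [ "$status" -eq 0 ]; then
        echo "ok: nc exited without an error"
    elif [ "$elapsed" -lt 5 ]; then
        echo "failed quickly: could not connect to the port"
    else
        echo "failed slowly: could not reach the host at all"
    fi
}

# Hypothetical wrapper: time one nc attempt and classify the result.
# Like plain nc, it keeps running for as long as the connection stays open.
probe_master() {
    host=$1
    port=$2
    start=$(date +%s)
    nc "$host" "$port"
    status=$?
    end=$(date +%s)
    diagnose_nc "$status" $((end - start))
}
```

On the worker, probe_master node01 11234 would stand in for the plain nc command. A quick failure can also come from a name that does not resolve, so read netcat's own message alongside the hint.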

Hopefully that will give you enough information about the problem that you can get help from a network administrator.
