Failed to remotely execute R script which loads library “rhdfs”

Posted by 南笙酒味 on 2019-12-04 05:11:01

Question


I'm working on a project using R-Hadoop and ran into this problem.

I'm using the Ganymed SSH-2 library in Java to SSH to a remote Hadoop pseudo-cluster; here is the part of the Java code that creates the connection.

import java.io.IOException;
import ch.ethz.ssh2.ChannelCondition;
import ch.ethz.ssh2.Connection;
import ch.ethz.ssh2.Session;

/* Create a connection instance */
Connection conn = new Connection(hostname);
/* Now connect */
conn.connect();
/* Authenticate with username/password */
boolean isAuthenticated = conn.authenticateWithPassword(username, password);
if (!isAuthenticated) {
    throw new IOException("Authentication failed.");
}
/* Create a session and run the R script remotely */
Session sess = conn.openSession();
//sess.execCommand("uname -a && date && uptime && who");
sess.execCommand("Rscript -e 'args1 <- \"Dell\"; args2 <- 1; source(\"/usr/local/R/mytest.R\")'");
//sess.execCommand("ls");
/* Wait at most 50 ms -- note this does not wait for the script to finish */
sess.waitForCondition(ChannelCondition.TIMEOUT, 50);

I tried several simple R scripts and the code worked fine, but as soon as the script uses R-Hadoop it stops running. However, if I run Rscript -e 'args1 <- "Dell"; args2 <- 1; source("/usr/local/R/mytest.R")' directly on the remote server, everything works fine.
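
One thing that makes failures like this much easier to debug: the code above never reads the remote command's stdout/stderr, so anything R prints is silently dropped. A minimal sketch of capturing both streams with Ganymed SSH-2's StreamGobbler (sess is the session from the code above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import ch.ethz.ssh2.StreamGobbler;

/* Wrap the remote streams so output is buffered even if read late */
BufferedReader stdout = new BufferedReader(
        new InputStreamReader(new StreamGobbler(sess.getStdout())));
BufferedReader stderr = new BufferedReader(
        new InputStreamReader(new StreamGobbler(sess.getStderr())));

String line;
while ((line = stdout.readLine()) != null) {
    System.out.println("remote stdout: " + line);
}
while ((line = stderr.readLine()) != null) {
    System.err.println("remote stderr: " + line);  // R load errors appear here
}
sess.close();

With this in place, the rhdfs load error shown below would have been visible directly in the Java logs.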

Here is what I got after taking Hong Ooi's suggestion: instead of using Rscript, I used the following command:

sess.execCommand("R CMD BATCH --no-save --no-restore '--args args1=\"Dell\" args2=1' /usr/local/R/mytest.R /usr/local/R/whathappened.txt");

And in whathappened.txt, I got the following error:

> args=(commandArgs(TRUE))
> for(i in 1:length(args)){
+      eval(parse(text=args[[i]]))
+ }
> source("/usr/local/R/main.R")
> main(args1,args2)
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
  error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: package/namespace load failed for 'rhdfs'
Execution halted

Well, now the problem is much clearer. Unfortunately, I'm pretty new to Linux and have no idea how to solve this.


Answer 1:


Well, I solved this problem like this:

sess.execCommand("source /etc/profile; R CMD BATCH --no-save --no-restore '--args args1=\"Dell\" args2=1' /usr/local/R/mytest.R /usr/local/R/whathappened.txt");

The problem was caused by the environment. A command executed over SSH runs in a non-interactive shell, which does not source login files such as /etc/profile, so variables like $HADOOP_CMD that are set there are never defined. There are multiple ways to let the SSH session pick up the environment variables.

In my method, prefixing the command with "source /etc/profile" loads those variables into the shell before R starts.
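
To see the difference for yourself, you can run a quick check over the same kind of exec channel. A small sketch (reusing the Ganymed conn from the question; a Ganymed Session runs exactly one command, so open a fresh one per check):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import ch.ethz.ssh2.Session;
import ch.ethz.ssh2.StreamGobbler;

String[] checks = {
    "echo HADOOP_CMD=$HADOOP_CMD",                      // plain exec channel: usually empty
    "source /etc/profile; echo HADOOP_CMD=$HADOOP_CMD"  // profile sourced first: the real path
};
for (String cmd : checks) {
    Session s = conn.openSession();
    s.execCommand(cmd);
    BufferedReader out = new BufferedReader(
            new InputStreamReader(new StreamGobbler(s.getStdout())));
    String line;
    while ((line = out.readLine()) != null) {
        System.out.println(line);
    }
    s.close();
}

If the first check prints an empty value and the second prints the real path, the non-login-shell environment is confirmed as the culprit.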




Answer 2:


Well, I just found another solution myself:

Instead of dealing with the environment from outside the Hadoop cluster, you can set the variables inside the R script itself:

# Set the Hadoop environment before loading the packages
Sys.setenv(HADOOP_HOME = "put your HADOOP_HOME path here")
Sys.setenv(HADOOP_CMD = "put your HADOOP_CMD path here")

library(rmr2)
library(rhdfs)
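
A nice side effect of this approach is that the Java side no longer needs the "source /etc/profile" prefix. Assuming mytest.R starts with the Sys.setenv calls above, the original invocation from the question works as-is:

Session sess = conn.openSession();
/* No profile sourcing needed: the script sets HADOOP_CMD itself */
sess.execCommand("Rscript -e 'args1 <- \"Dell\"; args2 <- 1; source(\"/usr/local/R/mytest.R\")'");

The trade-off is that the Hadoop paths are now hard-coded in the script, tying it to one cluster, whereas the /etc/profile approach in Answer 1 keeps the configuration on the server.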


Source: https://stackoverflow.com/questions/17583846/failed-to-remotely-execute-r-script-which-loads-library-rhdfs
