Question
I commonly build an executable jar with a main method and run it from the command line with "hadoop jar Some.jar ClassWithMain input output".
In this main method, a Job and a Configuration may be set up, and there are setters to specify the mapper or reducer class, such as conf.setMapperClass(Mapper.class).
However, when submitting a job remotely, I have to set the jar, the mapper, and other classes through the Hadoop client API:
job.setJarByClass(HasMainMethod.class);
job.setMapperClass(Mapper_Class.class);
job.setReducerClass(Reducer_Class.class);
I want to programmatically transfer the jar from the client to the remote Hadoop cluster and execute it the way the "hadoop jar" command does, so that the main method can configure the mapper and reducer itself.
How can I solve this problem?
Answer 1:
hadoop is only a shell script. Eventually, "hadoop jar" will invoke org.apache.hadoop.util.RunJar. All "hadoop jar" does is help you set up the CLASSPATH, so you can use RunJar directly. For example,
String input = "...";
String output = "...";
org.apache.hadoop.util.RunJar.main(
new String[]{"Some.jar", "ClassWithMain", input, output});
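Under the hood, RunJar unpacks the jar, builds a classloader over it, and then reflectively calls the target class's main method with the remaining arguments. A minimal, JDK-only sketch of that reflective dispatch step (the class names here are illustrative, not Hadoop's actual code):

```java
import java.lang.reflect.Method;

// Sketch of the reflective dispatch that RunJar performs after setting up
// its classloader: look up "public static void main(String[])" on the
// requested class and invoke it with the remaining arguments.
public class RunJarSketch {

    // Invoke the main method of the named class with the given arguments,
    // resolving the class through the supplied classloader (RunJar would
    // pass a URLClassLoader built over the unpacked jar here).
    static void invokeMain(ClassLoader loader, String className, String[] args)
            throws Exception {
        Class<?> mainClass = Class.forName(className, true, loader);
        Method main = mainClass.getMethod("main", String[].class);
        // The cast to Object prevents varargs expansion of the array.
        main.invoke(null, (Object) args);
    }

    // Tiny stand-in for "ClassWithMain" so the sketch can be exercised
    // without a real jar on the classpath.
    public static class Demo {
        public static String[] received;
        public static void main(String[] args) { received = args; }
    }

    public static void main(String[] args) throws Exception {
        invokeMain(RunJarSketch.class.getClassLoader(),
                   "RunJarSketch$Demo", new String[]{"input", "output"});
        System.out.println(Demo.received.length); // prints 2
    }
}
```

This is why calling RunJar.main directly works: your driver's own main method still runs and still gets to call job.setMapperClass and friends, exactly as it would under the "hadoop jar" command.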
However, you need to set the CLASSPATH correctly before you use it. A convenient way to get the correct CLASSPATH is the "hadoop classpath" command: type it and you will get the full CLASSPATH.
Then set up the CLASSPATH before you run your Java application. For example,
export CLASSPATH=$(hadoop classpath):$CLASSPATH
java -jar YourJar.jar
Source: https://stackoverflow.com/questions/18394663/send-executable-jar-to-hadoop-cluster-and-run-as-hadoop-jar