Hadoop: how to include third-party jars when running a MapReduce job

*爱你&永不变心* submitted on 2019-12-24 08:18:52

Question


As we know, we need to pack all required classes into the job jar and upload it to the server, which is slow. I would like to know whether there is a way to specify third-party jars when executing a MapReduce job, so that I could pack only my own classes, without the dependencies.

PS: I found there is a "-libjars" option, but I couldn't figure out how to use it. Here is the link: http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/


Answer 1:


Those are called generic options. So, to support them, your job should implement Tool.

Run your job like this:

hadoop jar yourfile.jar [mainClass] -libjars <comma-separated list of jars> args

Edit:

To implement Tool and extend Configured, you do something like this in your MapReduce application:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class YourClass extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new YourClass(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Parse your normal arguments here; ToolRunner has already
        // stripped the generic options (-libjars, -D, etc.) from args.

        Configuration conf = getConf();
        Job job = new Job(conf, "Name of job");

        // set the mapper/reducer class names etc.

        // set the output key/value type classes etc.

        // accept the HDFS input and output dirs at run time
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }
}



Answer 2:


For me, I had to specify the -libjars option before the job's own arguments. Otherwise it was treated as an ordinary argument.
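For example, assuming a driver class `YourClass` inside `yourfile.jar` and two dependency jars `dep1.jar` and `dep2.jar` (the names here are purely illustrative), the ordering matters like this:

```shell
# Works: -libjars is a generic option, so it must come right after
# the main class, before the job's own arguments
hadoop jar yourfile.jar YourClass -libjars dep1.jar,dep2.jar /input /output

# Does not work: placed after the job arguments, -libjars is passed
# through to run() as an ordinary argument and never processed
hadoop jar yourfile.jar YourClass /input /output -libjars dep1.jar,dep2.jar
```

This is because GenericOptionsParser (invoked via ToolRunner) only consumes recognized options from the front of the argument list; anything after the first non-option argument is handed to your Tool's run() method untouched.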



Source: https://stackoverflow.com/questions/19029760/hadoop-how-to-include-3part-jar-while-try-to-run-mapred-job
