Hadoop distributed cache : using -libjars : How to use external jars in your code

Posted by 非 Y 不嫁゛ on 2019-12-13 04:35:38

Question


I am able to ship external jars with my job using the -libjars option. How do I now use those jars in my code? Say the jar defines a function that operates on a String: how do I call it? Using context.getArchiveClassPaths() I can get a path to the jar, but I don't know how to instantiate the object.

Here is the sample class from the jar that I am importing:

package replace;

public class ReplacingAcronyms {

    public static String Replace(String abc){
        String n;
        n="This is trial";
        return n;
    }

}

And here is my driver class:

public class wc_runner extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(new Configuration());
        job.setJarByClass(wc_runner.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(wc_mapper.class);
        job.setCombinerClass(wc_reducer.class);
        job.setReducerClass(wc_reducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return (job.waitForCompletion(true) ? 0 : 1);
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new wc_runner(), args);
        System.exit(exitCode);
    }
}

Commands I ran:

[training@localhost Desktop]$ export HADOOP_CLASSPATH=file:///home/training/Desktop/replace.jar 
[training@localhost Desktop]$ hadoop jar try1.jar wc_runner /user/training/MR/custom/trial1 /user/training/MR/custom/out -libjars ./replace.jar

Error:

14/03/08 02:39:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/08 02:39:41 INFO input.FileInputFormat: Total input paths to process : 1
14/03/08 02:39:41 WARN snappy.LoadSnappy: Snappy native library is available
14/03/08 02:39:41 INFO snappy.LoadSnappy: Snappy native library loaded
14/03/08 02:39:41 INFO mapred.JobClient: Running job: job_201403080114_0021
14/03/08 02:39:42 INFO mapred.JobClient:  map 0% reduce 0%
14/03/08 02:39:46 INFO mapred.JobClient: Task Id : attempt_201403080114_0021_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: replace.ReplacingAcronyms
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)

Answer 1:


Import the package in your MapReduce code, then add the jar file's path to HADOOP_CLASSPATH before running the job.

For example, in your MapReduce Java source:

import your.external.package;
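
On the "how do I instantiate it" part of the question: the Replace method shown in the question is declared static, so once the class is imported it can be called on the class name without creating an object. The following is a minimal sketch of that call, not the asker's actual mapper. ReplacingAcronyms is redeclared here as a stand-in so the snippet compiles without replace.jar, and AcronymDemo and processLine are hypothetical names standing in for whatever a mapper's map() method would do with each line.

```java
// Stand-in for replace.ReplacingAcronyms so this sketch compiles on its own.
// In the real job you would instead write: import replace.ReplacingAcronyms;
class ReplacingAcronyms {
    public static String Replace(String abc) {
        return "This is trial";
    }
}

// Hypothetical helper holding the per-record logic a mapper's map() would run.
class AcronymDemo {
    static String processLine(String line) {
        // Static method: call it on the class name, no "new" required.
        return ReplacingAcronyms.Replace(line);
    }

    public static void main(String[] args) {
        System.out.println(processLine("some input line")); // prints "This is trial"
    }
}
```

If the method were an instance method instead, a plain `new ReplacingAcronyms()` would do. In either case context.getArchiveClassPaths() is not needed, provided the jar is on the task classpath.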

When compiling:

javac -cp /path/to/your/external/package.jar:...

When running the job with hadoop jar:

export HADOOP_CLASSPATH=/path/to/your/external/package.jar
hadoop jar yourmapred.jar your.class -libjars /path/to/your/external/package.jar ....
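
Note where the generic option sits in the command above: directly after the class name and before any input or output paths. The command in the question puts -libjars after the two paths. As far as I can tell, Hadoop's GenericOptionsParser stops looking for generic options at the first argument that is not an option, so a trailing -libjars is never seen. The sketch below is a toy model of that behaviour, not Hadoop code. GenericArgsModel and libJarsSeen are hypothetical names, and the model assumes every option takes exactly one value.

```java
// Toy model (not Hadoop code) of "stop at the first non-option" parsing.
class GenericArgsModel {
    // Returns the -libjars value the parser would see, or null if it sees none.
    static String libJarsSeen(String[] args) {
        int i = 0;
        // Keep reading only while the current argument looks like an option.
        while (i + 1 < args.length && args[i].startsWith("-")) {
            if (args[i].equals("-libjars")) {
                return args[i + 1];
            }
            i += 2; // assumption: every other option takes exactly one value
        }
        return null; // hit a positional argument (or ran out) before -libjars
    }

    public static void main(String[] args) {
        String[] asInQuestion = {
            "/user/training/MR/custom/trial1", "/user/training/MR/custom/out",
            "-libjars", "./replace.jar"
        };
        String[] optionFirst = {
            "-libjars", "./replace.jar",
            "/user/training/MR/custom/trial1", "/user/training/MR/custom/out"
        };
        System.out.println(libJarsSeen(asInQuestion)); // prints "null"
        System.out.println(libJarsSeen(optionFirst));  // prints "./replace.jar"
    }
}
```

Reordering the command is probably not enough by itself. The driver in the question builds the job with new Job(new Configuration()), which discards the Configuration that ToolRunner populated from the generic options. Passing the result of getConf() to the Job constructor instead should let the -libjars setting reach the tasks. The "Use GenericOptionsParser" warning in the log is consistent with this.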


Source: https://stackoverflow.com/questions/22245425/hadoop-distributed-cache-using-libjars-how-to-use-external-jars-in-your-cod
