Run MapReduce Job from a web application


Question


With reference to similar questions: Running a Hadoop Job From another Java Program and Calling a mapreduce job from a simple java program

I too have a MapReduce job jar file on a remote Hadoop machine, and I'm creating a web application that, on a button click event, calls out to that jar and executes the job. The web app runs on a separate machine.

I've tried the suggestions from both of the posts above but could not get it to work. Even with the provided wordcount example I still run into a NoClassDefFoundError.

Are there any lines of code I'm missing?

Below is the code I have:

public void buttonClick(ClickEvent event) {
        UserGroupInformation ugi;
        try {
            ugi = UserGroupInformation.createProxyUser("hadoopUser", UserGroupInformation.getLoginUser());
            ugi.doAs(new PrivilegedExceptionAction<Object>(){
                public Object run() throws Exception {
                    runHadoopJob();
                    return null;
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }   
    }

private boolean runHadoopJob() {
        try {
            // Point the client at the remote NameNode and JobTracker
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://192.168.4.248:9000");
            conf.set("mapred.job.tracker", "192.168.4.248:9001");
            Job job = new Job(conf, "WordCount");
            job.setJarByClass(TokenizerMapper.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("/flume/events/160114/*"));
            // Clear any previous output so the job does not fail on an existing path
            Path out = new Path("output");
            FileSystem fs = FileSystem.get(conf);
            fs.delete(out, true);
            FileOutputFormat.setOutputPath(job, out);
            boolean success = job.waitForCompletion(true);
            System.out.println("Job Finished");
            return success;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:513)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:511)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:499)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at com.example.hadoopclient.HDFSTable.runHadoopJob(HDFSTable.java:181)
    at com.example.hadoopclient.HDFSTable.access$0(HDFSTable.java:120)
    at com.example.hadoopclient.HDFSTable$SearchButtonClickListener.buttonClick(HDFSTable.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.vaadin.event.ListenerMethod.receiveEvent(ListenerMethod.java:510)
    ... 36 more

I added the following to my Hadoop core-site.xml file, where hadoop is the user group that hadoopUser belongs to:

<property>
    <name>hadoop.proxyuser.kohtianan.groups</name>
    <value>hadoop</value>
</property>
<property>
    <name>hadoop.proxyuser.kohtianan.hosts</name>
    <value>*</value>
</property>

Answer 1:


For a map-reduce program to run, you need to have the jackson-mapper-asl-*.jar and jackson-core-asl-*.jar files present on your map-reduce program's class-path. The actual jar file names will vary based on the Hadoop distribution and version you are using.

These files are present under the $HADOOP_HOME/lib folder. There are two ways to solve this problem:

  • Invoke the map-reduce program using the hadoop jar command. This ensures that all the required jar files are automatically included in your map-reduce program's class-path.

  • If you wish to trigger a map-reduce job from your application, make sure you include these jar files (and any other jars your job needs) in your application's class-path, so that when you spawn the job it automatically picks them up from there (see the sketch after this list).
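
For the second option, if your web application is built with Maven, the missing Jackson classes can be declared as dependencies. A minimal sketch, assuming a Maven build and a Hadoop 1.x-era distribution; the version number is an assumption and must match the jackson-*-asl jars under your $HADOOP_HOME/lib:

<!-- Jackson ASL jars that hadoop-core needs at job-submission time -->
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.8.8</version> <!-- assumed; match the jar in $HADOOP_HOME/lib -->
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-core-asl</artifactId>
    <version>1.8.8</version>
</dependency>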

org.apache.hadoop.ipc.RemoteException: User: kohtianan is not allowed to impersonate hadoopUser

This error indicates that the user kohtianan does not have access to HDFS. What you can do is create a directory on HDFS (as the HDFS superuser) and change the owner of that directory to kohtianan. This should resolve your issue.
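
A minimal sketch of that fix using the same Java FileSystem API as the question code (the equivalent of hadoop fs -mkdir plus hadoop fs -chown from the command line). It must be run as the HDFS superuser; the /user/kohtianan path and the hadoop group are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateUserDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.4.248:9000"); // NameNode from the question
        FileSystem fs = FileSystem.get(conf);
        Path home = new Path("/user/kohtianan"); // assumed home directory for the user
        fs.mkdirs(home);
        // Hand the directory over to kohtianan (group "hadoop" is an assumption)
        fs.setOwner(home, "kohtianan", "hadoop");
        fs.close();
    }
}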



Source: https://stackoverflow.com/questions/21180120/run-mapreduce-job-from-a-web-application
