Problem with Hadoop Streaming -file option for Java class files

Posted by 泄露秘密 on 2019-12-07 18:58:01

Question


I am struggling with a very basic issue with the "-file" option in Hadoop Streaming.

First I tried the very basic example in streaming:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* \
    -output gutenberg-outputtstchk22

which worked absolutely fine.

Then I copied the IdentityMapper.java source code and compiled it. I placed the resulting class file in the /home/hadoop folder and executed the following in the terminal:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -file ~/IdentityMapper.class \
    -mapper IdentityMapper.class \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* \
    -output gutenberg-outputtstch6

The execution failed with the following error in the stderr file:

java.io.IOException: Cannot run program "IdentityMapper.class": java.io.IOException: error=2, No such file or directory

Then I tried again, this time copying the IdentityMapper.class file into the Hadoop installation directory, and executed the following:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -file IdentityMapper.class \
    -mapper IdentityMapper.class \
    -reducer /bin/wc \
    -inputformat KeyValueTextInputFormat \
    -input gutenberg/* \
    -output gutenberg-outputtstch5

Unfortunately, I got the same error again.

It would be great if you could help me with this, as I cannot move any further without overcoming it.

Thanking you in anticipation.


Answer 1:


Why do you want to compile the class? It is already compiled in the Hadoop jars. You only need to pass the class name (org.apache.hadoop.mapred.lib.IdentityMapper), because Hadoop uses reflection to instantiate this mapper class.

You have to make sure that the class is on the classpath, e.g. inside a jar that you pass to the job.
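To illustrate why the class name matters, here is a minimal sketch of the "class name or external command" decision. It is not Hadoop's actual code: the names MapperArgSketch, classOrNull and describe are hypothetical, and java.lang.String merely stands in for a mapper class that is already on the classpath. The sketch assumes that an argument which cannot be loaded as a class is handed to the operating system as a program, which is consistent with the "Cannot run program" error in the question.

```java
import java.io.IOException;

public class MapperArgSketch {

    /** Returns the class if the name can be loaded from the classpath, otherwise null. */
    static Class<?> classOrNull(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    /** Describes how this sketch would treat a -mapper style argument. */
    static String describe(String arg) {
        // Step 1: try to interpret the argument as a fully qualified class name.
        Class<?> cls = classOrNull(arg);
        if (cls != null) {
            return "class:" + cls.getName();
        }
        // Step 2 (assumption): otherwise treat the argument as an external command.
        try {
            Process p = new ProcessBuilder(arg).start();
            p.destroy();
            return "command:" + arg;
        } catch (IOException e) {
            // A .class file is not an executable, so starting it as a program fails here.
            return "error:" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A loadable class name is resolved through reflection.
        System.out.println(describe("java.lang.String"));
        // A file name such as "IdentityMapper.class" is not a class name,
        // so the sketch falls through to the command branch and fails.
        System.out.println(describe("IdentityMapper.class"));
    }
}
```

On a typical Linux system the second call reports a "Cannot run program" message like the one in the question, because "IdentityMapper.class" names a file rather than a loadable class or an executable.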




Answer 2:


Same answer as for your other question: you can't really use -file to send over jars, because Hadoop doesn't support multiple jars (ones that were not already in the CLASSPATH). Check the streaming docs:

At least as late as version 0.14, Hadoop does not support multiple jar files. So, when specifying your own custom classes you will have to pack them along with the streaming jar and use the custom jar instead of the default hadoop streaming jar.




Answer 3:


I ran into a similar problem, and adding the jar file to HADOOP_CLASSPATH fixed it. For more information, see: http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/



Source: https://stackoverflow.com/questions/6790110/problem-with-hadoop-streaming-file-option-for-java-class-files
