Question
I have a jar file at /home/ubuntu/libs/javacv-0.9.jar on all my Hadoop nodes, along with some other jar files.
When my MapReduce application is executing on Hadoop nodes, I am getting this exception
java.io.FileNotFoundException: File does not exist: hdfs://192.168.0.18:50000/home/ubuntu/libs/javacv-0.9.jar
How can I resolve this exception? How can my jar running in Hadoop access 3rd party libraries from the local file system of the Hadoop node?
Answer 1:
You need to copy your file to HDFS and not to the local filesystem.
To copy files to HDFS you need to use:
hadoop fs -put localfile hdfsPath
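For example, to copy the jar from the question into HDFS (the /libs target directory below is just an arbitrary choice for illustration):

hadoop fs -mkdir /libs
hadoop fs -put /home/ubuntu/libs/javacv-0.9.jar /libs/javacv-0.9.jar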
Another option is to change the file path to:
file:///home/ubuntu/libs/javacv-0.9.jar
To add jar files to the classpath, take a look at DistributedCache:
DistributedCache.addFileToClassPath(new Path("file:///home/ubuntu/libs/javacv-0.9.jar"), job);
You may need to iterate over all jar files in that directory.
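A rough sketch of such a loop, assuming the newer org.apache.hadoop.mapreduce.Job API (its addFileToClassPath method wraps the same DistributedCache mechanism) and assuming every node really does have an identical /home/ubuntu/libs directory:

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class LocalLibClasspath {
    // Adds every jar under libDir on the node's local filesystem to the
    // task classpath, using file:// URIs so nothing is uploaded to HDFS.
    public static void addLocalJars(Job job, String libDir) throws IOException {
        File[] jars = new File(libDir).listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return; // directory missing or unreadable on this node
        }
        for (File jar : jars) {
            job.addFileToClassPath(new Path("file://" + jar.getAbsolutePath()));
        }
    }
}

This could be called from the driver as LocalLibClasspath.addLocalJars(job, "/home/ubuntu/libs") before the job is submitted.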
Answer 2:
Another option is to use the distributed cache's addFileToClassPath(new Path("/myapp/mylib.jar"), job) call to submit the jar files that should be added to the classpath of your mapper and reducer tasks.
Note: Make sure you copy the jar file to HDFS first.
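A rough sketch of that two-step flow, reusing the /myapp/mylib.jar path from the snippet above (here job is assumed to be an org.apache.hadoop.mapreduce.Job, whose addFileToClassPath method is the newer equivalent of the static DistributedCache call):

hadoop fs -put /home/ubuntu/libs/javacv-0.9.jar /myapp/mylib.jar

and then, in the job driver before submission:

job.addFileToClassPath(new Path("/myapp/mylib.jar"));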
You can also add jar files to the classpath using the hadoop command-line argument -libjars <jar_file>.
Note: Make sure your MapReduce application uses ToolRunner (i.e. its driver implements the Tool interface), so that the -libjars option can be parsed from the command line.
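A minimal driver sketch wired up for ToolRunner (the class name, job name, and paths below are placeholders, not from the original question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already holds the options parsed by ToolRunner,
        // including any jars passed via -libjars.
        Job job = Job.getInstance(getConf(), "my-job");
        job.setJarByClass(MyDriver.class);
        // ... configure mapper, reducer, input and output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}

It could then be launched with the extra jar on the classpath like this (paths are illustrative):

hadoop jar myapp.jar MyDriver -libjars /home/ubuntu/libs/javacv-0.9.jar /input /output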
Source: https://stackoverflow.com/questions/28213244/hadoop-accessing-3rd-party-libraries-from-local-file-system-of-a-hadoop-node