How to set the classpath for a Java program on the Hadoop file system


What I suppose you are trying to do is include third-party libraries in your distributed program. There are several options:

Option 1) The easiest option, I find, is to put all the jars in the $HADOOP_HOME/lib directory (e.g. /usr/local/hadoop-0.22.0/lib) on all nodes and restart your JobTracker and TaskTrackers.
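
A minimal sketch of option 1, assuming a 0.2x-era layout with the start/stop-mapred.sh scripts (the jar name is a placeholder):

    # copy the third-party jar into Hadoop's lib directory on this node
    cp mylib.jar $HADOOP_HOME/lib/
    # repeat on every node, then restart the MapReduce daemons
    $HADOOP_HOME/bin/stop-mapred.sh
    $HADOOP_HOME/bin/start-mapred.sh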

Option 2) Use the -libjars option; the command for this is hadoop jar -libjars comma_separated_jars (a full example is given further below).

Option 3) Include the jars in the lib/ directory of your job jar. You will have to do that while creating the jar.
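
A sketch of option 3, assuming your compiled classes are under classes/ (all names here are placeholders); Hadoop unpacks a lib/ directory found inside the job jar onto each task's classpath:

    mkdir -p classes/lib
    cp mylib.jar classes/lib/        # third-party jar goes inside the job jar
    jar cvf myjob.jar -C classes .   # package classes plus lib/ into one jar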

Option 4) Install all the jars on your machines and include their location in the classpath.
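
For option 4, one way is to append the jar location to HADOOP_CLASSPATH in conf/hadoop-env.sh on each machine (the path below is a placeholder):

    # in $HADOOP_HOME/conf/hadoop-env.sh
    export HADOOP_CLASSPATH=/usr/local/extra-jars/mylib.jar:$HADOOP_CLASSPATH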

Option 5) You can try putting those jars in the distributed cache.
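
For option 5, a minimal sketch using the old DistributedCache API (the HDFS path is hypothetical, and the jar must already be on HDFS):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class CacheSetup {
        public static void addLib(Configuration conf) throws IOException {
            // ships the jar to each task node and puts it on the task classpath
            DistributedCache.addFileToClassPath(new Path("/libs/mylib.jar"), conf);
        }
    }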

You cannot add an HDFS path to your classpath. The java executable would not be able to interpret something like:

hdfs://path/to/your/file

But adding third-party libraries to the classpath of each task that needs them can be done using the -libjars option. This means you need a so-called driver class (implementing Tool) that sets up and starts your job, and you use the -libjars option on the command line when running that driver class. The Tool, in turn, uses GenericOptionsParser to parse your command-line arguments (including -libjars) and, with the help of the JobClient, does all the necessary work to ship your libraries to all the machines that need them and to put them on the classpath of those machines.
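
A minimal sketch of such a driver class, assuming the mapreduce API of that era (class and job names are hypothetical; mapper/reducer setup is omitted, as the point is only the Tool/ToolRunner wiring that makes -libjars work):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() already reflects the generic options (including
            // -libjars) parsed by GenericOptionsParser via ToolRunner
            Job job = new Job(getConf(), "my-job");
            job.setJarByClass(MyDriver.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner strips generic options such as -libjars before
            // handing the remaining arguments to run()
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }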

Besides that, in order to run an MR job you should use the hadoop script located in the bin/ directory of your distribution.

Here is an example (using a jar containing your job and the driver class):

    hadoop jar jarfilename.jar DriverClassInTheJar \
        -libjars comma-separated-list-of-libs <input> <output>

You can specify the jar path as

    -libjars hdfs://namenode/path_to_jar

I have used this with Hive.
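
For instance (jar, class, and path names below are hypothetical), the full invocation with an HDFS-hosted library might look like:

    hadoop jar myjob.jar MyDriver -libjars hdfs://namenode/libs/mylib.jar <input> <output>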
