How to set the classpath for a Java program on the Hadoop file system


Question


I am trying to figure out how to set a classpath that references HDFS. I cannot find any reference for this.

 java -cp "how to reference to HDFS?" com.MyProgram 

If I cannot reference the Hadoop file system, then I have to copy all the referenced third-party libs/jars somewhere under $HADOOP_HOME on each Hadoop machine... but I want to avoid this by putting the files on the Hadoop file system. Is this possible?

Example Hadoop command line for running the program (my expectation is something like this; maybe I am wrong):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.3.jar -input inputfileDir -output outputfileDir -mapper /home/nanshi/myprog.java -reducer NONE -file /home/nanshi/myprog.java

However, within the command line above, how do I add the Java classpath? Like -cp "/home/nanshi/wiki/Lucene/lib/lucene-core-3.6.0.jar:/home/nanshi/Lucene/bin"


Answer 1:


What I suppose you are trying to do is include third-party libraries in your distributed program. There are several options.

Option 1) The easiest option I have found is to put all the jars in the $HADOOP_HOME/lib directory (e.g. /usr/local/hadoop-0.22.0/lib) on all nodes and restart your JobTracker and TaskTrackers.

Option 2) Use the -libjars option; the command for this is hadoop jar jarfilename.jar DriverClass -libjars comma_separated_jars

Option 3) Include the jars in the lib/ directory of your job jar. You will have to do that while creating the jar.

Option 4) Install all the jars on your machine and include their location in the classpath.

Option 5) You can try putting those jars in the distributed cache (see the sketch below).
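
For illustration, here is a minimal sketch of Option 5, assuming the org.apache.hadoop.filecache.DistributedCache API of Hadoop 1.x; the HDFS jar path /libs/lucene-core-3.6.0.jar is a hypothetical location. This snippet would go in your job-setup (driver) code:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.filecache.DistributedCache;
 import org.apache.hadoop.fs.Path;

 Configuration conf = new Configuration();
 // Add a jar that already sits in HDFS to the classpath of every task.
 DistributedCache.addFileToClassPath(
         new Path("/libs/lucene-core-3.6.0.jar"), conf);
 // ... then create and submit the Job with this conf as usual.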




Answer 2:


You cannot add an HDFS path to your classpath. The java executable would not be able to interpret something like:

hdfs://path/to/your/file

But adding third-party libraries to the classpath of each task that needs them can be done using the -libjars option. This means you need a so-called driver class (implementing Tool) which sets up and starts your job, and you pass the -libjars option on the command line when running that driver class. The Tool, in turn, uses GenericOptionsParser to parse your command-line arguments (including -libjars) and, with the help of the JobClient, does all the necessary work to ship your libs to all the machines that need them and to put them on the classpath of those machines.
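
For illustration, here is a minimal sketch of such a driver class, assuming the Hadoop 1.x new API (org.apache.hadoop.mapreduce); the class name MyDriver is hypothetical and the mapper/reducer wiring is elided:

 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 public class MyDriver extends Configured implements Tool {
     @Override
     public int run(String[] args) throws Exception {
         // getConf() already reflects -libjars, -files and -D options,
         // which GenericOptionsParser stripped from the command line.
         Job job = new Job(getConf(), "my job");
         job.setJarByClass(MyDriver.class);
         // set your Mapper/Reducer classes here ...
         FileInputFormat.addInputPath(job, new Path(args[0]));
         FileOutputFormat.setOutputPath(job, new Path(args[1]));
         return job.waitForCompletion(true) ? 0 : 1;
     }

     public static void main(String[] args) throws Exception {
         // ToolRunner runs GenericOptionsParser before calling run().
         System.exit(ToolRunner.run(new MyDriver(), args));
     }
 }

With a driver like this, the jars passed via -libjars are shipped to the cluster and added to each task's classpath automatically.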

Besides that, in order to run an MR job you should use the hadoop script located in the bin/ directory of your distribution.

Here is an example (using a jar containing your job and the driver class):

 hadoop jar jarfilename.jar DriverClassInTheJar \
     -libjars comma-separated-list-of-libs <input> <output>



Answer 3:


You can specify the jar path as -libjars hdfs://namenode/path_to_jar. I have used this with Hive.
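
For example (the namenode host, jar location, job jar, and class name here are all hypothetical, and this assumes your Hadoop version accepts HDFS paths in -libjars, as this answer reports):

 hadoop jar myjob.jar com.example.MyDriver \
     -libjars hdfs://namenode/libs/lucene-core-3.6.0.jar \
     inputDir outputDir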



Source: https://stackoverflow.com/questions/11696563/how-to-set-classpath-for-a-java-program-on-hadoop-file-system
