ClassNotFoundException running GiraphRunner on a modified SimpleShortestPathsVertex

Submitted by ♀尐吖头ヾ on 2019-12-08 03:47:19

Question


I'm relatively new to Giraph and I'm trying to get my Giraph edit-compile-deploy loop working for our code. I am able to run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ , but I'm stuck with a ClassNotFoundException when running my modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and I'd really appreciate your help. Details follow.

Versions

  • Hadoop: Hadoop 2.0.0-cdh4.4.0
  • Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar

The PageRankBenchmark runs OK

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1

...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)

The GiraphRunner SimpleShortestPathsVertex also runs OK

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1

...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)

Bonus: the results are correct:

$ hadoop fs -cat goutput/shortestpathsC2/p*
0   1.0
2   2.0
1   0.0
3   1.0
4   5.0

But my modified version of SimpleShortestPathsVertex gets ClassNotFoundException

The jar containing the modified vertex (KdlSimpleShortestPathsVertex, no package) is OK:

$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/

But my run pukes:

$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1

Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

My best guess ...

...after looking around is that maybe GiraphRunner is not processing the -libjars correctly, as hinted at by http://grepalex.com/2013/02/25/hadoop-libjars/ ("Make sure your code is using GenericOptionsParser"). Browsing the Giraph source, I do not see that class accessed. I tried setting HADOOP_CLASSPATH to my jar, but that didn't solve the problem.
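The trace above fails in thread "main" under RunJar, which suggests the class is being resolved in the client JVM before any job is submitted. One quick sanity check is whether the jar is really an entry in the colon-delimited classpath the client sees. The following is a minimal sketch of such a check; the helper name on_classpath and the paths are hypothetical, not taken from the setup above:

```shell
#!/bin/sh
# Return 0 if path $1 is an entry in the colon-delimited classpath string $2.
on_classpath() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical value; on a real machine, inspect "$HADOOP_CLASSPATH" instead.
CP="/home/me/kdl_hadoop_play.jar:/share/apps/giraph/giraph-core.jar"

if on_classpath "/home/me/kdl_hadoop_play.jar" "$CP"; then
  echo "jar is on the client classpath"
else
  echo "jar is missing from the client classpath"
fi
```

The check only compares strings, so it will not catch a path that is listed but does not exist on disk.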

Any help would be awesome!

PageRankBenchmark output

14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient:   File System Counters
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient:   Job Counters 
14/08/01 11:42:44 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient:     Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient:     Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient:     CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient:     Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient:     Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient:     Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient:     Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient:     Total (milliseconds)=3442

SimpleShortestPathsVertex output

14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient:   File System Counters
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient:   Job Counters 
14/08/01 11:47:46 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient:     Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient:     Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient:     CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient:     Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient:     Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient:     Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient:     Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient:     Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient:     Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient:     Total (milliseconds)=805

Answer 1:


OK, after looking at the hadoop scripts along with the Hadoop and Giraph source, I think I figured it out. The big hint came from the article "Using the libjars option with Hadoop", along with this line from the output:

WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

The cause appears to be that GiraphRunner uses its own ConfigurationUtils.parseArgs() to get the org.apache.commons.cli.CommandLine instead of using the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which honors the 'libjars' option. This led me to fall back on Hadoop's generic classpath-handling tools: CLASSPATH and/or HADOOP_CLASSPATH. Here's what worked:

  • Set HADOOP_CLASSPATH to include your application jar and the Giraph core jar, using a colon delimiter.
  • Pass -libjars using that same classpath but with a comma delimiter.

For example, on my machine:

$ export GIRAPH_HOME=/share/apps/giraph
$ export HADOOP_CLASSPATH=/home/<me>/kdl_hadoop_play.jar:$GIRAPH_HOME/giraph-ex.jar:$HADOOP_CLASSPATH
$ export LIBJARS=/home/<me>/kdl_hadoop_play.jar,$GIRAPH_HOME/giraph-core.jar
$ hadoop fs -rm -R goutput/shortestpathsC2
$ hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ${LIBJARS} \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
$ hadoop fs -cat goutput/shortestpathsC2/p*

This gives the expected output and results.
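Since the two settings carry the same jar list with different delimiters (colons for HADOOP_CLASSPATH, commas for -libjars), it may be convenient to derive both from a single list so they cannot drift apart. This is a minimal sketch under that assumption; the variable name APP_JARS and the paths are placeholders, not part of the original setup:

```shell
#!/bin/sh
# Hypothetical jar locations; substitute your own paths.
APP_JARS="/home/me/kdl_hadoop_play.jar:/share/apps/giraph/giraph-core.jar"

# HADOOP_CLASSPATH is colon-delimited; keep any existing value after our jars.
HADOOP_CLASSPATH="${APP_JARS}${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
export HADOOP_CLASSPATH

# -libjars takes the same jars, comma-delimited.
LIBJARS=$(printf '%s' "$APP_JARS" | tr ':' ',')

echo "HADOOP_CLASSPATH=$HADOOP_CLASSPATH"
echo "LIBJARS=$LIBJARS"
```

The resulting ${LIBJARS} can then be passed to -libjars exactly as in the command above.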

More generally, it would be helpful if the Giraph team changed the code to use the (apparently) more standard parser.

Hope that helps!




Answer 2:


I don't know why this isn't working, but there is a quick-and-dirty workaround. Put your code in the giraph-examples/src/main/java/org/apache/giraph/examples/ directory (where SimpleShortestPathsVertex is located), then build the giraph-examples jar by running mvn -DskipTests --projects giraph-examples --also-make package. After that, run your program the same way you ran SimpleShortestPathsVertex, replacing SimpleShortestPathsVertex with your class name. I hope that helps.



Source: https://stackoverflow.com/questions/25084629/classnotfoundexception-running-giraphrunner-on-a-modified-simpleshortestpathsver
