Unable to get parquet-tools working from the command-line

ⅰ亾dé卋堺 submitted on 2019-12-06 04:55:58

Question


I'm trying to get the newest version of parquet-tools running, but for some reason org.apache.hadoop.conf.Configuration isn't in the shaded jar. (I have the same issue with v1.6.0.)

Is there something beyond mvn package or mvn install that I should be doing? (The actual mvn invocation I'm using is mvn install -DskipTests -pl \!parquet-thrift,\!parquet-cascading,\!parquet-pig-bundle,\!parquet-pig,\!parquet-scrooge,\!parquet-hive,\!parquet-protobuf). The build succeeds, and the tests pass if I choose to run them.

The error I get is below. (You can see I've tried adding the hadoop jar, taken from an old parquet version that seemed to bundle it, to the classpath; I get the same results with or without it.)

> java -classpath /path/to/hadoop-core-1.1.0.jar -jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet

java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    at parquet.tools.command.ShowMetaCommand.execute(ShowMetaCommand.java:59)
    at parquet.tools.Main.main(Main.java:222)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 2 more
org/apache/hadoop/conf/Configuration

Answer 1:


On macOS with Homebrew, this is the easiest way to get started:

$ brew install parquet-tools



Answer 2:


You can also bundle the Hadoop dependencies into the target jar:

mvn clean package -Plocal -DskipTests -Dhadoop.scope=compile




Answer 3:


If you have hadoop installed, change your command to be hadoop jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet instead.
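
As a rough sketch of that choice, the helper below prints the command line to use depending on whether a hadoop executable is on the PATH. The function name parquet_tools_cmd is made up for illustration, and the java -jar fallback is only expected to work if the jar was built with the Hadoop classes bundled (for example with the "local" profile mentioned in the other answers).

```shell
#!/bin/sh
# Hypothetical helper (not part of parquet-tools): print the command line
# that should be used to launch the given parquet-tools jar.
parquet_tools_cmd() {
    jar=$1
    shift
    if command -v hadoop >/dev/null 2>&1; then
        # hadoop is installed: "hadoop jar" puts Hadoop's own classes on the classpath
        echo "hadoop jar $jar $*"
    else
        # no hadoop found: assumes the jar already bundles the Hadoop classes
        echo "java -jar $jar $*"
    fi
}

parquet_tools_cmd parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet
```

Note that java ignores -classpath when -jar is given, which would explain why adding the hadoop jar to the command in the question made no difference.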




Answer 4:


This set of steps from the parquet-mr issues list fixed the same issue for me:

mvn install
cd parquet-tools
mvn clean package -Plocal
mvn install
mvn dependency:copy-dependencies
# replace 1.8.2 in the next step with the version you're using
cp target/parquet-tools-1.8.2-SNAPSHOT.jar target/dependency/
mkdir -p ~/local/bin/lib
cp target/dependency/* ~/local/bin/lib/
cp src/main/scripts/* ~/local/bin/
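# single quotes keep $PATH unexpanded until login; write to the profile in your home directory
echo 'export PATH="$PATH:$HOME/local/bin"' >> ~/.profile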



Answer 5:


I ran into a similar issue and fixed it by specifying the "local" profile:

mvn clean package -Plocal

I had originally missed this paragraph, but it explains that the "local" profile mixes the Hadoop dependencies into the jar, whereas the default build expects you to run it somewhere Hadoop is already installed and present on your classpath:

https://github.com/Parquet/parquet-mr/tree/master/parquet-tools
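
To check which kind of jar a build produced, one option is to list its entries and look for the class named in the stack trace. This is only a sketch: the function name has_hadoop_configuration and the JAR path are assumptions for illustration, and it assumes unzip is available (jar tf would produce a usable listing as well).

```shell
#!/bin/sh
# Hypothetical helper: reads a jar listing on stdin (e.g. from `unzip -l`)
# and succeeds if the Hadoop Configuration class is among the entries.
has_hadoop_configuration() {
    grep -q 'org/apache/hadoop/conf/Configuration\.class'
}

# Assumed location of the built jar; adjust to the version you built.
JAR=target/parquet-tools-1.7.0-incubating-SNAPSHOT.jar

if [ -f "$JAR" ]; then
    if unzip -l "$JAR" | has_hadoop_configuration; then
        echo "Hadoop classes are bundled in the jar"
    else
        echo "Hadoop classes are missing; rebuild with -Plocal or run via hadoop jar"
    fi
fi
```

If the class is missing, the jar is the default build and needs Hadoop supplied by the environment.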



Source: https://stackoverflow.com/questions/29724629/unable-to-get-parquet-tools-working-from-the-command-line
