I'm trying to run Hive on Google Cloud, where Hadoop was installed by click-to-deploy. Hive seems to install just fine, but when I run hive I get the following error:
Logging initialized using configuration in jar:file:/home/michael_w_sherman_gmail_com/apache-hive-0.14.0-bin/lib/hive-common-0.14.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-install/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/michael_w_sherman_gmail_com/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:529)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:478)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:430)
... 7 more
My first fix was to check hdfs-site.xml and change the dfs.permissions.enabled setting, but it was already set to false. Next, I tried to chmod the permissions, but the chmod changes don't take:
$ hadoop fs -ls
15/01/28 23:03:13 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
Found 8 items ....
drwx------ - xxxx_gmail_com xxxx_gmail_com 0 2015-01-28 21:54 tmp
$ hadoop fs -chmod -R 777 /tmp
15/01/28 23:03:31 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
$ hadoop fs -ls
15/01/28 23:09:35 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
Found 8 items ....
drwx------ - xxx_gmail_com xxx_gmail_com 0 2015-01-28 21:54 tmp
Different chmod options, like a+w, also fail to change the permissions. And the owner/group of the file is always equal to the SSH user (the log above is from an SSH terminal launched from Google Cloud's console, which uses your email as the username). But I have the same problem when I ssh in myself.
How do I either change the permissions or get Hive to not give the error?
Thank you.
For the time being, the GCS connector for Hadoop doesn't support fine-grained HDFS permissions, so the reported 700 is "fake"; in fact, permissions are controlled via ACLs, and if you're using a service account with read/write access, any Linux user on the authenticated GCE VM is in fact able to read/write/execute all files inside GCS.
It appears Hive 0.14.0 newly introduces an unfortunate check for a minimum permission of 733 on the root scratch dir, even though accessibility would have worked out just fine if it simply ignored the reported permissions. Unfortunately, for the moment, the required permissions aren't configurable in Hive's SessionState, nor is the reported permission configurable in the GCS connector for Hadoop; in a future release, we can potentially provide a config setting for the GCS connector for Hadoop to specify what permissions to report, and/or implement full fine-grained POSIX permissions on all directories.
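To see why the reported 700 trips the check, the permission test can be approximated as a bitwise comparison (this is a sketch reconstructed from the error message and the 733 minimum described above, not Hive's actual source):

```shell
# Sketch: the scratch dir's mode must contain every bit of 0733
# (write+execute for group and other, on top of owner rwx).
REQUIRED=0733
CURRENT=0700   # what the GCS connector reports for /tmp/hive
if [ $(( CURRENT & REQUIRED )) -ne $(( REQUIRED )) ]; then
  echo "The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------"
fi
```

Since 0700 & 0733 keeps only the owner bits, the comparison fails and Hive 0.14.0 aborts at startup, which matches the stack trace above.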
In the meantime, it appears Hive 0.13.0 doesn't have the same unfortunate check, so if you're okay with the slightly older Hive version, it should work just fine.
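A minimal sketch of swapping in Hive 0.13.0 could look like the following; the archive URL follows Apache's standard dist layout, and the install location under $HOME is an assumption (the network steps are shown as comments since they must run on the VM):

```shell
# Sketch: install Hive 0.13.0 instead of 0.14.0 (install path is an assumption).
HIVE_VERSION="0.13.0"
TARBALL="apache-hive-${HIVE_VERSION}-bin.tar.gz"
# On the VM, fetch and unpack the release from the Apache archive:
#   wget "https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/${TARBALL}"
#   tar -xzf "${TARBALL}"
export HIVE_HOME="${HOME}/apache-hive-${HIVE_VERSION}-bin"
export PATH="${HIVE_HOME}/bin:${PATH}"
```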
Important: That said, note that the "click to deploy" solution doesn't currently officially support Pig or Hive, in part because it doesn't yet perform automated setup of the NFS-based list-consistency cache introduced in gcs-connector-1.3.0/bdutil-0.36.4. Without the list-consistency cache, Hive and Pig may unexpectedly lose data, since they rely on "ls" to commit temporary files.
Your best bet is actually to download the latest bdutil-1.1.0 and use that instead; it supports Pig and Hive with:
./bdutil -e querytools deploy
or equivalently:
./bdutil -e extensions/querytools/querytools_env.sh deploy
Inside that querytools_env.sh file, you'll find:
# URIs of tarballs to install.
PIG_TARBALL_URI='gs://querytools-dist/pig-0.12.0.tar.gz'
HIVE_TARBALL_URI='gs://querytools-dist/hive-0.12.0-bin.tar.gz'
You may optionally upload your own Hive version to your own bucket and modify HIVE_TARBALL_URI so that bdutil picks it up. Hive 0.14.0 still won't work, but you might have luck with Hive 0.13.0. Alternatively, if you don't care too much about the version, the default Hive 0.12.0 receives continuous testing and validation from Google's engineering teams, so you'll have a better-validated experience. You can also view bdutil's contents on GitHub if you wish, at https://github.com/GoogleCloudPlatform/bdutil
Source: https://stackoverflow.com/questions/28204269/hive-on-google-cloud-wants-permissions-on-tmp-but-no-way-to-change-permissions