Hive on Google Cloud wants permissions on /tmp, but no way to change permissions

Posted by 坚强是说给别人听的谎言 on 2019-12-06 16:18:41

For the time being, the GCS connector for Hadoop doesn't support fine-grained HDFS permissions, so the reported 700 is "fake". Access is actually controlled via ACLs: if the VM uses a service account with read/write access, any Linux user on the authenticated GCE VM can read, write, and execute all files inside GCS.

It appears Hive 0.14.0 newly introduced a check for a minimum permission of 733 on the root dir, even though access would have worked fine had it simply ignored the reported permissions. Unfortunately, for the moment, the "required permissions" value isn't configurable in Hive's SessionState, nor is it configurable in the GCS connector for Hadoop. In a future release, we can potentially provide a config setting for the GCS connector for Hadoop to specify what permissions to report, and/or implement full fine-grained POSIX permissions on all directories.
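
To see why a reported 700 cannot satisfy a 733 minimum, here is a minimal sketch of such a check, assuming it amounts to a bitmask comparison (every bit of the required mode must also be set in the reported mode). The function name meets_minimum is hypothetical and this is not Hive's actual code:

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT Hive's actual code: succeeds when every permission
# bit in the required mode is also set in the reported mode.
meets_minimum() {
  local reported="$1" required="$2"
  # The 8# prefix makes bash read modes such as 700 or 733 as octal.
  (( (8#$reported & 8#$required) == 8#$required ))
}

for mode in 700 733 777; do
  if meets_minimum "$mode" 733; then
    echo "$mode satisfies 733"
  else
    echo "$mode does not satisfy 733"
  fi
done
```

Under that assumption, 700 fails because the group and other write/execute bits are missing, regardless of what the GCS ACLs actually allow.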

In the meantime, it appears Hive 0.13.0 doesn't have the same unfortunate check, so if you're okay with the slightly older Hive version, it should work just fine.

Important: the "click to deploy" solution doesn't currently officially support Pig or Hive, in part because it doesn't yet set up the more advanced "NFS consistency cache" introduced in gcs-connector-1.3.0/bdutil-0.36.4, which comes with automated setup of the list-consistency cache. Without the list-consistency cache, Hive and Pig may unexpectedly lose data, since they rely on "ls" to commit temporary files.

Your best bet is actually to download the latest bdutil-1.1.0 and use that instead; it supports Pig and Hive with:

./bdutil -e querytools deploy

or equivalently:

./bdutil -e extensions/querytools/querytools_env.sh deploy

Inside that querytools_env.sh file, you'll find:

# URIs of tarballs to install.
PIG_TARBALL_URI='gs://querytools-dist/pig-0.12.0.tar.gz'
HIVE_TARBALL_URI='gs://querytools-dist/hive-0.12.0-bin.tar.gz'

If you wish, you can upload your own Hive version to your own bucket and modify HIVE_TARBALL_URI so that bdutil picks it up. Hive 0.14.0 still won't work, but you might have luck with Hive 0.13.0. Alternatively, if you don't care too much about the version, the default Hive 0.12.0 receives continuous testing and validation from Google's engineering teams, so you'll have a better-validated experience. You can also view bdutil's contents on GitHub at https://github.com/GoogleCloudPlatform/bdutil
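
Pointing bdutil at your own tarball could look roughly like the sketch below. The helper name set_hive_tarball_uri, the bucket gs://my-bucket, and the tarball file name are placeholders of mine, not part of bdutil; editing querytools_env.sh by hand works just as well:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the HIVE_TARBALL_URI line in a bdutil env file.
# Assumes the URI contains no '|' or '&' characters, which sed would misread.
set_hive_tarball_uri() {
  local env_file="$1" uri="$2" tmp
  tmp="$(mktemp)" || return 1
  # Replace the whole assignment line; '|' is the sed delimiter because
  # gs:// URIs contain '/'.
  sed "s|^HIVE_TARBALL_URI=.*|HIVE_TARBALL_URI='${uri}'|" "$env_file" > "$tmp" &&
    cat "$tmp" > "$env_file"   # write back in place, keeping the file's mode
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example usage (bucket and tarball names are placeholders):
#   gsutil cp hive-0.13.0-bin.tar.gz gs://my-bucket/
#   set_hive_tarball_uri extensions/querytools/querytools_env.sh \
#     'gs://my-bucket/hive-0.13.0-bin.tar.gz'
#   ./bdutil -e querytools deploy
```

Only the HIVE_TARBALL_URI line is changed; PIG_TARBALL_URI and the rest of the file are left as they were.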
