Airflow HiveCliHook connection to remote hive cluster?

拟墨画扇 submitted on 2019-11-29 15:38:13
  • You can use the HiveOperator (unaltered) to connect to a remote Hive server and execute HQL statements; the only requirement is that the box running your Airflow worker must also have the Hive binaries installed

  • This is because the hive CLI command prepared by HiveCliHook is run on the worker machine via good old bash. If the Hive CLI is not installed on the machine where this code runs (i.e. your Airflow worker), the command fails, as in your case
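A quick way to confirm this on a worker is to check whether the CLI binary is on the PATH. The snippet below is only an illustrative sketch: the helper name `hive_cli_available` is hypothetical, and it assumes the CLI is invoked as `hive` (pass another name if your setup uses a different binary).

```python
import shutil


def hive_cli_available(binary="hive"):
    """Return True if `binary` can be found on this machine's PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(binary) is not None


if __name__ == "__main__":
    # Run this on the Airflow worker, not on the Hive server.
    print("hive CLI found:", hive_cli_available())
```

If this reports that the binary is missing, any task that relies on HiveCliHook will fail on that worker for the reason described above.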


A straightforward workaround is to implement your own RemoteHiveCliOperator that

  • Creates an SSHHook to the remote Hive-server machine
  • Executes your HQL statement through that SSHHook
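The core of such an operator could look roughly like the sketch below. It is not an existing Airflow API: the function names `build_hive_command` and `run_hql_over_ssh` are hypothetical, and it assumes that the remote host has `hive` on its PATH and that `ssh_client` behaves like a paramiko `SSHClient` (which is what Airflow's `SSHHook.get_conn()` returns).

```python
import shlex


def build_hive_command(hql, hive_bin="hive"):
    """Return a shell command that runs `hql` through the Hive CLI's -e flag."""
    # shlex.quote keeps quotes and semicolons in the HQL away from the remote shell.
    return "{} -e {}".format(hive_bin, shlex.quote(hql))


def run_hql_over_ssh(ssh_client, hql, hive_bin="hive"):
    """Run `hql` on the remote host and return its stdout as text.

    `ssh_client` is assumed to expose paramiko's exec_command() interface.
    """
    command = build_hive_command(hql, hive_bin)
    _stdin, stdout, stderr = ssh_client.exec_command(command)
    output = stdout.read().decode("utf-8")
    errors = stderr.read().decode("utf-8")
    # The exit status tells us whether the remote hive process succeeded.
    exit_code = stdout.channel.recv_exit_status()
    if exit_code != 0:
        raise RuntimeError("hive exited with {}: {}".format(exit_code, errors))
    return output
```

Inside the `execute()` method of a custom operator you would obtain the client with something like `SSHHook(ssh_conn_id="hive_ssh").get_conn()`, where the connection id `hive_ssh` is a placeholder, and pass it to `run_hql_over_ssh` together with the operator's HQL. This way the Hive CLI only has to exist on the remote machine, not on the worker.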

In fact, this seems to be a drawback shared by almost all Airflow operators: by default they expect the requisite packages to be installed on every worker. The docs warn about it:

For example, if you use the HiveOperator, the hive CLI needs to be installed on that box
