How to set hive.metastore.warehouse.dir in HiveContext?

妖精的绣舞 提交于 2019-12-05 09:42:10

According to http://spark.apache.org/docs/latest/sql-programming-guide.html#sql

Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse.

tl;dr Set hive.metastore.warehouse.dir while creating a SQLContext (or SparkSession).

The location of the default database for the Hive metastore warehouse is /user/hive/warehouse by default. It used to be set using hive.metastore.warehouse.dir Hive-specific configuration property (in a Hadoop configuration).

It's been a while since you asked this question (it's Spark 2.3 days), but that part has not changed since - if you use sql method of SQLContext (or SparkSession these days), it's simply too late to change where Spark creates the metastore database. It is far too late as the underlying infrastructure has been set up already (so you can use the SQLContext). The warehouse location has to be set up before the HiveContext / SQLContext / SparkSession initialization.

You should set hive.metastore.warehouse.dir while creating SparkSession (or SQLContext before Spark SQL 2.0) using config and (very important) enable the Hive support using enableHiveSupport.

config(key: String, value: String): Builder Sets a config option. Options set using this method are automatically propagated to both SparkConf and SparkSession's own configuration.

enableHiveSupport(): Builder Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.

You could use hive-site.xml configuration file or spark.hadoop prefix, but I'm digressing (and it strongly depends on the current configuration).

Another option is to just create a new database and then USE new_DATATBASE and then create the table. The warehouse will be created under the folder you ran the sql-spark.

I faced exactly theI faced exactly the same issues. I was running spark-submit command in shell action via oozie.

Setting warehouse directory doesn't worked for me while creating sparksession

All you need to do is to pass the pass the hive-site.xml in spark-submit command using below property:

--files ${location_of_hive-site.xml}

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!