Spark on embedded mode - user/hive/warehouse not found

Submitted by 允我心安 on 2019-12-07 01:59:28

Question


I'm using Apache Spark in embedded local mode. I have all the dependencies included in my pom.xml and in the same version (spark-core_2.10, spark-sql_2.10, and spark-hive_2.10).

I just want to run a HiveQL query to create a table (stored as Parquet).

Running the following (rather simple) code:

import java.io.IOException;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class App {
    public static void main(String[] args) throws IOException, ClassNotFoundException {

        SparkConf sparkConf = new SparkConf()
                .setAppName("JavaSparkSQL")
                .setMaster("local[2]")
                .set("spark.executor.memory", "1g");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(ctx.sc());

        String createQuery = "CREATE TABLE IF NOT EXISTS Test (id int, name string) STORED AS PARQUET";
        sqlContext.sql(createQuery);
    }
}

...is returning the following exception:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/test is not a directory or unable to create one)

I can see the metastore_db folder created in the root of the project.

I searched around, and the solutions I found didn't help; most of them were not aimed at embedded mode.

  • One solution was to check the permissions; I'm using the same user for everything.
  • Another was to create the folder manually in HDFS; I did, and I can navigate to /user/hive/warehouse/test.
  • Another was to set the metastore warehouse manually by running: sqlContext.sql("SET hive.metastore.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse");

I'm running out of ideas right now. Can someone provide any other suggestions?


Answer 1:


Because you're running in local embedded mode, HDFS is not being considered. This is why the error says file:/user/hive/warehouse/test rather than hdfs://localhost:9000/user/hive/warehouse/test. It expects /user/hive/warehouse/test to exist on your local machine. Try creating it locally.
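A minimal Java sketch of that fix, run before creating the HiveContext. The class and method names here are mine, not from the original answer; the real path would be /user/hive/warehouse, but a temp location is used below so the snippet runs without elevated permissions:

```java
import java.io.File;

public class WarehouseDirCheck {

    // Ensure the warehouse directory exists on the local filesystem,
    // since embedded mode resolves file:/user/hive/warehouse locally.
    static boolean ensureDir(String path) {
        File dir = new File(path);
        return dir.isDirectory() || dir.mkdirs();
    }

    public static void main(String[] args) {
        // In a real setup you would pass "/user/hive/warehouse" here
        // (which typically needs root or adjusted permissions to create).
        String path = System.getProperty("java.io.tmpdir")
                + File.separator + "hive-warehouse-demo";
        System.out.println(ensureDir(path) ? "OK: " + path : "FAILED: " + path);
    }
}
```

Once the directory exists and is writable by the user running Spark, the CREATE TABLE statement should no longer fail with the MetaException.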




Answer 2:


Just in case this helps anybody else in the future: I was attempting to write unit tests against Spark code that uses a HiveContext. I found that, in order to change the path where the files are written during the tests, I needed to call hiveContext.setConf. I also tried the same approach as the OP, issuing a SET query, but that didn't work. The following seems to work!

// `hive` here is the HiveContext instance
hive.setConf("hive.metastore.warehouse.dir",
  "file:///custom/path/to/hive/warehouse")

And just to make this a tad more useful, I specifically set this path to a location my code had access to:

hive.setConf("hive.metastore.warehouse.dir", 
  getClass.getResource(".").toString)

With this, I've been able to write unit tests against my code making use of hive queries and the Spark API.
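A Java sketch of the same per-test idea (the snippets above are Scala). The class and method names below are mine: a fresh temp directory stands in for a test-local warehouse path, and the setConf call, which requires spark-hive on the classpath, is shown as a comment:

```java
import java.io.IOException;
import java.nio.file.Files;

public class TestWarehousePath {

    // Create a fresh, writable warehouse directory and return it as the
    // file:// URI that hive.metastore.warehouse.dir expects.
    static String freshWarehouseUri() {
        try {
            return Files.createTempDirectory("hive-warehouse-").toUri().toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String uri = freshWarehouseUri();
        System.out.println(uri);
        // With spark-hive on the classpath, apply it before the first query:
        // hiveContext.setConf("hive.metastore.warehouse.dir", uri);
    }
}
```

Using a throwaway directory per test run keeps table data from one test leaking into the next.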



Source: https://stackoverflow.com/questions/31985728/spark-on-embedded-mode-user-hive-warehouse-not-found
