Question
I'm using Apache Spark in embedded local mode. I have all the dependencies included in my pom.xml, all at the same version (spark-core_2.10, spark-sql_2.10, and spark-hive_2.10).
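(For reference, a minimal sketch of what those pom.xml entries might look like; the post doesn't state the exact version, so ${spark.version} below is just a placeholder property:)

<!-- Sketch only: ${spark.version} is a placeholder, not from the original post -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>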
I just want to run a HiveQL query to create a table (stored as Parquet).
Running the following (rather simple) code:
import java.io.IOException;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class App {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Embedded local mode: two worker threads, no external cluster
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL").setMaster("local[2]").set("spark.executor.memory", "1g");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(ctx.sc());
        // HiveQL DDL: create a Parquet-backed table
        String createQuery = "CREATE TABLE IF NOT EXISTS Test (id int, name string) STORED AS PARQUET";
        sqlContext.sql(createQuery);
    }
}
...is returning the following exception:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/test is not a directory or unable to create one)
I can see the metastore_db folder created in the root of the project.
I searched around, but the solutions I found didn't help; most of them were not for embedded mode.
- One solution was to check permissions; I'm using the same user for everything.
- Another was to create the folder manually in HDFS; I did, and I can navigate to /user/hive/warehouse/test.
- Another was to set the metastore warehouse location manually by adding:
  sqlContext.sql("SET hive.metastore.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse");
I'm running out of ideas right now. Can anyone offer any other suggestions?
Answer 1:
Because you're running in local embedded mode, HDFS is not being considered. This is why the error says file:/user/hive/warehouse/test rather than hdfs://localhost:9000/user/hive/warehouse/test. It expects /user/hive/warehouse/test to exist on your local machine. Try creating it locally.
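A minimal sketch of that suggestion in Java (the path comes from the error message above; the class name is illustrative, not from the original answer):

import java.io.File;

public class CreateLocalWarehouseDir {
    public static void main(String[] args) {
        // Path taken from the error message; in embedded mode it resolves
        // against the local filesystem, not HDFS.
        File dir = new File("/user/hive/warehouse/test");
        if (dir.mkdirs() || dir.isDirectory()) {
            System.out.println("Directory ready: " + dir.getAbsolutePath());
        } else {
            System.err.println("Could not create " + dir.getAbsolutePath() + " -- check filesystem permissions");
        }
    }
}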
Answer 2:
In case this helps anybody else in the future: I'm writing unit tests against Spark code that uses a HiveContext, and I found that in order to change the path where files are written during the tests, I needed to call hiveContext.setConf. I also tried the same approach as the OP, issuing a SET query, but that didn't work. The following does:
hive.setConf("hive.metastore.warehouse.dir",
"file:///custom/path/to/hive/warehouse")
And just to make this a tad more useful, I specifically set this path to a location my code had access to:
hive.setConf("hive.metastore.warehouse.dir",
getClass.getResource(".").toString)
With this, I've been able to write unit tests against code that uses Hive queries and the Spark API.
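A Java sketch of the same idea, for anyone following the question's Java code (the temp-directory approach and the class name here are illustrative assumptions, not from the original answer):

import java.nio.file.Files;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveWarehouseTest {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("HiveWarehouseTest").setMaster("local[2]");
        JavaSparkContext ctx = new JavaSparkContext(conf);
        HiveContext hive = new HiveContext(ctx.sc());

        // Point the warehouse at a throwaway local directory *before* any DDL,
        // so table data lands somewhere the current user can definitely write to.
        String warehouse = Files.createTempDirectory("hive-warehouse-").toUri().toString();
        hive.setConf("hive.metastore.warehouse.dir", warehouse);

        hive.sql("CREATE TABLE IF NOT EXISTS Test (id int, name string) STORED AS PARQUET");
        ctx.stop();
    }
}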
Source: https://stackoverflow.com/questions/31985728/spark-on-embedded-mode-user-hive-warehouse-not-found