Question
I'm using Apache Spark in embedded local mode. I have all the dependencies included in my pom.xml, all at the same version (spark-core_2.10, spark-sql_2.10, and spark-hive_2.10).
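(For reference, a minimal sketch of what those pom.xml entries might look like; the post doesn't state the exact version, so ${spark.version} below is just a placeholder property:)

<!-- Sketch only: ${spark.version} is a placeholder, not from the original post -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>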
I just want to run a HiveQL query to create a table (stored as Parquet).
Running the following (rather simple) code:
import java.io.IOException;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class App {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Embedded local mode: two worker threads, no external cluster
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL").setMaster("local[2]").set("spark.executor.memory", "1g");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(ctx.sc());
        // HiveQL DDL: create a Parquet-backed table
        String createQuery = "CREATE TABLE IF NOT EXISTS Test (id int, name string) STORED AS PARQUET";
        sqlContext.sql(createQuery);
    }
}
...is returning the following exception:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/test is not a directory or unable to create one)
I can see the metastore_db folder created in the root of the project.
I searched around, but the solutions I found didn't help; most of them were not for embedded mode.
- One solution was to check permissions; I'm using the same user for everything.
- Another was to create the folder manually in HDFS; I did, and I can navigate to /user/hive/warehouse/test.
- Another was to set the metastore warehouse location manually by adding:
  sqlContext.sql("SET hive.metastore.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse");
I'm running out of ideas right now. Can anyone offer any other suggestions?
Answer 1:
Because you're running in local embedded mode, HDFS is not being considered. This is why the error says file:/user/hive/warehouse/test rather than hdfs://localhost:9000/user/hive/warehouse/test. It expects /user/hive/warehouse/test to exist on your local machine. Try creating it locally.
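A minimal sketch of that suggestion in Java (the path comes from the error message above; the class name is illustrative, not from the original answer):

import java.io.File;

public class CreateLocalWarehouseDir {
    public static void main(String[] args) {
        // Path taken from the error message; in embedded mode it resolves
        // against the local filesystem, not HDFS.
        File dir = new File("/user/hive/warehouse/test");
        if (dir.mkdirs() || dir.isDirectory()) {
            System.out.println("Directory ready: " + dir.getAbsolutePath());
        } else {
            System.err.println("Could not create " + dir.getAbsolutePath() + " -- check filesystem permissions");
        }
    }
}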
Answer 2:
In case this helps anybody else in the future: I'm writing unit tests against Spark code that uses a HiveContext, and I found that in order to change the path where files are written during the tests, I needed to call hiveContext.setConf. I also tried the same approach as the OP, issuing a SET query, but that didn't work. The following does:
hive.setConf("hive.metastore.warehouse.dir",
"file:///custom/path/to/hive/warehouse")
And just to make this a tad more useful, I specifically set this path to a location my code had access to:
hive.setConf("hive.metastore.warehouse.dir",
getClass.getResource(".").toString)
With this, I've been able to write unit tests against code that uses Hive queries and the Spark API.
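A Java sketch of the same idea, for anyone following the question's Java code (the temp-directory approach and the class name here are illustrative assumptions, not from the original answer):

import java.nio.file.Files;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveWarehouseTest {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("HiveWarehouseTest").setMaster("local[2]");
        JavaSparkContext ctx = new JavaSparkContext(conf);
        HiveContext hive = new HiveContext(ctx.sc());

        // Point the warehouse at a throwaway local directory *before* any DDL,
        // so table data lands somewhere the current user can definitely write to.
        String warehouse = Files.createTempDirectory("hive-warehouse-").toUri().toString();
        hive.setConf("hive.metastore.warehouse.dir", warehouse);

        hive.sql("CREATE TABLE IF NOT EXISTS Test (id int, name string) STORED AS PARQUET");
        ctx.stop();
    }
}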
Source: https://stackoverflow.com/questions/31985728/spark-on-embedded-mode-user-hive-warehouse-not-found