Question
For example, I have a few Hive HQL statements that I want to pass to Spark SQL:
set parquet.compression=SNAPPY;
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;
select * from MY_TABLE limit 5;
The following doesn't work:
hiveContext.sql("set parquet.compression=SNAPPY; create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE; select * from MY_TABLE limit 5;")
How can I pass these statements to Spark SQL?
Answer 1:
Thank you to @SamsonScharfrichter for the answer.
This works, with each statement passed in its own sql() call:
hiveContext.sql("set spark.sql.parquet.compression.codec=SNAPPY")
hiveContext.sql("create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE")
val rs = hiveContext.sql("select * from MY_TABLE limit 5")
Please note that in this particular case, instead of the parquet.compression key, we need to use spark.sql.parquet.compression.codec.
Answer 2:
I worked on a scenario where I needed to read a SQL file and run all the semicolon-separated queries in that file.
One simple way to do it is like this:
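// sc is the existing SparkContext (e.g. the one provided by spark-shell)
val hsc = new org.apache.spark.sql.hive.HiveContext(sc)
val sql_file = "/hdfs/path/to/file.sql"
// wholeTextFiles yields (path, content) pairs; take the content of the first file
val file = sc.wholeTextFiles(sql_file)
val queries = file.take(1)(0)._2
// Run each non-empty statement in its own sql() call
queries.split(';').map(_.trim).filter(_.nonEmpty).map(query => hsc.sql(query))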
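A plain split on ';' also cuts a statement in two when a semicolon appears inside a quoted literal. Below is a minimal sketch of a quote-aware splitter in plain Scala. The names SqlScript and splitStatements are hypothetical and not part of Spark, and the sketch assumes the script contains no SQL comments and no backslash-escaped quotes.

```scala
// Hypothetical helper, not part of Spark: splits a SQL script into statements.
object SqlScript {

  /** Splits `script` on ';' characters that lie outside quoted literals,
    * trims each piece, and drops empty pieces.
    * Assumption: no SQL comments and no backslash-escaped quotes.
    */
  def splitStatements(script: String): Seq[String] = {
    val statements = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var quote: Option[Char] = None // the quote character we are inside, if any

    for (ch <- script) {
      quote match {
        case Some(q) =>
          current.append(ch)
          if (ch == q) quote = None      // closing quote
        case None if ch == '\'' || ch == '"' || ch == '`' =>
          current.append(ch)
          quote = Some(ch)               // opening quote
        case None if ch == ';' =>
          statements += current.toString // statement boundary
          current.clear()
        case None =>
          current.append(ch)
      }
    }
    statements += current.toString       // text after the last ';'
    statements.map(_.trim).filter(_.nonEmpty).toList
  }
}
```

With hsc and queries defined as in the answer above, the statements could then be run one at a time with SqlScript.splitStatements(queries).foreach(stmt => hsc.sql(stmt)).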
Source: https://stackoverflow.com/questions/36938399/how-to-pass-multiple-statements-into-spark-sql-hivecontext