Spark SQL issue with columns specified

Anonymous (unverified) submitted on 2019-12-03 01:36:02

Question:

We are trying to replicate an Oracle DB into Hive. We get the queries from Oracle and run them in Hive, so they arrive in this format:

INSERT INTO schema.table(col1,col2) VALUES ('val','val'); 

While this query works directly in Hive, when I run it through spark.sql I get the following error:

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'emp_id' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 20)

== SQL ==
insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')
--------------------^^^

        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
        at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
        at com.datastream.SparkReplicator.insertIntoHive(SparkReplicator.java:20)
        at com.datastream.App.main(App.java:67)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
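The failure can also be reproduced from a plain PySpark shell (a minimal sketch; our actual code is the Java SparkReplicator shown in the trace):

# Assumption: `spark` is a SparkSession built with Hive support,
# the same kind of session SparkReplicator uses.
spark.sql("insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')")
# -> org.apache.spark.sql.catalyst.parser.ParseException:
#    mismatched input 'emp_id' expecting {'(', 'SELECT', 'FROM', 'VALUES', ...}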

Answer 1:

This error occurs because Spark SQL does not support a column list in the INSERT statement, so exclude the column list from the INSERT statement.

Below is my Hive table:

select * from UDB.emp_details_table;
+---------+-----------+-----------+-------------------+--+
| emp_id  | emp_name  | emp_dept  | emp_joining_date  |
+---------+-----------+-----------+-------------------+--+
| 1       | AAA       | HR        | 2018-12-06        |
| 1       | BBB       | HR        | 2017-10-26        |
| 2       | XXX       | ADMIN     | 2018-10-22        |
| 2       | YYY       | ADMIN     | 2015-10-19        |
| 2       | ZZZ       | IT        | 2018-05-14        |
| 3       | GGG       | HR        | 2018-06-30        |
+---------+-----------+-----------+-------------------+--+

Here I am inserting a record using Spark SQL through PySpark:

df = spark.sql("""insert into UDB.emp_details_table values ('6','VVV','IT','2018-12-18')"""); 

You can see below that the given record has been inserted into my existing Hive table:

+---------+-----------+-----------+-------------------+--+
| emp_id  | emp_name  | emp_dept  | emp_joining_date  |
+---------+-----------+-----------+-------------------+--+
| 1       | AAA       | HR        | 2018-12-06        |
| 1       | BBB       | HR        | 2017-10-26        |
| 2       | XXX       | ADMIN     | 2018-10-22        |
| 2       | YYY       | ADMIN     | 2015-10-19        |
| 2       | ZZZ       | IT        | 2018-05-14        |
| 3       | GGG       | HR        | 2018-06-30        |
| 6       | VVV       | IT        | 2018-12-18        |
+---------+-----------+-----------+-------------------+--+

Change your Spark SQL query to:

spark.sql("""insert into ss.tab values ('1','demo','demo')"""); 

Note: I am using Spark 2.3; you need to use HiveContext if you are on Spark 1.6.
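For reference, a minimal sketch of the two entry points (the 1.6 part assumes an existing SparkContext named sc):

# Spark 2.x: a SparkSession with Hive support can run the INSERT directly.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("insert into ss.tab values ('1','demo','demo')")

# Spark 1.6: use HiveContext instead (assumption: sc is an existing SparkContext).
# from pyspark.sql import HiveContext
# hc = HiveContext(sc)
# hc.sql("insert into ss.tab values ('1','demo','demo')")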

Let me know if it works.


