Question
I am trying to execute a simple MySQL query using Apache Spark and create a DataFrame. But for some reason Spark appends 'WHERE 1=0' to the end of the query I want to execute and throws an exception stating 'You have an error in your SQL syntax'.
val spark = SparkSession.builder.master("local[*]").appName("rddjoin").getOrCreate()
val mhost = "jdbc:mysql://localhost:3306/registry"
val mprop = new java.util.Properties
mprop.setProperty("driver", "com.mysql.jdbc.Driver")
mprop.setProperty("user", "root")
mprop.setProperty("password", "root")
val q = """select id from loaded_item"""
val res = spark.read.jdbc(mhost, q, mprop)
res.show(10)
And the exception is as below:
18/02/16 17:53:49 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'select id from loaded_item WHERE 1=0' at line 1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
at GenerateReport$.main(GenerateReport.scala:46)
at GenerateReport.main(GenerateReport.scala)
18/02/16 17:53:50 INFO SparkContext: Invoking stop() from shutdown hook
Answer 1:
The second parameter of your call to spark.read.jdbc is not correct. Instead of specifying a raw SQL query, you should pass either a table name qualified with its schema, or a valid SQL subquery with an alias. In your case this would be val q = "registry.loaded_item". Another option, if you want to provide additional parameters (for example a where clause), is to use one of the other overloads of DataFrameReader.jdbc.
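The two working variants can be sketched as follows (a sketch reusing the mhost, mprop, spark, registry database and loaded_item table from the question; it needs a live MySQL instance to actually run):

```scala
// Variant 1: pass a schema-qualified table name.
// Spark generates its own SELECT statements against this table,
// so the schema probe becomes valid SQL:
//   SELECT * FROM registry.loaded_item WHERE 1=0
val byTable = spark.read.jdbc(mhost, "registry.loaded_item", mprop)

// Variant 2: pass a parenthesized subquery with an alias.
// Spark wraps it the same way, e.g.
//   SELECT * FROM (select id from loaded_item) AS t WHERE 1=0
// which, unlike the bare statement in the question, parses correctly.
val bySubquery = spark.read.jdbc(mhost, "(select id from loaded_item) AS t", mprop)

bySubquery.show(10)
```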
Btw: the reason why you see the strange-looking query WHERE 1=0 is that Spark tries to infer the schema of your DataFrame without loading any actual data. It issues a statement of the form SELECT * FROM <dbtable> WHERE 1=0, which is guaranteed never to return any rows, but whose result metadata gives Spark the column names and types it needs.
Source: https://stackoverflow.com/questions/48827041/why-does-spark-appends-where-1-0-at-the-end-of-sql-query