How to validate a Spark SQL expression without executing it?

Asked 2021-01-02 10:12

I want to validate whether a Spark SQL query is syntactically correct without actually running the query on the cluster.

The actual use case is that I am trying to develop…

2 Answers
  • 2021-01-02 10:54

    SparkSqlParser

    Spark SQL uses SparkSqlParser as the parser for Spark SQL expressions.

    You can access SparkSqlParser using SparkSession (and SessionState) as follows:

    val spark: SparkSession = ...
    val parser = spark.sessionState.sqlParser
    
    scala> parser.parseExpression("select * from table")
    res1: org.apache.spark.sql.catalyst.expressions.Expression = ('select * 'from) AS table#0
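
    Note that parseExpression happily parses the whole query above as a single (nonsense) expression instead of rejecting it. For complete statements the same parser also exposes parsePlan, which builds a logical plan without executing anything and throws a ParseException on bad syntax. A minimal sketch, reusing the parser value from above (isValidSyntax is a name made up for illustration):

    import org.apache.spark.sql.catalyst.parser.ParseException

    // Returns true if Spark can parse the statement; nothing is executed.
    def isValidSyntax(sqlText: String): Boolean =
      try { parser.parsePlan(sqlText); true }
      catch { case _: ParseException => false }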
    

    TIP: Enable INFO logging level for org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
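
    With the log4j 1.x that Spark 2.x bundles, one quick way to do that programmatically (a sketch; adapt it to your own logging setup) is:

    import org.apache.log4j.{Level, Logger}

    // Programmatic equivalent of adding
    //   log4j.logger.org.apache.spark.sql.execution.SparkSqlParser=INFO
    // to conf/log4j.properties.
    Logger.getLogger("org.apache.spark.sql.execution.SparkSqlParser").setLevel(Level.INFO)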

    SparkSession.sql Method

    That alone won't give you the most bullet-proof shield against incorrect SQL expressions, though, and I think the sql method is a better fit.

    sql(sqlText: String): DataFrame Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.

    See both in action below.

    scala> parser.parseExpression("hello world")
    res5: org.apache.spark.sql.catalyst.expressions.Expression = 'hello AS world#2
    
    scala> spark.sql("hello world")
    org.apache.spark.sql.catalyst.parser.ParseException:
    mismatched input 'hello' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD'}(line 1, pos 0)
    
    == SQL ==
    hello world
    ^^^
    
      at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
      at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
      at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
      at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
      at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
      ... 49 elided
    
  • 2021-01-02 10:55

    Following @JacekLaskowski's answer, I found that SparkSqlParser gave me all sorts of errors that were not really syntax errors.

    I therefore agree with him and suggest simply passing the statement to SparkSession.sql, which works fine. This is what my method looks like:

      import org.apache.spark.sql.{AnalysisException, SparkSession}
      import org.apache.spark.sql.catalyst.parser.ParseException

      /**
       * Validates a Spark SQL statement by handing it to SparkSession.sql and
       * checking that no syntax-related exception is thrown.
       */
      def validate(sqlStatement: String): Unit = {
        val spark = SparkSession.builder
          .master("local")
          .getOrCreate()
        try {
          spark.sql(sqlStatement)
        } catch {
          // The parser rejected the statement, so the syntax is invalid.
          case ex: ParseException => throw new MyCustomException("Invalid Spark SQL", ex)
          // Analysis failed (e.g. a missing table), which means parsing succeeded.
          case _: AnalysisException => // Syntax was correct
        }
      }
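
    For illustration, this is how I would expect the method above to behave (hypothetical calls, not verified output):

      validate("SELECT * FROM some_missing_table") // AnalysisException is swallowed: syntax is fine
      validate("hello world")                      // throws MyCustomException wrapping the ParseException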
    