registerTempTable fails on DataFrame created from RDD


This is in Spark 1.6.x. I'm looking for a workaround.

I have a function that creates a DataFrame from a DataFrame's underlying RDD:
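
A rough sketch of the shape of that function and the failing call (assuming, as the answer below describes, that it builds a fresh SQLContext from the SparkContext; the names rddAndBack and df2 mirror the answer):

    def rddAndBack(sc: SparkContext, df: DataFrame): DataFrame = {
      // A brand-new SQLContext is created inside the function
      val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      sqlContext.createDataFrame(df.rdd, df.schema)
    }

    val df2 = rddAndBack(sc, df)
    df2.registerTempTable("df2")
    sqlContext.sql("SELECT * FROM df2")   // fails: Table not found: df2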

1 Answer
  • 2021-01-16 01:50

    This happens because you create a new SQLContext inside your function. Temporary tables are scoped to the SQLContext that registers them, so a table registered through one context cannot be accessed from another. The table is, however, still visible through the context bound to df2 itself:

    df2.sqlContext.sql("SELECT * FROM df2")
    

    To solve this, pass the existing SQLContext in place of the SparkContext:

    def rddAndBack(sqlContext: org.apache.spark.sql.SQLContext, df: DataFrame): DataFrame = {
      // Rebuild the DataFrame using the caller's existing SQLContext
      sqlContext.createDataFrame(df.rdd, df.schema)
    }
    

    or use the SQLContext.getOrCreate factory method:

    def rddAndBack(sc: SparkContext, df: DataFrame): DataFrame = {
      // Returns the already-instantiated SQLContext for this SparkContext if one exists
      val sqlContext = org.apache.spark.sql.SQLContext.getOrCreate(sc)
      sqlContext.createDataFrame(df.rdd, df.schema)
    }
    

    or use the SQLContext instance bound to the input df:

    def rddAndBack(sc: SparkContext, df: DataFrame): DataFrame = {
      // df.sqlContext is the same context that created the input DataFrame
      val sqlContext = df.sqlContext
      sqlContext.createDataFrame(df.rdd, df.schema)
    }
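
    With any of these variants, the table ends up registered against the same SQLContext that later runs the query, so the lookup succeeds. A short usage sketch (assuming sqlContext is the original context and sc its SparkContext):

    val df2 = rddAndBack(sc, df)   // or rddAndBack(sqlContext, df) for the first variant
    df2.registerTempTable("df2")
    sqlContext.sql("SELECT * FROM df2").show()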
    