import implicit conversions without instance of SparkSession

Submitted by 混江龙づ霸主 on 2019-12-30 06:59:13

Question


My Spark code is cluttered with code like this:

import org.apache.spark.sql.DataFrame

object Transformations {
  def selectI(df: DataFrame): DataFrame = {
    // needed to use $ to generate a ColumnName
    import df.sparkSession.implicits._

    df.select($"i")
  }
}

or alternatively

import org.apache.spark.sql.{DataFrame, SparkSession}

object Transformations {
  def selectI(df: DataFrame)(implicit spark: SparkSession): DataFrame = {
    // needed to use $ to generate a ColumnName
    import spark.implicits._

    df.select($"i")
  }
}

I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather do something like:

import org.apache.spark.sql.DataFrame

object Transformations {
  import org.apache.spark.sql.SQLImplicits._ // does not work

  def selectI(df: DataFrame): DataFrame = {
    df.select($"i")
  }
}

Is there an elegant solution to this problem? My use of the implicits is not limited to $; I also use Encoders, .toDF(), etc.


Answer 1:


I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather like to do something like

Because every Dataset exists in the scope of a specific SparkSession, and a single Spark application can have multiple active SparkSessions.
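
To make that scoping concrete, here is a minimal sketch (not part of the original answer; the object name and app name are made up) of two active sessions in one JVM, each carrying its own implicits member, which is why the import has to name a concrete instance:

import org.apache.spark.sql.SparkSession

object ImplicitsScope {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, purely for illustration.
    val sessionA = SparkSession.builder()
      .master("local[*]")
      .appName("implicits-scope")
      .getOrCreate()

    // A second, independent session sharing the same SparkContext,
    // shown only to illustrate that several sessions can coexist.
    val sessionB = sessionA.newSession()

    // The implicit conversions are members of a concrete session,
    // so the import must pick one of them explicitly.
    import sessionA.implicits._

    val df = Seq(1, 2, 3).toDF("i")
    df.select($"i").show()

    sessionA.stop()
  }
}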

Theoretically, some of the SparkSession.implicits._ could exist separately from the session instance, for example:

import org.apache.spark.sql.implicits._   // for, say, `$` or `Encoders`
import org.apache.spark.sql.SparkSession.builder.getOrCreate.implicits._  // for toDF

but it would have a significant impact on the user code.
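
That said, a pattern that gets close in practice, sketched here under the assumption of the Spark 2.x/3.x API where SQLImplicits exposes a protected _sqlContext (Spark's own test helpers do something similar; this is not from the original answer), is to subclass SQLImplicits and resolve the session lazily:

import org.apache.spark.sql.{SQLContext, SQLImplicits, SparkSession}

// A single importable object; the session is looked up only when
// an implicit conversion is actually used, not when this object is defined.
object Implicits extends SQLImplicits {
  protected override def _sqlContext: SQLContext =
    SparkSession.builder.getOrCreate().sqlContext
}

With that in place, import Implicits._ makes $, the Encoders and .toDF() available inside any object, at the cost of hard-wiring which session the conversions resolve to, which is exactly the trade-off the answer points out.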



Source: https://stackoverflow.com/questions/50984326/import-implicit-conversions-without-instance-of-sparksession
