Spark/Scala: create empty Dataset using generics in a trait

Submitted by 余生颓废 on 2019-12-04 04:59:25

Question


I have a trait called MyTrait that takes a type parameter, and one of its methods needs to be able to create an empty typed Dataset.

trait MyTrait[T] {
    val sparkSession: SparkSession
    val spark = sparkSession.session
    val sparkContext = spark.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        import spark.implicits._ // to access .toDS() function
        // DOESN'T WORK.
        val emptyRDD = sparkContext.parallelize(Seq[T]())
        val accumulator = emptyRDD.toDS()
        ...
    }
}

So far I have not gotten it to work. The compiler complains that there is no ClassTag available for T, and that value toDS is not a member of org.apache.spark.rdd.RDD[T].

Any help would be appreciated. Thanks!


Answer 1:


You have to provide both ClassTag[T] and Encoder[T] in the same scope. For example:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}
import scala.reflect.ClassTag


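trait MyTrait[T] {
    // Supplied by the concrete implementation, where T is known.
    val ct: ClassTag[T]
    val enc: Encoder[T]

    val sparkSession: SparkSession
    // lazy: sparkSession is not yet initialized when the trait body runs
    lazy val sparkContext = sparkSession.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        val emptyRDD = sparkContext.emptyRDD[T](ct)
        sparkSession.createDataset(emptyRDD)(enc)
    }
}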

With a concrete implementation:

class Foo extends MyTrait[Int] {
   val sparkSession = SparkSession.builder.getOrCreate()
   import sparkSession.implicits._

   val ct = implicitly[ClassTag[Int]]
   val enc = implicitly[Encoder[Int]]
}

It is also possible to skip the RDD and create the empty Dataset directly, in which case only the Encoder is needed:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}

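trait MyTrait[T] {
    // Supplied by the concrete implementation, where T is known.
    val enc: Encoder[T]

    val sparkSession: SparkSession
    // lazy: sparkSession is not yet initialized when the trait body runs
    lazy val sparkContext = sparkSession.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        sparkSession.emptyDataset[T](enc)
    }
}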
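
A variant that is not part of the answer above, shown only as a sketch: instead of storing the Encoder as an abstract member, the method can ask for it as an implicit parameter, so it is resolved at each call site. The names EmptyDatasetSupport and EmptyDatasetSupportDemo are hypothetical, and the example assumes Spark SQL is on the classpath and that a local master is acceptable.

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Hypothetical variant (not from the answer): the Encoder is requested
// per call instead of being stored as an abstract member of the trait.
trait EmptyDatasetSupport[T] {
  def sparkSession: SparkSession

  def createEmptyDataset()(implicit enc: Encoder[T]): Dataset[T] =
    sparkSession.emptyDataset[T] // the implicit enc parameter is used here
}

object EmptyDatasetSupportDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is fine for this demonstration.
    val spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._ // supplies Encoder[String] at the call site

    val support = new EmptyDatasetSupport[String] {
      def sparkSession: SparkSession = spark
    }
    val ds: Dataset[String] = support.createEmptyDataset()
    assert(ds.count() == 0)
    spark.stop()
  }
}
```

The trade-off is that every caller needs an Encoder[T] in implicit scope, whereas the abstract-member version fixes it once in the concrete class.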

See also How to declare traits as taking implicit "constructor parameters"?, specifically the answer by Blaisorblade and another one by Alexey Romanov.
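
One common Scala 2 workaround for that limitation (not necessarily the one the linked answers recommend) is to use an abstract class, which can take implicit constructor parameters through a context bound. A minimal sketch, where MyBase and MyBaseDemo are hypothetical names, Spark SQL is assumed to be on the classpath, and a local master is assumed:

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Hypothetical alternative to MyTrait: the context bound turns Encoder[T]
// into an implicit constructor parameter of the abstract class.
abstract class MyBase[T: Encoder] {
  def sparkSession: SparkSession

  def createEmptyDataset(): Dataset[T] =
    sparkSession.emptyDataset[T] // Encoder[T] comes from the context bound
}

object MyBaseDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is fine for this demonstration.
    val spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._ // supplies Encoder[Int] when MyBase[Int] is instantiated

    val foo = new MyBase[Int] {
      def sparkSession: SparkSession = spark
    }
    assert(foo.createEmptyDataset().count() == 0)
    spark.stop()
  }
}
```

The cost is that a class can extend only one abstract class, so this fits best when MyBase is the natural root of the hierarchy.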



Source: https://stackoverflow.com/questions/47644051/spark-scala-create-empty-dataset-using-generics-in-a-trait
