How to convert a simple DataFrame to a DataSet Spark Scala with case class?

Backend · Unresolved · 2 answers · 1552 views
爱一瞬间的悲伤 2021-01-17 03:48

I am trying to convert a simple DataFrame to a Dataset, following the example in the Spark SQL programming guide: https://spark.apache.org/docs/latest/sql-programming-guide.html

case class Person(name: String, age: Int)
2 Answers
  • 2021-01-17 04:03

    This is how you create a Dataset from a case class:

    case class Person(name: String, age: Long) 
    

    Keep the case class outside of the class that contains the code below; case classes used as Dataset element types should be defined at the top level (or inside an object) so that Spark can generate an encoder for them.

    import spark.implicits._  // needed for toDS() and the Dataset encoders

    val primitiveDS = Seq(1, 2, 3).toDS()
    // Mapping through the case class constructor already yields a Dataset[Person]
    val augmentedDS = primitiveDS.map(i => Person("var_" + i.toString, (i + 1).toLong))
    augmentedDS.show()

    // .as[Person] is a no-op here, since augmentedDS is already a Dataset[Person]
    augmentedDS.as[Person].show()
    

    Hope this helps.

  • 2021-01-17 04:18

    If you change Int to Long (or BigInt) in the case class, it works fine:

    case class Person(name: String, age: Long)
    import spark.implicits._
    
    val path = "examples/src/main/resources/people.json"
    
    val peopleDS = spark.read.json(path).as[Person]
    peopleDS.show()
    

    Output:

    +----+-------+
    | age|   name|
    +----+-------+
    |null|Michael|
    |  30|   Andy|
    |  19| Justin|
    +----+-------+
    

    EDIT: spark.read.json parses numbers as Long by default, since that is the safer choice. You can change the column type afterwards with a cast or a UDF.
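    For example, a minimal sketch of casting the inferred Long column down to Int after loading (`PersonInt` is a hypothetical case class introduced here just for illustration):

    ```scala
    import org.apache.spark.sql.functions.col

    // Hypothetical case class with an Int age, for illustration only
    case class PersonInt(name: String, age: Int)

    val peopleIntDS = spark.read.json(path)
      .withColumn("age", col("age").cast("int"))  // Long -> Int
      .as[PersonInt]
    peopleIntDS.show()
    ```

    Note that the `null` age in the sample data will still be a problem for a non-nullable `Int` field when the row is actually materialized as a `PersonInt`.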

    EDIT2:

    To answer your second question: you need to name the columns correctly before the conversion to Person will work:

    val primitiveDS = Seq(1, 2, 3).toDS()
    val augmentedDS = primitiveDS
      .map(i => ("var_" + i.toString, (i + 1).toLong))  // Dataset[(String, Long)] with columns _1, _2
      .withColumnRenamed("_1", "name")
      .withColumnRenamed("_2", "age")
    augmentedDS.as[Person].show()
    

    Output:

    +-----+---+
    | name|age|
    +-----+---+
    |var_1|  2|
    |var_2|  3|
    |var_3|  4|
    +-----+---+
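    As a side note (my own sketch, not part of the original answer), the same renaming can be done in one step with toDF, assuming spark.implicits._ is in scope:

    ```scala
    import spark.implicits._

    val augmentedDS = Seq(1, 2, 3).toDS()
      .map(i => ("var_" + i.toString, (i + 1).toLong))
      .toDF("name", "age")  // name the tuple columns in one call
      .as[Person]
    augmentedDS.show()
    ```

    This avoids chaining a withColumnRenamed per column when you are renaming every column anyway.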
    