Extracting `Seq[(String,String,String)]` from Spark DataFrame

野性不改 2020-12-25 07:48

I have a Spark DataFrame whose rows contain a Seq[(String, String, String)]. I'm trying to do some kind of flatMap with it, but anything I try ends up thro

2 answers
  •  予麋鹿 (original poster)
    2020-12-25 08:44

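    import org.apache.spark.sql.{Row, SparkSession}

    // Defined at top level so that Spark can derive an Encoder for it.
    case class MyCaseClass(mylist: Seq[(String, String)])

    object ListSerdeTest extends App {

      implicit val spark: SparkSession = SparkSession
        .builder
        .master("local[2]")
        .getOrCreate()

      import spark.implicits._

      val myDS = spark.createDataset(
        Seq(
          MyCaseClass(mylist = Seq(("asd", "aa"), ("dd", "ee")))
        )
      )

      // Each tuple shows up in the schema as a nested struct with fields _1 and _2.
      myDS.toDF().printSchema()

      myDS.toDF().foreach(
        row => {
          // The array elements come back as Row objects, not as Scala tuples.
          row.getSeq[Row](row.fieldIndex("mylist"))
            .foreach {
              case Row(a, b) => println((a, b))
            }
        }
      )
    }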
    

    The code above is yet another way to deal with the nested structure. Spark's default Encoder encodes TupleX values as nested structs, which is why you are seeing this strange behaviour. As others said in the comments, you can't just call getAs[T](), since it is only a cast (x.asInstanceOf[T]): the elements are really Row objects, so using them as tuples fails with a runtime exception.
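
    For the three-element case from the question, the same idea can be sketched as a flatMap that pattern-matches each nested Row back into a tuple. This is only a minimal sketch: the names Record, TupleExtraction and rowsToTuples are hypothetical, the column is assumed to be called mylist as in the code above, and all three fields are assumed to be non-null strings (a null would make the pattern match fail).

    ```scala
    import org.apache.spark.sql.{Dataset, Row, SparkSession}

    // Hypothetical case class, used only to build sample data for this sketch.
    case class Record(mylist: Seq[(String, String, String)])

    object TupleExtraction {

      // Turn the Row-encoded structs back into plain tuples by pattern matching.
      // Assumes every field is a non-null String; anything else throws a MatchError.
      def rowsToTuples(rows: Seq[Row]): Seq[(String, String, String)] =
        rows.map { case Row(a: String, b: String, c: String) => (a, b, c) }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .master("local[2]")
          .appName("tuple-extraction")
          .getOrCreate()
        import spark.implicits._

        val df = Seq(
          Record(Seq(("a", "b", "c"), ("d", "e", "f")))
        ).toDF()

        // Each input row yields one output element per tuple in its mylist column.
        val flattened: Dataset[(String, String, String)] =
          df.flatMap(row => rowsToTuples(row.getSeq[Row](row.fieldIndex("mylist"))))

        flattened.show()
        spark.stop()
      }
    }
    ```

    Keeping the conversion in a small helper such as rowsToTuples makes the Row-to-tuple step easy to check on its own, without starting a SparkSession.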
