How to modify a Spark Dataframe with a complex nested structure?

余生分开走 · 2020-12-30 09:58

I have a DataFrame with a complex nested structure and would like to set a column to null easily. I've created implicit classes that wire up this functionality and easily address flat (2-D) DataFrame structures, but I'm not sure how to extend this to nested columns.

2 Answers
  •  無奈伤痛
    2020-12-30 10:24

    Since Spark 1.6, you can use case classes to map your DataFrame to a typed Dataset. You can then map over your data and transform it into the new schema you want. For example:

    case class Root(name: String, data: Seq[Data])
    case class Data(name: String, values: Map[String, String])
    case class NullableRoot(name: String, data: Seq[NullableData])
    case class NullableData(name: String, value: Map[String, String], values: Map[String, String])
    
    // spark.implicits._ must be in scope to provide the encoders
    // needed by .as[Root] and .toDF()
    import spark.implicits._
    
    val nullableDF = df.as[Root].map { root =>
      val nullableData = root.data.map(data => NullableData(data.name, null, data.values))
      NullableRoot(root.name, nullableData)
    }.toDF()
    

    The resulting schema of nullableDF will be:

    root
     |-- name: string (nullable = true)
     |-- data: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- name: string (nullable = true)
     |    |    |-- value: map (nullable = true)
     |    |    |    |-- key: string
     |    |    |    |-- value: string (valueContainsNull = true)
     |    |    |-- values: map (nullable = true)
     |    |    |    |-- key: string
     |    |    |    |-- value: string (valueContainsNull = true)
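    Note that the per-row transformation above is plain Scala over the case classes, so it can be exercised without a Spark cluster. A minimal sketch (the helper name `nullValue` is mine, not from the answer):

    ```scala
    // Case classes mirroring the answer above.
    case class Data(name: String, values: Map[String, String])
    case class Root(name: String, data: Seq[Data])
    case class NullableData(name: String, value: Map[String, String], values: Map[String, String])
    case class NullableRoot(name: String, data: Seq[NullableData])

    // The function applied to each row: copy every existing field,
    // and leave the new `value` map null.
    def nullValue(root: Root): NullableRoot =
      NullableRoot(root.name, root.data.map(d => NullableData(d.name, null, d.values)))
    ```

    Inside Spark, this is exactly what the lambda passed to `df.as[Root].map { ... }` computes for each row before `toDF()` rebuilds the DataFrame.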
    
