Explode array of structs to columns in Spark

Submitted anonymously (unverified) on 2019-12-03 01:05:01

Question:

I'd like to explode an array of structs to columns (as defined by the struct fields). E.g.

root
 |-- arr: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: long (nullable = false)
 |    |    |-- name: string (nullable = true)

Should be transformed to

root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)

I can achieve this with

df
  .select(explode($"arr").as("tmp"))
  .select($"tmp.*")

How can I do that in a single select statement?

I thought this could work, unfortunately it does not:

df.select(explode($"arr")(".*")) 

Exception in thread "main" org.apache.spark.sql.AnalysisException: No such struct field .* in col;

Answer 1:

A single-step solution is available only for MapType columns:

val df = Seq(Tuple1(Map((1L, "bar"), (2L, "foo")))).toDF

df.select(explode($"_1") as Seq("foo", "bar")).show

+---+---+
|foo|bar|
+---+---+
|  1|bar|
|  2|foo|
+---+---+

With arrays you can use flatMap:

val df = Seq(Tuple1(Array((1L, "bar"), (2L, "foo")))).toDF

df.as[Seq[(Long, String)]].flatMap(identity)
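If it helps to see what `flatMap(identity)` is doing to the data, here is the same flattening on plain Scala collections, with no Spark required (a small illustration only; the Dataset version above additionally needs `import spark.implicits._` in scope for the tuple encoder):

```scala
// Each element of `rows` stands for one row of the DataFrame: an array of
// (id, name) structs. flatMap(identity) concatenates the inner arrays,
// producing one flat sequence of (Long, String) pairs -- one per struct.
val rows: Seq[Array[(Long, String)]] =
  Seq(Array((1L, "bar"), (2L, "foo")))

val flat: Seq[(Long, String)] = rows.flatMap(identity)
// flat == Seq((1L, "bar"), (2L, "foo"))
```

In the Dataset case, Spark then maps each resulting tuple back to a row with columns `_1` and `_2`.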

A single SELECT statement can be written in SQL:

df.createOrReplaceTempView("df")

spark.sql("SELECT x._1, x._2 FROM df LATERAL VIEW explode(_1) t AS x")

