In Spark, iterate through each column and find the max length

清歌不尽 2021-01-15 20:28

I am new to Spark Scala and I have the following situation. I have a table "TEST_TABLE" on the cluster (it can be a Hive table) and I am converting it to a dataframe as:

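The code block did not survive here; a minimal sketch of that step, assuming TEST_TABLE is registered in the Hive metastore, would be:

    // Hypothetical reconstruction of the missing snippet: read the table into a dataframe
    val testDF = spark.sql("SELECT * FROM TEST_TABLE")   // or equivalently spark.table("TEST_TABLE")

I want to iterate through each column and find the maximum length of the values in that column.
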
3 Answers
  •  情深已故
    2021-01-15 21:29

    You can try in the following way:

    import org.apache.spark.sql.functions.{length, max}
    import spark.implicits._

    // Sample data: COL1 contains a null and COL2 an empty string,
    // so the example also shows how those cases are handled
    val df = Seq(("abc","abcd","abcdef"),
                 ("a","BCBDFG","qddfde"),
                 ("MN","1234B678","sd"),
                 (null,"","sd")).toDF("COL1","COL2","COL3")
    df.cache()

    // For every column, aggregate the maximum string length, then collect
    // the (column name, max length) pairs into a summary dataframe
    val output = df.columns
      .map(c => (c, df.agg(max(length(df(s"$c")))).as[Int].first()))
      .toSeq.toDF("COLUMN_NAME", "MAX_LENGTH")

    output.show()

    +-----------+----------+
    |COLUMN_NAME|MAX_LENGTH|
    +-----------+----------+
    |       COL1|         3|
    |       COL2|         8|
    |       COL3|         6|
    +-----------+----------+

    I think it is a good idea to cache the input dataframe df, since the map above triggers one aggregation job per column, and caching avoids re-reading the source data for every one of those jobs.
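
    If the table has many columns, a variant of the same idea (not part of the original answer) computes all the maxima in a single aggregation job instead of one job per column; it reuses df and spark.implicits._ from above:

    import org.apache.spark.sql.functions.{col, length, max}

    // Build one max(length(...)) expression per column and run them in a single agg,
    // so the data is scanned once regardless of the number of columns
    val aggExprs = df.columns.map(c => max(length(col(c))).alias(c))
    val row = df.agg(aggExprs.head, aggExprs.tail: _*).first()

    val output2 = df.columns.map(c => (c, row.getAs[Int](c))).toSeq
      .toDF("COLUMN_NAME", "MAX_LENGTH")
    output2.show()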
