In Spark, iterate through each column and find the max length

清歌不尽 2021-01-15 20:28

I am new to Spark Scala and I have the following situation. I have a table "TEST_TABLE" on the cluster (it can be a Hive table) and I am converting it to a dataframe as:

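The code block did not survive here; a minimal sketch of that step, assuming TEST_TABLE is registered in the Hive metastore, would be:

    // Hypothetical reconstruction of the missing snippet: read the table into a dataframe
    val testDF = spark.sql("SELECT * FROM TEST_TABLE")   // or equivalently spark.table("TEST_TABLE")

I want to iterate through each column and find the maximum length of the values in that column.
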
3 Answers
  •  情深已故
    2021-01-15 21:29

    You can try in the following way:

    import org.apache.spark.sql.functions.{length, max}
    import spark.implicits._

    // Sample data: COL1 contains a null and COL2 an empty string,
    // so the example also shows how those cases are handled
    val df = Seq(("abc","abcd","abcdef"),
                 ("a","BCBDFG","qddfde"),
                 ("MN","1234B678","sd"),
                 (null,"","sd")).toDF("COL1","COL2","COL3")
    df.cache()

    // For every column, aggregate the maximum string length, then collect
    // the (column name, max length) pairs into a summary dataframe
    val output = df.columns
      .map(c => (c, df.agg(max(length(df(s"$c")))).as[Int].first()))
      .toSeq.toDF("COLUMN_NAME", "MAX_LENGTH")

    output.show()

    +-----------+----------+
    |COLUMN_NAME|MAX_LENGTH|
    +-----------+----------+
    |       COL1|         3|
    |       COL2|         8|
    |       COL3|         6|
    +-----------+----------+

    I think it is a good idea to cache the input dataframe df, since the map above triggers one aggregation job per column, and caching avoids re-reading the source data for every one of those jobs.
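
    If the table has many columns, a variant of the same idea (not part of the original answer) computes all the maxima in a single aggregation job instead of one job per column; it reuses df and spark.implicits._ from above:

    import org.apache.spark.sql.functions.{col, length, max}

    // Build one max(length(...)) expression per column and run them in a single agg,
    // so the data is scanned once regardless of the number of columns
    val aggExprs = df.columns.map(c => max(length(col(c))).alias(c))
    val row = df.agg(aggExprs.head, aggExprs.tail: _*).first()

    val output2 = df.columns.map(c => (c, row.getAs[Int](c))).toSeq
      .toDF("COLUMN_NAME", "MAX_LENGTH")
    output2.show()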
