Comparing columns in PySpark

Happy的楠姐 2020-12-01 18:17

I am working on a PySpark DataFrame with n columns. I have a set of m columns (m < n), and my task is to pick, for each row, the maximum value across those m columns.

For example:
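A minimal sketch of the setup (the column names c1, c2, c3 and the values are illustrative, mirroring the answer below); PySpark's built-in greatest() gives the row-wise maximum directly:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import greatest

    spark = SparkSession.builder.getOrCreate()

    # Illustrative data; the real DataFrame has n columns, of which m are compared.
    df = spark.createDataFrame(
        [(10, 10, 1), (200, 2, 20), (3, 30, 300), (400, 40, 4)],
        ["c1", "c2", "c3"],
    )

    # greatest() returns the largest value across the listed columns, per row.
    df.withColumn("max", greatest("c1", "c2", "c3")).show()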

5 Answers
  •  攒了一身酷
    2020-12-01 19:09

    Scala solution:

    val df = sc.parallelize(Seq((10, 10, 1), (200, 2, 20), (3, 30, 300), (400, 40, 4)))
      .toDF("c1", "c2", "c3")

    // Convert each row to strings, append the row-wise minimum, and rebuild
    // a DataFrame. Note that List[String].min compares lexicographically,
    // which happens to coincide with the numeric minimum for this data.
    df.rdd
      .map(row => List[String](row(0).toString, row(1).toString, row(2).toString))
      .map(x => (x(0), x(1), x(2), x.min))
      .toDF("c1", "c2", "c3", "min")
      .show
    

    +---+---+---+---+  
    | c1| c2| c3|min|  
    +---+---+---+---+  
    | 10| 10|  1|  1|    
    |200|  2| 20|  2|  
    |  3| 30|300|  3|  
    |400| 40|  4|  4|  
    +---+---+---+---+  
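
    For comparison, the same row-wise minimum can stay in the DataFrame API via PySpark's built-in least() (a minimal sketch, assuming a df with the same c1, c2, c3 columns as above):

    from pyspark.sql.functions import least

    # least() is the row-wise counterpart of greatest(); no RDD round-trip,
    # and the numeric column types are preserved.
    df.withColumn("min", least("c1", "c2", "c3")).show()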
    
