PySpark: modify column values when another column value satisfies a condition

小蘑菇 2020-12-10 02:49

I have a PySpark DataFrame with two columns, Id and Rank:

+---+----+
| Id|Rank|
+---+----+
|  a|   5|
|  b|   7|
|  c|   8|
|  d|   1|
+---+----+


        
2 Answers
  • 2020-12-10 03:10

    You can use when() and otherwise() like this:

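    from pyspark.sql.functions import col, when
    
    # Keep Id where Rank <= 5, otherwise replace it with 'other',
    # then swap the new column in under the original name.
    df\
    .withColumn('Id_New', when(df.Rank <= 5, df.Id).otherwise('other'))\
    .drop(df.Id)\
    .select(col('Id_New').alias('Id'), col('Rank'))\
    .show()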
    

    This gives the following output:

    +-----+----+
    |   Id|Rank|
    +-----+----+
    |    a|   5|
    |other|   7|
    |other|   8|
    |    d|   1|
    +-----+----+
    
  • 2020-12-10 03:21

    Starting from @Pushkr's solution, couldn't you just use the following?

    from pyspark.sql.functions import when
    
    # Reusing the name 'Id' overwrites the existing column in place.
    df.withColumn('Id', when(df.Rank <= 5, df.Id).otherwise('other')).show()
    