PySpark replace null in column with value in other column

我寻月下人不归 2020-12-10 01:51

I want to replace null values in one column with the values in an adjacent column. For example, if I have:

A | B
0 | 1
2 | null
3 | null
4 | 2

I want it to become:

A | B
0 | 1
2 | 2
3 | 3
4 | 2

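For reference, a minimal sketch of this example as a PySpark DataFrame; it assumes an active SparkSession available as spark, and the name df is only for illustration:

    df = spark.createDataFrame([(0, 1), (2, None), (3, None), (4, 2)], ['A', 'B'])
    df.show()
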
3 Answers
  • 2020-12-10 02:23

    In the end I found an alternative:

    from pyspark.sql.functions import coalesce

    # coalesce takes the first non-null value, so a null B falls back to A
    df.withColumn("B", coalesce(df.B, df.A))

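    As a quick check, a sketch of this run against the example data; it assumes the import above and an active SparkSession named spark:

    df = spark.createDataFrame([(0, 1), (2, None), (3, None), (4, 2)], ['A', 'B'])
    df.withColumn("B", coalesce(df.B, df.A)).show()
    # B becomes 1, 2, 3, 2 while A stays unchanged
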
  • 2020-12-10 02:36

    Another answer.

    If the df1 below is your dataframe:

    from pyspark.sql.functions import when

    rd1 = sc.parallelize([(0, 1), (2, None), (3, None), (4, 2)])
    df1 = rd1.toDF(['A', 'B'])

    # when B is null take A, otherwise keep B
    df1.select('A',
               when(df1.B.isNull(), df1.A).otherwise(df1.B).alias('B')
              )\
       .show()
    
  • 2020-12-10 02:40
    from pyspark.sql import Row

    # rebuild rows whose B is null, copying A into B (test against None so a 0 in B is kept)
    df.rdd.map(lambda row: row if row[1] is not None else Row(A=row[0], B=row[0])).toDF().show()