Add column to pyspark dataframe based on a condition [duplicate]

Submitted by 老子叫甜甜 on 2020-06-28 01:59:05

Question


My data.csv file has three columns, as shown below. I have converted this file to a PySpark dataframe.

| A | B  | C |
|---|----|---|
| 1 | -3 | 4 |
| 2 | 0  | 5 |
| 6 | 6  | 6 |

I want to add another column D to the Spark dataframe, with the value Yes or No based on the condition: if the corresponding value in column B is greater than 0, then Yes, otherwise No.

| A | B  | C | D   |
|---|----|---|-----|
| 1 | -3 | 4 | No  |
| 2 | 0  | 5 | No  |
| 6 | 6  | 6 | Yes |

I am not able to work out how to implement this with PySpark dataframe operations.


Answer 1:


Try something like this:

from pyspark.sql import functions as f

# when/otherwise builds a conditional column: "Yes" where B > 0, "No" elsewhere
df.withColumn('D', f.when(f.col('B') > 0, "Yes").otherwise("No")).show()


Source: https://stackoverflow.com/questions/54839033/add-column-to-pyspark-dataframe-based-on-a-condition
