Add column to pyspark dataframe based on a condition [duplicate]

Submitted by 老子叫甜甜 on 2020-06-28 01:59:05

Question


My data.csv file has three columns, as shown below. I have converted this file to a PySpark dataframe.

| A | B  | C |
|---|----|---|
| 1 | -3 | 4 |
| 2 | 0  | 5 |
| 6 | 6  | 6 |

I want to add another column D to the Spark dataframe, with the value Yes or No based on the condition: if the corresponding value in column B is greater than 0, then Yes, otherwise No.

| A | B  | C | D   |
|---|----|---|-----|
| 1 | -3 | 4 | No  |
| 2 | 0  | 5 | No  |
| 6 | 6  | 6 | Yes |

I am not able to work out how to implement this with PySpark dataframe operations.


Answer 1:


Try something like this:

from pyspark.sql import functions as f

# when/otherwise builds a conditional column: "Yes" where B > 0, "No" elsewhere
df.withColumn('D', f.when(f.col('B') > 0, "Yes").otherwise("No")).show()


Source: https://stackoverflow.com/questions/54839033/add-column-to-pyspark-dataframe-based-on-a-condition
