PySpark - iterate rows of a Data Frame


Question


I need to iterate over the rows of a pyspark.sql.dataframe.DataFrame.

I have done this in pandas in the past with the iterrows() function, but I need to find something similar for PySpark without using pandas.
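For reference, this is the pandas pattern I mean (a minimal sketch; the column name "some_column" is just a placeholder):

    import pandas as pd

    pdf = pd.DataFrame({"some_column": [1, 2, 3]})
    # iterrows() yields (index, Series) pairs, one per row
    for index, row in pdf.iterrows():
        print(index, row["some_column"])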

If I do for row in myDF: it iterates over the columns.

Thanks


Answer 1:


You can use the select method to operate on your DataFrame with a user-defined function, something like this:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Apply a user-defined function to every column of the DataFrame
    my_udf = F.udf(lambda data: "do whatever you want here", StringType())
    myDF.select(*[my_udf(F.col(c)) for c in myDF.columns])

Then inside the select you can choose what you want to do with each column.
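A minimal end-to-end sketch of that approach (the SparkSession setup, sample data, and column names here are made up for illustration):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    myDF = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])

    # Tag every value with its string representation, column by column
    my_udf = F.udf(lambda data: "value={}".format(data), StringType())
    myDF.select(*[my_udf(F.col(c)).alias(c) for c in myDF.columns]).show()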



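Separately, if you literally need to loop over rows on the driver, as the question asks, note that collect() and toLocalIterator() are standard DataFrame methods that return Row objects. This is a different technique from the UDF approach above; a sketch, reusing the myDF from the previous snippet:

    # collect() pulls every row to the driver at once,
    # so it is only safe for small DataFrames
    for row in myDF.collect():
        print(row["name"], row["age"])

    # toLocalIterator() streams rows one partition at a time instead
    for row in myDF.toLocalIterator():
        print(row["name"], row["age"])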
Source: https://stackoverflow.com/questions/51152310/pyspark-iterate-rows-of-a-data-frame
