Apache Spark — Assign the result of UDF to multiple dataframe columns

面向向阳花 2020-12-02 09:12

I'm using pyspark, loading a large csv file into a dataframe with spark-csv, and as a pre-processing step I need to apply a variety of operations to the data available in o

2 answers

囚心锁ツ (original poster)

2020-12-02 09:56
You can use flatMap to unpack the UDF result into a new DataFrame with the desired columns in one go:
```
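# `udf` is your UDF returning a struct; it must be called on its input column ('input_col' is a placeholder name)
df = df.withColumn('udf_results', udf(df['input_col']))
# each Row holds a single struct; flatMap unwraps it so its fields become top-level columns
df4 = df.select('udf_results').rdd.flatMap(lambda x: x).toDF(schema=your_new_schema)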
```