Apache Spark — Assign the result of UDF to multiple dataframe columns

面向向阳花 2020-12-02 09:12

I'm using pyspark, loading a large csv file into a dataframe with spark-csv, and as a pre-processing step I need to apply a variety of operations to the data available in o

2 answers
  •  囚心锁ツ
    2020-12-02 09:56

    You can use flatMap to get the columns of the desired dataframe in one go:

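    # `udf` is a placeholder for a UDF call that returns a struct, e.g. my_udf(df['some_col'])
    df = df.withColumn('udf_results', udf)
    # each selected row holds one struct; flatMap unwraps it so its fields become columns
    df4 = df.select('udf_results').rdd.flatMap(lambda x: x).toDF(schema=your_new_schema)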
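
    A fuller sketch of the flatMap approach, in case it helps. Everything named here is an assumption made for illustration: the helper `split_reading`, the `raw` input column, and the `name`/`value` output fields are hypothetical, not taken from the question. It assumes the UDF returns a tuple matching a declared StructType.

```python
def split_reading(raw):
    """Hypothetical per-row logic: split 'name:value' into two fields."""
    if raw is None:
        return (None, None)
    name, _, value = raw.partition(":")
    # An absent or empty value becomes None instead of raising
    return (name.strip(), float(value) if value.strip() else None)


def with_udf_columns(df, source_col="raw"):
    """Apply split_reading once per row and expose its fields as columns."""
    # Imported lazily so split_reading stays usable without a Spark install
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        DoubleType, StringType, StructField, StructType,
    )

    # Assumed output schema: one field per value returned by the UDF
    result_schema = StructType([
        StructField("name", StringType(), True),
        StructField("value", DoubleType(), True),
    ])
    split_udf = F.udf(split_reading, result_schema)

    # A single UDF call produces a single struct column
    df = df.withColumn("udf_results", split_udf(F.col(source_col)))

    # Each selected row wraps one struct; flatMap unwraps it so that
    # toDF can rebuild a DataFrame with one column per struct field
    return (
        df.select("udf_results")
        .rdd.flatMap(lambda row: row)
        .toDF(schema=result_schema)
    )
```

    Note that this returns only the new columns. If the original columns are needed as well, selecting `"*"` together with `"udf_results.*"` on the DataFrame should expand the struct in place without going through the RDD.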
    
