I'm using pyspark, loading a large csv file into a dataframe with spark-csv, and as a pre-processing step I need to apply a variety of operations to the data available in one of the columns.
You can use flatMap on the column to get the desired dataframe in one go:
df = df.withColumn('udf_results', your_udf(df['your_column']))  # apply your udf to the source column
df4 = df.select('udf_results').rdd.flatMap(lambda x: x).toDF(schema=your_new_schema)
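
Here is a minimal, self-contained sketch of that pattern, assuming a made-up column name ('raw'), a toy udf that just splits a string, and an illustrative two-field output schema; substitute your own udf, column, and schema.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType, StructType, StructField

spark = SparkSession.builder.appName("udf-flatmap-sketch").getOrCreate()

# toy input: one string column named 'raw' (assumption for illustration)
df = spark.createDataFrame([("a|1",), ("b|2",)], ["raw"])

# udf that turns one input value into several output values (here: split on '|')
your_udf = udf(lambda s: s.split("|"), ArrayType(StringType()))

# the schema the flattened values should map onto (hypothetical field names)
your_new_schema = StructType([
    StructField("letter", StringType(), True),
    StructField("number", StringType(), True),
])

df = df.withColumn('udf_results', your_udf(df['raw']))

# each row of 'udf_results' holds a list; flatMap(lambda x: x) unwraps the
# single-field Row so toDF can map the list elements onto the new schema
df4 = (df.select('udf_results')
         .rdd
         .flatMap(lambda x: x)
         .toDF(schema=your_new_schema))

df4.show()
# +------+------+
# |letter|number|
# +------+------+
# |     a|     1|
# |     b|     2|
# +------+------+

The flatMap step matters because select('udf_results').rdd gives an RDD of single-field Row objects; iterating over each Row yields the list the udf returned, which toDF can then turn into one output row per list under the new schema.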