How to create a DataFrame out of rows while retaining existing schema?
If I call `map` or `mapPartitions` and my function receives rows from PySpark, what is the natural way to create either a local PySpark or pandas DataFrame — something that combines the rows and retains the schema? Currently I do something like this (note that `pd.DataFrame` sees a PySpark `Row` as a plain tuple, so the column names have to be supplied explicitly, e.g. from `Row.__fields__`):

```python
def combine(partition):
    rows = [x for x in partition]
    dfpart = pd.DataFrame(rows, columns=rows[0].__fields__)
    pandafunc(dfpart)

mydf.mapPartitions(combine)
```

---

**Answer** (zero323):

**Spark >= 2.3.0**

Since Spark 2.3.0 it is possible to operate on a pandas `Series` or `DataFrame` per partition or per group. See for example: Applying UDFs on GroupedData in PySpark (with functioning python example).
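A minimal sketch of the Spark >= 2.3.0 grouped-map pattern the answer refers to. The per-group function is plain pandas; the Spark wiring is guarded under `__main__` and assumes `pyspark` (>= 2.3) and `pyarrow` are installed — the data, column names, and the `center` function are illustrative, not from the original post:

```python
import pandas as pd

# Per-group function: Spark hands it one pandas DataFrame per group,
# and it must return a pandas DataFrame matching the declared schema.
def center(pdf):
    # Subtract the group mean from the "value" column.
    return pdf.assign(value=pdf.value - pdf.value.mean())

if __name__ == "__main__":
    # Spark wiring (requires pyspark >= 2.3 and pyarrow).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    sdf = spark.createDataFrame(
        [("a", 1.0), ("a", 2.0), ("b", 3.0)], ["key", "value"]
    )

    # GROUPED_MAP pandas_udf: the output schema is declared up front,
    # so the result is a Spark DataFrame with a proper schema.
    centered = pandas_udf(center, sdf.schema, PandasUDFType.GROUPED_MAP)
    sdf.groupby("key").apply(centered).show()
```

In Spark 3.0+ the same idea is spelled `sdf.groupby("key").applyInPandas(center, schema=sdf.schema)`, which avoids the now-deprecated `PandasUDFType` constant.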