If I call map or mapPartitions and my function receives rows from PySpark, what is the natural way to create either a local PySpark or Pandas DataFrame from them? Something like the sketch below is what I have in mind.
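For example, something along these lines (mydf and process_partition are just placeholder names, and I am only guessing at the pandas part):

    import pandas as pd

    def process_partition(iterator):
        # iterator yields pyspark.sql.Row objects for one partition;
        # Row.asDict() turns each Row into a plain dict.
        local_df = pd.DataFrame([row.asDict() for row in iterator])
        # ... work on local_df here ...
        yield len(local_df)

    # mydf is assumed to be an existing PySpark DataFrame
    partition_sizes = mydf.rdd.mapPartitions(process_partition).collect()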
You could use toPandas():
pandasdf = mydf.toPandas()
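For instance, a self-contained sketch (the schema and data below are made up for illustration; note that toPandas() collects the whole DataFrame to the driver, so it is only suitable when the data fits in driver memory):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("toPandas-example").getOrCreate()

    # Small illustrative DataFrame
    mydf = spark.createDataFrame(
        [(1, "a"), (2, "b"), (3, "c")],
        ["id", "letter"],
    )

    # Pulls every row to the driver and returns a pandas.DataFrame
    pandasdf = mydf.toPandas()
    print(pandasdf.head())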