Convert a Spark DataFrame to a pandas DataFrame

-上瘾入骨i 2020-12-08 18:38

Is there a way to convert a Spark DataFrame (not an RDD) to a pandas DataFrame?

I tried the following:

var some_df = Seq(
 ("A", "no"),
 ("B", "yes"),
 ("B", ...


        
3 Answers
  • 2020-12-08 19:15

    Converting a Spark DataFrame to pandas can be slow when the DataFrame is large. Enabling Apache Arrow for the conversion can speed it up:

    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    
    pd_df = df_spark.toPandas()
    

    I have tried this on Databricks.

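The Arrow setting in the answer above uses the configuration key from Spark 2.3/2.4. In Spark 3.x that key is deprecated in favor of `spark.sql.execution.arrow.pyspark.enabled`. The sketch below picks the key from the running Spark version. The helper name `enable_arrow` is hypothetical (it is not part of PySpark), and it assumes `spark` is an existing SparkSession.

```python
def enable_arrow(spark):
    """Turn on Arrow-based conversion for toPandas() on the given SparkSession.

    `enable_arrow` is an illustrative helper, not a PySpark API.
    Returns the configuration key that was set.
    """
    # Spark 3.x reads the `.pyspark.` key; Spark 2.3/2.4 use the older key
    # shown in the answer.
    major = int(spark.version.split(".")[0])
    if major >= 3:
        key = "spark.sql.execution.arrow.pyspark.enabled"
    else:
        key = "spark.sql.execution.arrow.enabled"
    spark.conf.set(key, "true")
    return key


# Usage, assuming an existing SparkSession `spark` and Spark DataFrame `df_spark`:
#   enable_arrow(spark)
#   pd_df = df_spark.toPandas()
```

Arrow only changes how the rows are transferred. `toPandas()` still collects the whole DataFrame into the driver's memory, so the result has to fit there.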
  • 2020-12-08 19:19

    In my case, the following conversion from a Spark DataFrame to a pandas DataFrame worked:

    pandas_df = spark_df.select("*").toPandas()
    
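Because `toPandas()` brings every selected row and column to the driver, it can help to narrow the DataFrame before converting. A minimal sketch of that idea follows. The helper `to_pandas_capped` and its parameters are hypothetical, and `spark_df` is assumed to be an existing Spark DataFrame.

```python
def to_pandas_capped(spark_df, columns=None, max_rows=None):
    """Convert a Spark DataFrame to pandas, optionally narrowing it first.

    `to_pandas_capped` is an illustrative helper, not a PySpark API.
    """
    if columns is not None:
        # Keep only the listed columns.
        spark_df = spark_df.select(*columns)
    if max_rows is not None:
        # Cap the number of rows that will be collected to the driver.
        spark_df = spark_df.limit(max_rows)
    return spark_df.toPandas()


# Usage, assuming an existing Spark DataFrame `spark_df`:
#   pandas_df = to_pandas_capped(spark_df, columns=["user_id"], max_rows=1000)
```

With both arguments left at `None`, this behaves like a plain `spark_df.toPandas()` call.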
  • 2020-12-08 19:26

    The following should work:

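    # Assumes `sc` is an active SparkContext (as in the pyspark shell).
    some_df = sc.parallelize([
        ("A", "no"),
        ("B", "yes"),
        ("B", "yes"),
        ("B", "no"),
    ]).toDF(["user_id", "phone_number"])

    # Collect the Spark DataFrame to the driver as a pandas DataFrame.
    pandas_df = some_df.toPandas()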
    