Extract column values of Dataframe as List in Apache Spark

慢半拍i 2020-12-22 16:52

I want to convert a string column of a DataFrame to a list. What I can find in the DataFrame API is RDD, so I tried converting it back to an RDD first, and then …
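
For reference, here is a minimal sketch of the setup being described, runnable in spark-shell (the column name "name" and the sample values are assumptions, not from the question):

    // Minimal sketch; the column name and sample data are assumptions.
    import spark.implicits._

    val df = Seq((1, "A00001"), (2, "A00002")).toDF("id", "name")

    // collect() on a selected column yields Array[Row]; each Row prints with
    // surrounding brackets, e.g. [A00001], which is what the answers below avoid.
    df.select("name").collect().foreach(println)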

10 Answers
  • 2020-12-22 17:18

    This is the Java version; collectAsList() returns a java.util.List<Row>:

    df.select("id").collectAsList();
    
  • 2020-12-22 17:19

    This should return a collection containing the values of the single selected column:

    dataFrame.select("YOUR_COLUMN_NAME").rdd.map(r => r(0)).collect()
    

    Without the mapping you just get Row objects rather than the values themselves.

    Keep in mind that this will probably give you a list of type Any. If you want a specific result type, add a cast in the mapping: r => r(0).asInstanceOf[YOUR_TYPE].

    P.S. Due to automatic conversion, you can skip the .rdd part.
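
    A fuller sketch of this approach, runnable in spark-shell; the DataFrame, the column name "name", and String as the target type are all assumptions:

    // Assumed setup: a small DataFrame with a string column called "name".
    import spark.implicits._
    val df = Seq(("A00001", "alice"), ("A00002", "bob")).toDF("id", "name")

    // Row => typed value, then collect to a local List[String].
    val names: List[String] =
      df.select("name").rdd.map(r => r(0).asInstanceOf[String]).collect().toList
    // names: List(alice, bob)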

  • 2020-12-22 17:19

    Below is the Python (PySpark) version:

    df.select("col_name").rdd.flatMap(lambda x: x).collect()
    
  • 2020-12-22 17:24

    In Scala with Spark 2+, try this (assuming your column name is "s"):

    df.select('s).as[String].collect
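
    An expanded sketch of the same typed-Dataset route, runnable in spark-shell; the sample data is an assumption:

    // Both the 's column syntax and the String encoder come from the implicits.
    import spark.implicits._

    val df = Seq("a", "b", "c").toDF("s")
    val values: Array[String] = df.select('s).as[String].collect()
    // values: Array(a, b, c); call .toList on it if a List is preferred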

  • 2020-12-22 17:26
    sqlContext.sql("select filename from tempTable").rdd.map(r => r.getString(0))
      .collect().toList.foreach(out_streamfn.println)   // prints the bare values, no [brackets]

    It works perfectly.
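
    For this to run, the temporary table has to be registered first. A minimal sketch of the assumed setup (the DataFrame df with a "filename" column, the output PrintWriter, and the Spark 2+ spark.sql entry point are all assumptions):

    import java.io.PrintWriter

    // Assumed input: a DataFrame `df` that has a "filename" column.
    val out_streamfn = new PrintWriter("filenames.txt")
    df.createOrReplaceTempView("tempTable")   // registerTempTable in Spark 1.x

    spark.sql("select filename from tempTable").rdd.map(r => r.getString(0))
      .collect().toList.foreach(line => out_streamfn.println(line))
    out_streamfn.close()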

  • 2020-12-22 17:27

    An updated solution that gets you a list:

    dataFrame.select("YOUR_COLUMN_NAME").map(r => r.getString(0)).collect.toList
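
    A version note, as a sketch runnable in spark-shell (the sample data is an assumption): in Spark 2+, calling .map directly on a DataFrame needs an encoder, which import spark.implicits._ provides; going through .rdd avoids the encoder requirement.

    import spark.implicits._
    val df = Seq("x", "y").toDF("YOUR_COLUMN_NAME")

    // Dataset route: needs the implicit Encoder[String] from the import above.
    val viaDataset: List[String] =
      df.select("YOUR_COLUMN_NAME").map(r => r.getString(0)).collect().toList

    // RDD route: no encoder needed.
    val viaRdd: List[String] =
      df.select("YOUR_COLUMN_NAME").rdd.map(r => r.getString(0)).collect().toList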
    