Get specific row from Spark DataFrame

闹比i 2020-12-17 07:58

Is there any alternative to R's df[100, c("column")] for Scala Spark DataFrames? I want to select a specific row from a column of a Spark DataFrame, for example the 100th row.

9 Answers
  • 2020-12-17 08:34

    This works for me in PySpark; it collects the selected column to the driver and returns the first row's value:

    df.select("column").collect()[0][0]  # first row, first (only) field
    
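    Since the original question asks about Scala: an equivalent one-liner sketch in Scala, using head() so that only a single row is shipped to the driver instead of the whole column (assumes a DataFrame df with a column named "column"):

    // Fetch only the first row of the selected column, then read its first field.
    val firstValue = df.select("column").head().get(0)
    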
  • 2020-12-17 08:38

    Firstly, you must understand that DataFrames are distributed, which means you can't access them in a typical procedural way; you must run an analysis first. Although you are asking about Scala, I suggest you read the PySpark documentation, because it has more examples than any of the other bindings.

    However, continuing with my explanation, I would use some methods of the RDD API, because all DataFrames have an RDD as an attribute. Please see my example below, and notice how I take the 2nd record.

    df = sqlContext.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])
    myIndex = 1
    # Pair each row with its index, keep the row whose index matches,
    # then drop the index again before collecting.
    values = (df.rdd.zipWithIndex()
                .filter(lambda pair: pair[1] == myIndex)
                .map(lambda pair: tuple(pair[0]))
                .collect())
    
    print(values[0])
    # ('b', 2)
    

    Hopefully, someone will give another solution with fewer steps.
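
    Since the question asks about Scala, here is a rough translation of the same zipWithIndex approach (a sketch; df and myIndex as in the Python snippet above):

    // Pair each Row with its index, keep only the row at myIndex,
    // drop the index again, and collect the single surviving row.
    val myIndex = 1
    val values = df.rdd.zipWithIndex()
      .filter { case (_, i) => i == myIndex }
      .map { case (row, _) => row }
      .collect()
    
    println(values(0))  // [b,2]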

  • 2020-12-17 08:40

    You can do this with a single line of code; it collects the column and indexes the 100th row:

    val arr = df.select("column").collect()(99)  // 100th row, 0-based index 99
    
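    One caveat: collect() pulls the entire column to the driver just to index a single element. A sketch of a lighter alternative using take(), which only ships the first 100 rows:

    // take(100) returns an Array[Row] with the first 100 rows;
    // .last is then the 100th row (index 99) without a full collect().
    val row100 = df.select("column").take(100).last
    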