get specific row from spark dataframe

Asked by 闹比i on 2020-12-17 07:58 · 9 answers · 1701 views

Is there any alternative for df[100, c("column")] in Scala Spark DataFrames? I want to select a specific row from a column of a Spark DataFrame, for example

9 Answers
  •  攒了一身酷
    2020-12-17 08:38

    Firstly, you must understand that DataFrames are distributed, which means you can't access them in a typical procedural way; you must do an analysis first. Although you are asking about Scala, I suggest you read the PySpark documentation, because it has more examples than any of the other languages' documentation.

    However, continuing with my explanation, I would use some methods of the RDD API, because every DataFrame has an RDD as an attribute. Please see my example below, and notice how I take the 2nd record.

    df = sqlContext.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])
    myIndex = 1
    # Python 3 no longer supports tuple unpacking in lambda parameters,
    # so index into the (row, index) pairs instead.
    values = (df.rdd.zipWithIndex()
                .filter(lambda pair: pair[1] == myIndex)
                .map(lambda pair: tuple(pair[0]))
                .collect())

    print(values[0])
    # ('b', 2)
    

    Hopefully, someone will give another solution with fewer steps.
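
    One shorter alternative, as a sketch: DataFrame.take(n) returns the first n rows to the driver as a list, so the row at position myIndex is the last element of take(myIndex + 1). This only makes sense for small indexes, since all rows up to that point are brought to the driver, and note that row order is only meaningful if the DataFrame has a defined ordering (here it comes from a local list). The SparkSession setup below is an assumption for a self-contained example.

    ```python
    from pyspark.sql import SparkSession

    # Local session just to make the example runnable on its own.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])

    myIndex = 1
    # take(n) fetches the first n rows; the row at index myIndex is the last one.
    row = df.take(myIndex + 1)[-1]
    print((row["letter"], row["name"]))
    # ('b', 2)
    ```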
