get specific row from spark dataframe

闹比i 2020-12-17 07:58

Is there any alternative to R's df[100, c("column")] for Scala/Spark data frames? I want to select a specific row from a column of a Spark data frame, for example the 100th row.

9 Answers
  •  太阳男子
    2020-12-17 08:13

    The getrows() function below should get the specific rows you want.

    For completeness, I have written down the full code in order to reproduce the output.

    # Create SparkSession
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.master('local').appName('scratch').getOrCreate()
    
    # Create the dataframe
    df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])
    
    # Function to get the rows at the 0-based positions in `rownums`
    def getrows(df, rownums):
        return (df.rdd.zipWithIndex()
                  .filter(lambda row_index: row_index[1] in rownums)
                  .map(lambda row_index: row_index[0]))
    
    # Get rows at positions 0 and 2.
    getrows(df, rownums=[0, 2]).collect()
    
    # Output:
    #> [Row(letter='a', name=1), Row(letter='c', name=3)]
    
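    If you would rather stay in the DataFrame API instead of dropping to the RDD level, here is a sketch of an alternative: attach a 0-based row index with a window function and filter on it. This assumes the letter column defines the row order you care about, since a distributed DataFrame has no inherent row order.

    ```python
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.master('local').appName('scratch').getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])

    # row_number() is 1-based, so subtract 1 to get 0-based positions.
    # Ordering by "letter" is an assumption; pick whichever column defines
    # the order that makes "row 100" meaningful for your data.
    w = Window.orderBy("letter")
    indexed = df.withColumn("idx", F.row_number().over(w) - 1)

    # Keep only the rows at positions 0 and 2, then drop the helper column.
    result = indexed.filter(F.col("idx").isin(0, 2)).drop("idx")
    result.show()
    ```

    Note that Window.orderBy without a partitionBy pulls all rows into a single partition (Spark logs a warning about this), which is acceptable when you only need to pick out a handful of rows.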
