How to pivot Spark DataFrame?

闹比i 2020-11-21 06:43

I am starting to use Spark DataFrames and I need to be able to pivot the data to create multiple columns out of one column with multiple rows. There is built-in functionality

10 Answers
  •  遥遥无期
    2020-11-21 06:57

    I overcame this by writing a for loop to dynamically create a SQL query. Say I have:

    id  tag  value
    1   US    50
    1   UK    100
    1   Can   125
    2   US    75
    2   UK    150
    2   Can   175
    

    and I want:

    id  US  UK   Can
    1   50  100  125
    2   75  150  175
    

    I can create a list with the values I want to pivot and then build a string containing the SQL query I need.

    val countries = List("US", "UK", "Can")

    // One "case when" column per country: the row's value where the tag matches, 0 otherwise.
    val pivotColumns = for (c <- countries) yield
      s"case when tag = '$c' then value else 0 end as $c"

    val query = "select *, " + pivotColumns.mkString(", ") + " from myTable"

    // registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView there.
    myDataFrame.registerTempTable("myTable")
    val myDF1 = sqlContext.sql(query)
    

    I can build a similar query to then do the aggregation, collapsing the rows for each id into one. It is not a very elegant solution, but it works and is flexible for any list of values, which can also be passed in as an argument when your code is called.
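The aggregation step mentioned above could look roughly like the sketch below. It only builds the query string, so it runs without Spark. `PivotQuery` and `build` are hypothetical names introduced here for illustration, and using `sum` with a default of 0 is an assumption that fits the sample data, where each id has at most one row per tag. The tag values are interpolated into the SQL without escaping, so this assumes they come from a trusted list.

```scala
// Hypothetical helper (not part of Spark): builds a "group by" pivot query as a string.
object PivotQuery {
  def build(table: String, idCol: String, tagCol: String,
            valueCol: String, tags: Seq[String]): String = {
    // One aggregated "case when" column per tag value.
    val cols = tags.map { t =>
      s"sum(case when $tagCol = '$t' then $valueCol else 0 end) as $t"
    }
    s"select $idCol, ${cols.mkString(", ")} from $table group by $idCol"
  }
}

// Query for the sample table: one output row per id, one column per country.
val pivotQuery = PivotQuery.build("myTable", "id", "tag", "value", List("US", "UK", "Can"))
```

Once `myTable` has been registered as in the answer, the resulting string can be passed to `sqlContext.sql(pivotQuery)` in the same way as the first query.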
