How to pivot on multiple columns in Spark SQL?

后端 未结 3 1288
梦毁少年i
梦毁少年i 2020-12-13 20:01

I need to pivot more than one column in a pyspark dataframe. Sample dataframe,

 >>> d = [(100,1,23,10),(100,2,45,11),(100,3,67,12),(100,4,78,13),(10         


        
相关标签:
3条回答
  • 2020-12-13 20:27

    As in spark 1.6 version I think that's the only way because pivot takes only one column and there is second attribute values on which you can pass the distinct values of that column that will make your code run faster because otherwise spark has to run that for you, so yes that's the right way to do it.

    0 讨论(0)
  • 2020-12-13 20:29

    The solution in the question is the best I could get. The only improvement would be to cache the input dataset to avoid double scan, i.e.

    mydf.cache
    pivot_udf(mydf,'price','units').show()
    
    0 讨论(0)
  • 2020-12-13 20:39

    Here's a non-UDF way involving a single pivot (hence, just a single column scan to identify all the unique dates).

    dff = mydf.groupBy('id').pivot('day').agg(F.first('price').alias('price'),F.first('units').alias('unit'))
    

    Here's the result (apologies for the non-matching ordering and naming):

    +---+-------+------+-------+------+-------+------+-------+------+               
    | id|1_price|1_unit|2_price|2_unit|3_price|3_unit|4_price|4_unit|
    +---+-------+------+-------+------+-------+------+-------+------+
    |100|     23|    10|     45|    11|     67|    12|     78|    13|
    |101|     23|    10|     45|    13|     67|    14|     78|    15|
    |102|     23|    10|     45|    11|     67|    16|     78|    18|
    +---+-------+------+-------+------+-------+------+-------+------+
    

    We just aggregate both on the price and the unit column after pivoting on the day.

    If naming required as in question,

    dff.select([F.col(c).name('_'.join(x for x in c.split('_')[::-1])) for c in dff.columns]).show()
    
    +---+-------+------+-------+------+-------+------+-------+------+
    | id|price_1|unit_1|price_2|unit_2|price_3|unit_3|price_4|unit_4|
    +---+-------+------+-------+------+-------+------+-------+------+
    |100|     23|    10|     45|    11|     67|    12|     78|    13|
    |101|     23|    10|     45|    13|     67|    14|     78|    15|
    |102|     23|    10|     45|    11|     67|    16|     78|    18|
    +---+-------+------+-------+------+-------+------+-------+------+
    
    0 讨论(0)
提交回复
热议问题