Spark dataframe transform multiple rows to column

孤街浪徒 2020-12-28 16:32

I am new to Spark, and I want to transform the source DataFrame below (loaded from a JSON file):

+--+-----+-----+
|A |count|ma         


        
2 answers
  •  南方客 (original poster)
    2020-12-28 16:58

    Using zero323's dataframe,

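    # `sqlContext` is predefined in the PySpark shell; in a standalone
    # Spark 2.0+ script, a SparkSession's `createDataFrame` does the same job.
    df = sqlContext.createDataFrame([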
    ("a", 1, "m1"), ("a", 1, "m2"), ("a", 2, "m3"),
    ("a", 3, "m4"), ("b", 4, "m1"), ("b", 1, "m2"),
    ("b", 2, "m3"), ("c", 3, "m1"), ("c", 4, "m3"),
    ("c", 5, "m4"), ("d", 6, "m1"), ("d", 1, "m2"),
    ("d", 2, "m3"), ("d", 3, "m4"), ("d", 4, "m5"),
    ("e", 4, "m1"), ("e", 5, "m2"), ("e", 1, "m3"),
    ("e", 1, "m4"), ("e", 1, "m5")], 
    ("a", "cnt", "major"))
    

    You could also use pivot:

    reshaped_df = df.groupby('a').pivot('major').max('cnt').fillna(0)
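
To make it clearer what that one-liner computes, here is a minimal pure-Python sketch of the same reshaping (group by `a`, spread `major` into columns, take the max of `cnt` per cell, fill gaps with 0). It is not Spark code: `rows` and `pivot_max` are hypothetical names introduced only for illustration, and only the first ten rows of the DataFrame above are used.

```python
from collections import defaultdict

# The first ten rows of the DataFrame above, as plain (a, cnt, major) tuples.
rows = [
    ("a", 1, "m1"), ("a", 1, "m2"), ("a", 2, "m3"), ("a", 3, "m4"),
    ("b", 4, "m1"), ("b", 1, "m2"), ("b", 2, "m3"),
    ("c", 3, "m1"), ("c", 4, "m3"), ("c", 5, "m4"),
]

def pivot_max(rows, fill=0):
    """Hypothetical helper: group by the first field, turn each distinct
    value of the third field into a column, keep the max of the second
    field per cell, and put `fill` where a group has no such value."""
    columns = sorted({major for _, _, major in rows})
    cells = defaultdict(dict)
    for key, cnt, major in rows:
        # Keep the largest cnt seen so far for this (key, major) cell.
        cells[key][major] = max(cnt, cells[key].get(major, cnt))
    table = {
        key: [cells[key].get(col, fill) for col in columns]
        for key in sorted(cells)
    }
    return columns, table

columns, table = pivot_max(rows)
print(["a"] + columns)
for key, values in table.items():
    print([key] + values)
```

Group `b` has no `m4` row and group `c` has no `m2` row, so those cells come out as 0, which is the role `fillna(0)` plays in the Spark version. The sketch sorts its rows for readability; Spark does not guarantee row order unless you sort explicitly.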
    
