Pandas - unstack column values into new columns

小蘑菇 2021-01-04 15:27

I have a large dataframe and I am storing a lot of redundant values that are making it hard to handle my data. I have a dataframe of the form:

import pandas          


        
2 answers
  •  渐次进展
    2021-01-04 16:13

    You can use pivot_table with reset_index and rename_axis (new in pandas 0.18.0):

    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc='first')
             .reset_index()
             .rename_axis(None, axis=1))
    
      meta1 meta2  n1  n2
    0     a     g  y1  y2
    1     b     h  y3  y4
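
The question's DataFrame is not shown above, so the call cannot be run as posted. The following is a minimal, self-contained sketch in which the input frame is an assumption, reconstructed from the printed output (four rows, one value per meta1/meta2/name combination):

```python
import pandas as pd

# Assumed input: not taken from the question, only inferred from the
# output printed above.
df = pd.DataFrame(
    [["a", "g", "n1", "y1"],
     ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"],
     ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

wide = (df.pivot_table(index=["meta1", "meta2"],  # columns kept as row keys
                       columns="name",            # values that become new columns
                       values="data",             # cell contents
                       aggfunc="first")           # needed because data is not numeric
          .reset_index()                          # turn meta1/meta2 back into columns
          .rename_axis(None, axis=1))             # drop the leftover "name" label

print(wide)
```

With this assumed input every key combination occurs once, so first has nothing to discard.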
    

    But it is better to use join as the aggfunc:

    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc=', '.join)
             .reset_index()
             .rename_axis(None, axis=1))
    
      meta1 meta2  n1  n2
    0     a     g  y1  y2
    1     b     h  y3  y4
    

    Explanation of why join is generally better than first:

    With first, you lose every value that is not the first one in its group (rows that share the same index and column keys), whereas join concatenates them:

    import pandas as pd
    
    df = pd.DataFrame([["a","g","n1","y1"], 
                       ["a","g","n2","y2"], 
                       ["a","g","n1","y3"], 
                       ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])
    
    print (df)
      meta1 meta2 name data
    0     a     g   n1   y1
    1     a     g   n2   y2
    2     a     g   n1   y3
    3     b     h   n2   y4
    
    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc='first')
             .reset_index()
             .rename_axis(None, axis=1))
      meta1 meta2    n1  n2
    0     a     g    y1  y2
    1     b     h  None  y4
    
    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc=', '.join)
             .reset_index()
             .rename_axis(None, axis=1))
    
      meta1 meta2      n1  n2
    0     a     g  y1, y3  y2
    1     b     h    None  y4 
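
Whether first silently drops anything depends on whether a meta1/meta2/name combination repeats. As a rough sketch (not part of the original answer), the duplicates can be listed before reshaping, and a plain set_index/unstack used when there are none. The variable names keys, dupes and wide are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    [["a", "g", "n1", "y1"],
     ["a", "g", "n2", "y2"],
     ["a", "g", "n1", "y3"],
     ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

keys = ["meta1", "meta2", "name"]

# Every row whose key combination occurs more than once
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

if dupes.empty:
    # Keys are unique: unstack needs no aggregation at all
    wide = df.set_index(keys)["data"].unstack("name")
else:
    # Keys repeat: join the strings per group first, then unstack
    wide = df.groupby(keys)["data"].agg(", ".join).unstack("name")

wide = wide.reset_index().rename_axis(None, axis=1)
print(wide)
```

Here rows 0 and 2 share the key (a, g, n1), so the join branch runs and n1 for that group becomes "y1, y3".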
    
