Python Pandas - How to flatten a hierarchical index in columns

Anonymous (unverified), submitted 2019-12-03 08:30:34

Question:

I have a data frame with a hierarchical index in axis 1 (columns) (from a groupby.agg operation):

     USAF   WBAN  year  month  day  s_PC  s_CL  s_CD  s_CNT  tempf
                                     sum   sum   sum    sum   amax   amin
0  702730  26451  1993      1    1     1     0    12     13  30.92  24.98
1  702730  26451  1993      1    2     0     0    13     13  32.00  24.98
2  702730  26451  1993      1    3     1    10     2     13  23.00   6.98
3  702730  26451  1993      1    4     1     0    12     13  10.04   3.92
4  702730  26451  1993      1    5     3     0    10     13  19.94  10.94

I want to flatten it, so that it looks like this (names aren't critical - I could rename):

     USAF   WBAN  year  month  day  s_PC  s_CL  s_CD  s_CNT  tempf_amax  tempf_amin
0  702730  26451  1993      1    1     1     0    12     13       30.92       24.98
1  702730  26451  1993      1    2     0     0    13     13       32.00       24.98
2  702730  26451  1993      1    3     1    10     2     13       23.00        6.98
3  702730  26451  1993      1    4     1     0    12     13       10.04        3.92
4  702730  26451  1993      1    5     3     0    10     13       19.94       10.94

How do I do this? (I've tried a lot, to no avail.)

Per a suggestion, here is the head in dict form:

{('USAF', ''): {0: '702730', 1: '702730', 2: '702730', 3: '702730', 4: '702730'},
 ('WBAN', ''): {0: '26451', 1: '26451', 2: '26451', 3: '26451', 4: '26451'},
 ('day', ''): {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 ('month', ''): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 ('s_CD', 'sum'): {0: 12.0, 1: 13.0, 2: 2.0, 3: 12.0, 4: 10.0},
 ('s_CL', 'sum'): {0: 0.0, 1: 0.0, 2: 10.0, 3: 0.0, 4: 0.0},
 ('s_CNT', 'sum'): {0: 13.0, 1: 13.0, 2: 13.0, 3: 13.0, 4: 13.0},
 ('s_PC', 'sum'): {0: 1.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 3.0},
 ('tempf', 'amax'): {0: 30.920000000000002, 1: 32.0, 2: 23.0, 3: 10.039999999999999, 4: 19.939999999999998},
 ('tempf', 'amin'): {0: 24.98, 1: 24.98, 2: 6.9799999999999969, 3: 3.9199999999999982, 4: 10.940000000000001},
 ('year', ''): {0: 1993, 1: 1993, 2: 1993, 3: 1993, 4: 1993}}

Answer 1:

I think the easiest way to do this would be to set the columns to the top level:

df.columns = df.columns.get_level_values(0) 

Note: if the top level has a name, you can also pass that name instead of 0.
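
To see what keeping only the top level does, here is a minimal sketch on a small two-level frame. The frame is illustrative (a subset of the question's columns, built by hand), not the asker's full table. Note that columns sharing a top-level label, such as the two tempf aggregates, end up with the same name.

```python
import pandas as pd

# Illustrative frame with two column levels (values borrowed from the question).
columns = pd.MultiIndex.from_tuples(
    [("USAF", ""), ("s_PC", "sum"), ("tempf", "amax"), ("tempf", "amin")]
)
df = pd.DataFrame(
    [["702730", 1.0, 30.92, 24.98], ["702730", 0.0, 32.00, 24.98]],
    columns=columns,
)

# Keep only the top level; the second level is discarded.
df.columns = df.columns.get_level_values(0)

top_level_names = list(df.columns)
print(top_level_names)  # ['USAF', 's_PC', 'tempf', 'tempf']
```

If the duplicated 'tempf' label is a problem, joining the levels (as in the rest of this answer) keeps the columns distinct.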

If you want to combine/join your MultiIndex into one Index (assuming you have just string entries in your columns) you could:

df.columns = [' '.join(col).strip() for col in df.columns.values] 

Note: we must strip the whitespace for when there is no second index.

In [11]: [' '.join(col).strip() for col in df.columns.values]
Out[11]:
['USAF',
 'WBAN',
 'day',
 'month',
 's_CD sum',
 's_CL sum',
 's_CNT sum',
 's_PC sum',
 'tempf amax',
 'tempf amin',
 'year']
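
For an end-to-end picture, the sketch below builds two-level columns with groupby.agg and then joins them as described. The raw data is made up for illustration, and the aggregations are spelled 'max' and 'min', so the resulting labels differ from the 'amax'/'amin' shown in the question.

```python
import pandas as pd

# Hypothetical raw observations; column names follow the question.
raw = pd.DataFrame(
    {
        "day": [1, 1, 2, 2],
        "s_PC": [1, 0, 0, 0],
        "tempf": [30.92, 24.98, 32.00, 24.98],
    }
)

# Aggregating with lists of functions yields two-level columns.
agg = raw.groupby("day").agg({"s_PC": ["sum"], "tempf": ["max", "min"]})
agg = agg.reset_index()  # 'day' comes back as the column ('day', '')

# Join both levels with a space; strip() removes the trailing space
# left behind when the second level is empty.
agg.columns = [" ".join(col).strip() for col in agg.columns.values]

flat_names = list(agg.columns)
print(flat_names)  # ['day', 's_PC sum', 'tempf max', 'tempf min']
```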


Answer 2:

pd.DataFrame(df.to_records())  # the MultiIndex becomes columns and the new index is integers only
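
The comment describes what happens to a MultiIndex on the rows: its levels become ordinary columns and the frame gets a plain integer index. Here is a minimal sketch with made-up data (the level names station and day are illustrative). For a hierarchy on the columns, as in the question, the result may look different, so check the output on your own frame.

```python
import pandas as pd

# Illustrative frame with a two-level row index.
row_index = pd.MultiIndex.from_tuples(
    [("702730", 1), ("702730", 2)], names=["station", "day"]
)
df = pd.DataFrame({"tempf": [30.92, 32.00]}, index=row_index)

# Round-trip through a record array: index levels become fields.
flat = pd.DataFrame(df.to_records())

flat_columns = list(flat.columns)
print(flat_columns)  # ['station', 'day', 'tempf']
```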


Answer 3:

Andy Hayden's answer is certainly the easiest way. If you want to avoid duplicate column labels, though, you need to tweak it a bit.
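
The code that accompanied this answer is not preserved here. As one possible tweak (an assumption, not necessarily what the answerer wrote), append the second level only when it is non-empty, so that columns sharing a top-level label stay distinct:

```python
import pandas as pd

# Illustrative two-level columns with a repeated top-level label.
columns = pd.MultiIndex.from_tuples(
    [("USAF", ""), ("tempf", "amax"), ("tempf", "amin")]
)
df = pd.DataFrame([["702730", 30.92, 24.98]], columns=columns)

# Append the second level (behind a separator) only when it is non-empty.
# The '|' separator is an arbitrary choice for this sketch.
df.columns = [f"{top}|{sub}" if sub else top for top, sub in df.columns]

unique_names = list(df.columns)
print(unique_names)  # ['USAF', 'tempf|amax', 'tempf|amin']
```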



Answer 4:

df.columns = ['_'.join(tup).rstrip('_') for tup in df.columns.values] 


Answer 5:

And if you want to retain any of the aggregation info from the second level of the multiindex you can try this:

In [1]: new_cols = [''.join(t) for t in df.columns]
Out[1]:
['USAF',
 'WBAN',
 'day',
 'month',
 's_CDsum',
 's_CLsum',
 's_CNTsum',
 's_PCsum',
 'tempfamax',
 'tempfamin',
 'year']

In [2]: df.columns = new_cols


Answer 6:

In case you want to have a separator in the name between levels, this function works well.

def flattenHierarchicalCol(col,sep = '_'):
    if not type(col) is tuple:
        return col
    else:
        new_col = ''
        for leveli,level in enumerate(col):
            if not level == '':
                if not leveli == 0:
                    new_col += sep
                new_col += level
        return new_col

df.columns = df.columns.map(flattenHierarchicalCol)
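
A more compact variant of the same idea is sketched below, with a usage example for a non-default separator. The name flatten_col is hypothetical. Unlike the function above, it also omits the leading separator when the first level is empty, and like it, it assumes string levels.

```python
import pandas as pd

def flatten_col(col, sep="_"):
    """Join the non-empty levels of a column label with `sep`."""
    if not isinstance(col, tuple):
        return col  # already a flat label
    return sep.join(level for level in col if level != "")

columns = pd.MultiIndex.from_tuples([("USAF", ""), ("tempf", "amax")])
df = pd.DataFrame([["702730", 30.92]], columns=columns)

# A lambda passes a non-default separator through Index.map.
df.columns = df.columns.map(lambda col: flatten_col(col, sep="."))

dotted_names = list(df.columns)
print(dotted_names)  # ['USAF', 'tempf.amax']
```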


Answer 7:

A bit late maybe, but if you are not worried about duplicate column names:

df.columns = df.columns.tolist() 


Answer 8:

A general solution that handles multiple levels and mixed types:

df.columns = ['_'.join(tuple(map(str, t))) for t in df.columns.values] 
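
A small sketch of this approach on columns whose second level holds integers (the column labels are made up for illustration). Since there is no stripping step, a label with an empty second level keeps a trailing underscore, as the last line shows with a plain tuple.

```python
import pandas as pd

# Illustrative columns: string top level, integer second level.
columns = pd.MultiIndex.from_tuples([("tempf", 1), ("tempf", 2), ("s_PC", 1)])
df = pd.DataFrame([[30.92, 24.98, 1.0]], columns=columns)

# str() on every level makes the join safe for non-string labels.
df.columns = ["_".join(map(str, t)) for t in df.columns.values]

joined_names = list(df.columns)
print(joined_names)  # ['tempf_1', 'tempf_2', 's_PC_1']

# An empty second level leaves the separator behind.
trailing = "_".join(map(str, ("USAF", "")))
print(trailing)  # USAF_
```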


Answer 9:

Following @jxstanford and @tvt173, I wrote a quick function which should do the trick, regardless of string/int column names:

def flatten_cols(df):
    df.columns = [
        '_'.join(tuple(map(str, t))).rstrip('_')
        for t in df.columns.values
        ]
    return df


Answer 10:

You could also do it as below. Take df to be your dataframe and assume a two-level column index with string labels (as is the case in your example):

df.columns = [df.columns[i][0] + '_' + df.columns[i][1] for i in range(len(df.columns))]

