How to drop row index and flatten index in this way

时光毁灭记忆、已成空白 submitted on 2020-11-25 03:59:47

Question


I have the following DataFrame, dfe:

id       categ  level  cols           value   comment
1         A      PG    Apple           428    comment1 
1         A      CD    Apple           175    comment1 
1         C      PG    Apple           226    comment1 
1         C      AB    Apple           884    comment1 
1         C      CD    Apple           288    comment1 
1         B      PG    Apple           712    comment1 
1         B      AB    Apple           849    comment1 
2         B      CD    Apple           376    comment1 
2         C      None  Orange          591    comment1 
2         B      CD    Orange          135    comment1 
2         D      None  Orange          423    comment1 
2         A      AB    Orange          1e13   comment1 
2         D      PG    Orange          1e15   comment2 





    df2 = pd.DataFrame({'s2': {0: 1, 1: 2, 2: 3}, 'level': {0: 'PG', 1: 'AB', 2: 'CD'}})
    df1 = pd.DataFrame({'sl': {0: 1, 1: 2, 2: 3, 3: 4}, 'set': {0: 'A', 1: 'C', 2: 'B', 3: 'D'}})
    dfe = (dfe[['categ','level','cols','id','comment','value']]
            .merge(df1.rename({'set' : 'categ'}, axis=1),how='left',on='categ')
            .merge(df2, how='left', on='level'))
    na = dfe['level'].isna()
    
    dfs = {'no_null': dfe[~na], 'null': dfe[na]}
    
    with pd.ExcelWriter('XYZ.xlsx') as writer: 
        
        for p,r in dfs.items():
            if p== 'no_null':
    
                c= ['cols','s2','level']
            else:
    
                 c = 'cols'
            
            df = r.pivot_table(index=['id','sl','comment','categ'], columns=c, values=['value'])
            df.columns = df.columns.droplevel([0,2])
            df  = df.reset_index().drop(('sl',''), axis=1).set_index('categ')
            
            
            for (id,comment), sdf in df.groupby(['id','comment']):
                df = sdf.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
                df.to_excel(writer,sheet_name=name)

Running this, I get the results displayed in Excel this way:

I want to order the headings in a certain way. This is what I tried:

df = r.pivot_table(index=['id','sl','comment','categ'], columns=c, values='value')
df.columns = df.columns.droplevel([1])
df = df.reset_index().drop(('sl',''), axis=1).set_index('categ')

This gives me a "Too many levels: Index has only 2 levels, not 3" error. I don't know what I'm missing or doing wrong here.

My expected output for the arrangement of headings is:

I would also like to know whether the headings can be written to Excel in capitals, as shown in the expected output.

EDIT 1: I tried the answer and I'm getting this view:

I want to display ID and COMMENT only once (the data is already grouped by ID in the code logic), drop the sl column and the first column (0, 1, 2), and also delete the blank row above 0.


Answer 1:


Given dfe as:

   categ level    cols  id   comment         value  sl   s2
0      A    PG   Apple   1  comment1  4.280000e+02   1  1.0
1      A    CD   Apple   1  comment1  1.750000e+02   1  3.0
2      C    PG   Apple   1  comment1  2.260000e+02   2  1.0
3      C    AB   Apple   1  comment1  8.840000e+02   2  2.0
4      C    CD   Apple   1  comment1  2.880000e+02   2  3.0
5      B    PG   Apple   1  comment1  7.120000e+02   3  1.0
6      B    AB   Apple   1  comment1  8.490000e+02   3  2.0
7      B    CD   Apple   2  comment1  3.760000e+02   3  3.0
8      C  None  Orange   2  comment1  5.910000e+02   2  NaN
9      B    CD  Orange   2  comment1  1.350000e+02   3  3.0
10     D  None  Orange   2  comment1  4.230000e+02   4  NaN
11     A    AB  Orange   2  comment1  1.000000e+13   1  2.0
12     D    PG  Orange   2  comment2  1.000000e+15   4  1.0

Then try:

df = dfe.pivot_table(index=['id','comment','categ'], columns=c, values='value')
df.columns = df.columns.droplevel([1])

df = (df.rename_axis(columns=[None, None])
        .reset_index(col_level=1)
        .rename(columns = lambda x: x.upper()))
df.to_excel('testa1.xlsx')

Output:

Notes:

  • Removed the [] around 'value' in pivot_table so that 'value' is not added as an extra column index level.
  • Aligned 'id', 'comment' and 'categ' with column index level 1 using the col_level parameter of reset_index.
  • See this post about the blank line, https://stackoverflow.com/a/52498899/6361531.
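
The notes above can be checked on a small frame. The following is only a sketch: the four-row dfe is a hypothetical miniature of the question's data, and the index and c lists are assumptions chosen to match the non-null branch of the question's loop.

```python
import pandas as pd

# Hypothetical miniature of dfe, with only the columns needed here.
dfe = pd.DataFrame({
    "id": [1, 1, 1, 1],
    "comment": ["comment1"] * 4,
    "categ": ["A", "A", "C", "C"],
    "cols": ["Apple"] * 4,
    "s2": [1, 3, 1, 3],
    "level": ["PG", "CD", "PG", "CD"],
    "value": [428, 175, 226, 288],
})
index = ["id", "comment", "categ"]
c = ["cols", "s2", "level"]

# values=['value'] (a list) adds an extra 'value' level on top of the columns.
with_list = dfe.pivot_table(index=index, columns=c, values=["value"])
# values='value' (a string) does not.
with_str = dfe.pivot_table(index=index, columns=c, values="value")
levels_with_list = with_list.columns.nlevels  # 4
levels_with_str = with_str.columns.nlevels    # 3

# Drop the helper ordering level 's2' (position 1), leaving (cols, level).
flat = with_str.copy()
flat.columns = flat.columns.droplevel([1])

# Clear the level names, place the index labels on the lower header row,
# and upper-case every header.
out = (flat.rename_axis(columns=[None, None])
           .reset_index(col_level=1)
           .rename(columns=lambda x: x.upper()))
header = out.columns.tolist()
```

Because the number of column levels depends on both values and c, a positional droplevel call that works for one combination can raise "Too many levels" for another; checking columns.nlevels before dropping is one way to see which case applies.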



Answer 2:


I think it would be easier to drop the column names and then replace them with custom ones:

df.columns = df.columns.droplevel()
df.columns = pd.MultiIndex.from_tuples([("", "ID"), ("", "CATEG"), ("apple", "PG"), ("apple", "AB"), ("apple", "CD"), ("orange", "PG"), ("orange", "AB"), ("orange", "CD")])
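
A hand-typed tuple list must contain exactly as many tuples as the frame has columns, otherwise the assignment raises a length-mismatch ValueError. As a possible alternative, the tuples can be derived from the existing columns. This is a minimal sketch, assuming the frame looks like a pivoted result after a default reset_index(); the df below is hypothetical and not the asker's actual data.

```python
import pandas as pd

# Hypothetical frame standing in for the pivoted result after reset_index():
# former index labels sit on the upper header row with '' below them.
df = pd.DataFrame(
    [[1, "A", 428.0, 175.0], [1, "C", 226.0, 288.0]],
    columns=pd.MultiIndex.from_tuples(
        [("id", ""), ("categ", ""), ("Apple", "PG"), ("Apple", "CD")]
    ),
)

# Move the single-level labels to the lower row and upper-case them;
# leave the value columns (fruit on top, level below) unchanged.
new_cols = [("", top.upper()) if bottom == "" else (top, bottom)
            for top, bottom in df.columns]
df.columns = pd.MultiIndex.from_tuples(new_cols)
```

Deriving the tuples this way keeps the header in step with whatever columns the pivot actually produced, at the cost of being less explicit than a fixed list.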


Source: https://stackoverflow.com/questions/64395699/how-to-drop-row-index-and-flatten-index-in-this-way
