Pandas: Multilevel column names

梦毁少年i 2020-12-08 02:28

pandas has support for multi-level column names:

>>> x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b


        
5 Answers
  • 2020-12-08 03:00

    A lot of these solutions seem just a bit more complex than they need to be.

    I prefer to make things look as simple and intuitive as possible when speed isn't absolutely necessary. I think this solution accomplishes that. Tested in versions of pandas as early as 0.22.0.

    Simply create a DataFrame (ignoring the column names in the first step) and then set df.columns equal to a nested list of column names, with one inner list per level.

    In [1]: import pandas as pd                                                                                                                                                                                          
    
    In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])                                                                                                                                                              
    
    In [3]: df                                                                                                                                                                                                           
    Out[3]: 
       0  1  2  3
    0  1  1  1  1
    1  2  2  2  2
    
    In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]                                                                                                                                                    
    
    In [5]: df                                                                                                                                                                                                           
    Out[5]: 
       a  c  e  g
       b  d  f  h
    0  1  1  1  1
    1  2  2  2  2
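
To confirm what the assignment above produces, here is a minimal sketch that repeats it and inspects the result. It assumes a pandas version that converts a list of equal-length lists into a MultiIndex on assignment, as the answer's output suggests; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])

# One inner list per level: the first list is the outer level,
# the second list is the inner level.
df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]

is_multi = isinstance(df.columns, pd.MultiIndex)  # columns are now hierarchical
n_levels = df.columns.nlevels                     # number of column levels
pairs = list(df.columns)                          # (outer, inner) tuples

# A single column is addressed by its full (outer, inner) tuple.
col_ab = df[('a', 'b')].tolist()
```

Each column is then keyed by a tuple such as ('a', 'b') rather than by a single label.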
    
  • 2020-12-08 03:05

    Here is a function that builds the list of tuples expected by pd.MultiIndex.from_tuples() a bit more generically. Got the idea from @user3377361.

    def create_tuple_for_for_columns(df_a, multi_level_col):
        """
        Build the list of tuples that pd.MultiIndex.from_tuples() needs to add a new top column level.

        :param df_a: pandas DataFrame whose existing columns become the second (inner) level
        :param multi_level_col: label to use for the new first (outer) level
        :return: list of (multi_level_col, existing_column) tuples
        """
        # Pair the new outer label with every existing column name.
        return [(multi_level_col, item) for item in df_a.columns]
    

    It can be used like this:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
    columns = create_tuple_for_for_columns(df, 'c')
    df.columns = pd.MultiIndex.from_tuples(columns)
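
A variation on the same idea, sketched here under the assumption that you would rather not modify the original frame in place: the helper name add_top_level is hypothetical and is not part of pandas or of the answer above.

```python
import pandas as pd


def add_top_level(df, label):
    """Return a copy of df whose columns sit under a new outer level `label`.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [(label, col) for col in df.columns]
    )
    return out


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
wide = add_top_level(df, 'c')

# Selecting the outer label returns the original columns again.
inner_a = wide['c']['a'].tolist()
```

Because the helper works on a copy, df keeps its flat columns while wide carries the two-level ones.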
    
  • Try this:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

    # Each tuple is (outer level, inner level).
    columns = [('c', 'a'), ('c', 'b')]
    df.columns = pd.MultiIndex.from_tuples(columns)
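
If it helps to refer to the levels by name later, from_tuples also accepts a names argument. The sketch below is only an illustration; the level names 'group' and 'field' are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# 'group' and 'field' are hypothetical level names chosen for this sketch.
df.columns = pd.MultiIndex.from_tuples(
    [('c', 'a'), ('c', 'b')], names=['group', 'field']
)

level_names = list(df.columns.names)

# Take the cross-section where the inner level equals 'a';
# the remaining column is labelled by the outer level only.
sub = df.xs('a', axis=1, level='field')
sub_values = sub['c'].tolist()
```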
    
  • 2020-12-08 03:14

    There is no need to build the list of tuples by hand.

    Use pd.MultiIndex.from_product(iterables):

    import pandas as pd
    import numpy as np
    
    df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
    df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])
    

    Resultant DataFrame:

      new_label                    
              a         b         c
    0   0.25999  0.337535  0.333568
    

    Pull request from Jan 25, 2014
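
For a reproducible check of the from_product approach, here is a small sketch with fixed values instead of random ones. The labels 'x' and 'y' in the second part are hypothetical and only show that from_product builds every combination of the iterables it is given.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# A single outer label paired with every existing column name.
df.columns = pd.MultiIndex.from_product([['new_label'], df.columns])
cols = list(df.columns)

# With more than one outer label, the result is the full Cartesian product.
grid = pd.MultiIndex.from_product([['x', 'y'], ['a', 'b']])
grid_pairs = list(grid)
```

Note that the product has len(outer) * len(inner) entries, so it only fits a frame with exactly that many columns.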

  • 2020-12-08 03:24

    You can use concat. Give it a dictionary of dataframes where the key is the new column level you want to add.

    In [46]: d = {}
    
    In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                             data=[[10, 0.89, 0.98, 0.31],
                                                   [20, 0.34, 0.78, 0.34]]).set_index('idx')
    
    In [48]: pd.concat(d, axis=1)
    Out[48]:
        first_level
                  a     b     c
    idx
    10         0.89  0.98  0.31
    20         0.34  0.78  0.34
    

    You can use the same technique to place several groups of columns under the new top level.

    In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                              data=[[10, 0.29, 0.63, 0.99],
                                                    [20, 0.23, 0.26, 0.98]]).set_index('idx')
    
    In [50]: pd.concat(d, axis=1)
    Out[50]:
        first_level             second_level
                  a     b     c            a     b     c
    idx
    10         0.89  0.98  0.31         0.29  0.63  0.99
    20         0.34  0.78  0.34         0.23  0.26  0.98
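
As a compact, self-contained version of the concat approach, the sketch below builds two small frames and combines them both from a dictionary and from a list with the keys argument. The frame names first and second are illustrative, and the values are taken from the first two columns of the example above.

```python
import pandas as pd

first = pd.DataFrame({'a': [0.89, 0.34], 'b': [0.98, 0.78]}, index=[10, 20])
second = pd.DataFrame({'a': [0.29, 0.23], 'b': [0.63, 0.26]}, index=[10, 20])

# Dictionary keys become the outer column level.
from_dict = pd.concat({'first_level': first, 'second_level': second}, axis=1)

# The same outer labels can be passed explicitly through `keys`.
from_keys = pd.concat([first, second], axis=1,
                      keys=['first_level', 'second_level'])

cols = list(from_dict.columns)
second_b = from_dict[('second_level', 'b')].tolist()
```

The dictionary form is convenient when the frames are collected incrementally, while keys makes the order of the outer labels explicit.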
    