Pandas expand rows from list data available in column

陌清茗 2020-11-30 07:18

I have a data frame like this in pandas:

 column1      column2
 [a,b,c]        1
 [d,e,f]        2
 [g,h,i]        3

Expected output: …

3 answers
  •  不知归路
    2020-11-30 08:03

    Another solution is to use the result_type='expand' argument of DataFrame.apply, available since pandas 0.23. To answer @splinter's question, the method can also be generalized to frames with more columns; see below:

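    import pandas as pd
    from numpy import arange
    
    df = pd.DataFrame(
        {'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
         'column2': [1, 2, 3]}
    )
    
    # Spread each list over its own columns, labelled 0, 1, 2, ...
    expanded = df.apply(lambda row: row['column1'], axis=1, result_type='expand')
    
    # Melt the expanded columns back into rows, keeping column2 as the identifier.
    # column1 is dropped before the join because melt's value_name must not
    # clash with an existing column (recent pandas versions reject this).
    pd.melt(
        df.drop(columns='column1').join(expanded),
        id_vars=['column2'],
        value_vars=arange(expanded.shape[1]),  # one entry per list position
        value_name='column1',
    )[['column1', 'column2']]
    
    # The same approach generalizes to frames with more columns:
    
    df = pd.DataFrame(
        {'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
         'column2': [1, 2, 3],
         'column3': [[1, 2], [2, 3], [3, 4]],
         'column4': [42, 23, 321],
         'column5': ['a', 'b', 'c']}
    )
    
    expanded = df.apply(lambda row: row['column1'], axis=1, result_type='expand')
    
    (pd.melt(
        df.drop(columns='column1').join(expanded),
        id_vars=list(df.columns[1:]),          # every column except the list column
        value_vars=arange(expanded.shape[1]),
        value_name='column1')
     .drop(columns=['variable'])[list(df.columns)]  # restore the original column order
     .sort_values(by=['column1']))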
    
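The same expand-then-melt steps can be packaged as a reusable function. This is a minimal sketch, not the answer's own code: explode_list_column and the temporary column _position are hypothetical names, and the sketch assumes pandas 1.1 or newer (for melt's ignore_index parameter) and string column labels, because the expanded columns are labelled 0, 1, 2 and so on. It also pads shorter lists with None so that lists of different lengths work.

```python
import pandas as pd


def explode_list_column(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: one output row per element of the list column `col`.

    Sketch of the apply(result_type="expand") + pd.melt technique.
    Assumes string column labels (expanded columns are labelled 0, 1, 2, ...).
    """
    longest = df[col].apply(len).max()

    # 1. Spread each list over columns 0..longest-1, padding short lists with None.
    wide = df.apply(
        lambda row: list(row[col]) + [None] * (longest - len(row[col])),
        axis=1,
        result_type="expand",
    )

    # 2. Melt the positional columns back into rows. ignore_index=False keeps
    #    the original row label on every melted row (pandas >= 1.1).
    others = [c for c in df.columns if c != col]
    long = pd.melt(
        df[others].join(wide),
        id_vars=others,
        value_vars=list(wide.columns),
        var_name="_position",
        value_name=col,
        ignore_index=False,
    )

    # 3. Drop the padding, then a stable sort on the original row label
    #    restores row order while keeping list order within each row.
    long = long.dropna(subset=[col]).sort_index(kind="mergesort")
    return long.drop(columns="_position")[list(df.columns)].reset_index(drop=True)


df = pd.DataFrame({
    "column1": [["a", "b", "c"], ["d", "f"], ["g", "h", "i"]],
    "column2": [1, 2, 3],
})
result = explode_list_column(df, "column1")
print(result)
```

Without the stable sort, melt returns all first elements, then all second elements, and so on. Because the padding is removed with dropna, a genuine None or NaN inside one of the lists is dropped as well.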

    UPDATE (in reply to Jwely's comment): if your lists have different lengths, pad the shorter ones with None before expanding and drop the padding after melting:

    df = pd.DataFrame(
        {'column1': [['a', 'b', 'c'], ['d', 'f'], ['g', 'h', 'i']],
         'column2': [1, 2, 3]}
    )
    
    longest = df['column1'].apply(len).max()
    
    # Pad every list with None up to the length of the longest one
    expanded = df.apply(
        lambda row: row['column1'] + [None] * (longest - len(row['column1'])),
        axis=1, result_type='expand')
    
    (pd.melt(
        df.drop(columns='column1').join(expanded),
        id_vars=['column2'],
        value_vars=arange(longest),  # one entry per position of the longest list
        value_name='column1')
     .dropna(subset=['column1'])[['column1', 'column2']])  # drop the padding rows
    
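For comparison, pandas 0.25 and later provide DataFrame.explode, a different, built-in technique that turns each list element into its own row without any padding or melting. A minimal sketch on the same varying-length frame, assuming a pandas version that has explode:

```python
import pandas as pd

# Same varying-length input as in the UPDATE above
df = pd.DataFrame({
    "column1": [["a", "b", "c"], ["d", "f"], ["g", "h", "i"]],
    "column2": [1, 2, 3],
})

# explode emits one row per list element and repeats the other columns;
# reset_index discards the repeated original row labels.
exploded = df.explode("column1").reset_index(drop=True)
print(exploded)
```

It yields the same (column1, column2) pairs as the padding approach, ordered row by row. Unlike that approach, explode keeps genuine missing values inside the lists, and an empty list becomes a single NaN row instead of disappearing.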
