Apply function to pandas DataFrame that can return multiple rows

轮回少年 2020-12-03 01:37

I am trying to transform a DataFrame so that some of its rows are replicated a given number of times. For example:

df = pd.DataFrame({'class': ['A',


        
5 answers
  •  日久生厌
    2020-12-03 02:12

    I know this is an old question, but I was having trouble getting Wes' answer to work for multiple columns in the dataframe so I made his code a bit more generic. Thought I'd share in case anyone else stumbles on this question with the same problem.

    You just specify which column holds the counts, and you get an expanded dataframe in return.

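    import pandas as pd
    df = pd.DataFrame({'class 1': ['A','B','C','A'],
                       'class 2': [ 1,  2,  3,  1],
                       'count':   [ 3,  3,  3,  1]})
    print(df, "\n")

    def f(group, *args):
        # Take the first row of the group and repeat it `count` times;
        # args[0] names the column that holds the counts.
        row = group.iloc[0]  # irow() was removed from pandas; iloc replaces it
        n = int(row[args[0]])
        return pd.DataFrame({item: [row[item]] * n for item in row.index})

    def ExpandRows(df, WeightsColumnName):
        # Group on every column, so fully identical rows share one group.
        # Iterating over the groups (instead of groupby.apply) keeps the
        # grouping columns available to f across pandas versions.
        groups = df.groupby(df.columns.tolist())
        pieces = [f(group, WeightsColumnName) for _, group in groups]
        return pd.concat(pieces, ignore_index=True)


    df_expanded = ExpandRows(df, 'count')
    print(df_expanded)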
    

    Returns:

      class 1  class 2  count
    0       A        1      3
    1       B        2      3
    2       C        3      3
    3       A        1      1 
    
      class 1  class 2  count
    0       A        1      1
    1       A        1      3
    2       A        1      3
    3       A        1      3
    4       B        2      3
    5       B        2      3
    6       B        2      3
    7       C        3      3
    8       C        3      3
    9       C        3      3
    

    With regard to speed: my base df is 10 columns by ~6k rows and expands to ~100,000 rows, which takes ~7 seconds. I'm not sure grouping is necessary or wise here, since it groups on every column, but hey, it's only 7 seconds.
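
On the question of whether the grouping is needed at all: a groupby-free version is possible. The sketch below is not the code from the answer above; it is a minimal alternative that assumes the counts column holds non-negative integers. The name `expand_rows` is hypothetical. It repeats row positions with `numpy.repeat` and selects them with `iloc`, so rows stay in their original order instead of being sorted by group key, and duplicate rows are each expanded separately rather than collapsed into one group.

```python
import numpy as np
import pandas as pd


def expand_rows(df, weights_column):
    """Repeat each row of df as many times as its value in weights_column.

    Assumes weights_column contains non-negative integers (hypothetical helper).
    """
    # Positions 0..len(df)-1, each repeated by that row's count.
    positions = np.repeat(np.arange(len(df)), df[weights_column].to_numpy())
    # Positional selection works even if the index labels are not unique.
    return df.iloc[positions].reset_index(drop=True)


df = pd.DataFrame({'class 1': ['A', 'B', 'C', 'A'],
                   'class 2': [1, 2, 3, 1],
                   'count':   [3, 3, 3, 1]})
df_expanded = expand_rows(df, 'count')
print(df_expanded)
```

For the example frame this gives ten rows: three each for the first A, B and C rows, followed by the single row with count 1. I have not timed it against the grouped version.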
