Apply function to pandas DataFrame that can return multiple rows

轮回少年 2020-12-03 01:37

I am trying to transform a DataFrame so that some of its rows are replicated a given number of times. For example:

df = pd.DataFrame({'class': ['A', ...


        
5 answers
  •  情深已故
    2020-12-03 02:15

    There is an even simpler and significantly more efficient solution. I had to make a similar modification to a table of about 3.5M rows, and the previously suggested solutions were extremely slow.

    A better way is to use NumPy's repeat function to build a new index in which each row position is repeated as many times as that row's count, and then use iloc to select rows of the original table by that index:

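    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
    
    # Repeat each row position 0..n-1 as many times as its 'count' value;
    # a count of 0 drops the row entirely
    spread_ixs = np.repeat(np.arange(len(df)), df['count'])
    print(spread_ixs)
    # [0 2 2]
    
    # Select rows by position, drop the helper column and renumber the index
    result = df.iloc[spread_ixs, :].drop(columns='count').reset_index(drop=True)
    print(result)
    #   class
    # 0     A
    # 1     C
    # 2     C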
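    The same approach can be wrapped in a small reusable helper. This is a sketch, not part of the original answer: the name repeat_rows and its count_col parameter are hypothetical, and it assumes the counts are non-negative integers (np.repeat raises a ValueError for negative repeats). Because the selection goes through iloc with positions rather than index labels, it also works when the DataFrame's index is not a 0..n-1 range or contains duplicate labels, as the usage below shows.

```python
import numpy as np
import pandas as pd


def repeat_rows(df: pd.DataFrame, count_col: str) -> pd.DataFrame:
    """Repeat each row df[count_col] times (hypothetical helper name).

    Assumes non-negative integer counts; rows with a count of 0 are dropped.
    Selection is positional (iloc), so the original index labels do not matter.
    """
    counts = df[count_col].to_numpy()
    # Positions 0..n-1, each repeated according to its row's count
    positions = np.repeat(np.arange(len(df)), counts)
    # Pick rows by position, remove the count column, renumber from 0
    return df.iloc[positions].drop(columns=count_col).reset_index(drop=True)


# Usage with a non-default index containing a duplicate label ('x')
df = pd.DataFrame(
    {'class': ['A', 'B', 'C'], 'count': [1, 0, 2]},
    index=['x', 'y', 'x'],
)
print(repeat_rows(df, 'count'))
#   class
# 0     A
# 1     C
# 2     C
```

    Selecting by label instead (for example with loc on a repeated index) would return extra rows here, because the label 'x' matches two rows.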
    
