How to duplicate rows based on a counter column

你的背包 2020-12-17 03:32

Let's say I have a data frame called df:

x count 
d 2
e 3
f 2

count is the counter column: the number of times I want each row to be repeated.

2 Answers
  • 2020-12-17 04:04

    You can use np.repeat() on the underlying array, passing the count column as the number of repeats per row:

    import pandas as pd
    import numpy as np
    
    # your data
    # ========================
    df
    
       x  count
    0  d      2
    1  e      3
    2  f      2
    
    # processing
    # ==================================
    np.repeat(df.values, df['count'].values, axis=0)
    
    
    array([['d', 2],
           ['d', 2],
           ['e', 3],
           ['e', 3],
           ['e', 3],
           ['f', 2],
           ['f', 2]], dtype=object)
    
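    # wrap the repeated array back into a DataFrame
    # (the array is dtype=object, so both columns come back as object)
    pd.DataFrame(np.repeat(df.values, df['count'].values, axis=0), columns=['x', 'count'])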
    
       x count
    0  d     2
    1  d     2
    2  e     3
    3  e     3
    4  e     3
    5  f     2
    6  f     2
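
For anyone who wants to run this end to end, here is a minimal sketch of the same np.repeat() idea. It rebuilds df from the question and, as an addition that is not part of the answer above, casts the columns back to their original dtypes, since the object array would otherwise leave count as an object column. The names repeated and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"x": ["d", "e", "f"], "count": [2, 3, 2]})

# Repeat each row of the underlying array `count` times along axis 0
repeated = np.repeat(df.to_numpy(), df["count"].to_numpy(), axis=0)

# Mixed columns produce an object array, so restore the original dtypes
# after wrapping it in a DataFrame (this cast is an addition, not in the answer)
result = pd.DataFrame(repeated, columns=df.columns).astype(df.dtypes.to_dict())

print(result)
print(result.dtypes)
```

Without the astype() step the values are the same, but numeric operations on count may behave differently because the column is stored as generic Python objects.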
    
  • 2020-12-17 04:19

    You could use .loc with Index.repeat:

    In [295]: df.loc[df.index.repeat(df['count'])].reset_index(drop=True)
    Out[295]:
       x  count
    0  d      2
    1  d      2
    2  e      3
    3  e      3
    4  e      3
    5  f      2
    6  f      2
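
The same .loc approach as a standalone script, in case it helps to see the intermediate step. This is a sketch that assumes df has a unique index (with duplicate labels, .loc would return extra rows); labels and expanded are illustrative names.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"x": ["d", "e", "f"], "count": [2, 3, 2]})

# Index.repeat repeats each index label `count` times: 0, 0, 1, 1, 1, 2, 2
labels = df.index.repeat(df["count"])

# .loc selects one row per label, so each row appears `count` times;
# reset_index(drop=True) replaces the repeated labels with 0..n-1
expanded = df.loc[labels].reset_index(drop=True)

print(expanded)
```

Because the rows are selected from the original frame rather than rebuilt from an array, the column dtypes are preserved.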
    

    Or, using pd.Series.repeat:

    In [278]: df.set_index('x')['count'].repeat(df['count']).reset_index()
    Out[278]:
       x  count
    0  d      2
    1  d      2
    2  e      3
    3  e      3
    4  e      3
    5  f      2
    6  f      2
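
One thing worth checking before picking between the two: the pd.Series.repeat version only carries x and count through. The sketch below adds a hypothetical third column y (not in the question, purely for illustration) to show the difference; via_series and via_loc are illustrative names.

```python
import pandas as pd

# Question's data plus a hypothetical extra column `y` (an assumption for illustration)
df = pd.DataFrame({"x": ["d", "e", "f"], "y": [1.0, 2.0, 3.0], "count": [2, 3, 2]})

repeats = df["count"].to_numpy()

# Series.repeat works on the single `count` Series indexed by `x`,
# so any other column is not part of the result
via_series = df.set_index("x")["count"].repeat(repeats).reset_index()

# .loc with Index.repeat selects whole rows, so every column is kept
via_loc = df.loc[df.index.repeat(repeats)].reset_index(drop=True)

print(via_series.columns.tolist())
print(via_loc.columns.tolist())
```

For the two-column frame in the question both give the same result, so this only matters once the frame has more columns.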
    