Replicating rows in a pandas data frame by a column value

北海茫月 2020-11-28 09:35

I want to replicate rows in a pandas DataFrame. Each row should be repeated n times, where n is the value of a column in that row.

import pandas as pd

what_i_have = pd.D         


        
3 answers
  •  爱一瞬间的悲伤
    2020-11-28 10:28

    You could use np.repeat (with NumPy imported as np) to build an array of repeated index labels, and then use that array to index into the frame:

    >>> df2 = df.loc[np.repeat(df.index.values,df.n)]
    >>> df2
      id  n   v
    0  A  1  10
    1  B  2  13
    1  B  2  13
    2  C  3   8
    2  C  3   8
    2  C  3   8
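
To see what np.repeat contributes on its own, here is a minimal sketch. The arrays `labels` and `counts` are hypothetical stand-ins for `df.index.values` and `df.n`, not names from the original post:

```python
import numpy as np

# Hypothetical stand-ins for df.index.values and df.n
labels = np.array([0, 1, 2])
counts = np.array([1, 2, 3])

# Each label is repeated by the matching entry in counts
repeated = np.repeat(labels, counts)
print(repeated)  # [0 1 1 2 2 2]
```

Passing this array to df.loc selects row 0 once, row 1 twice, and row 2 three times.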
    

    After that, only a little cleanup remains: drop the n column and reset the index:

    >>> df2 = df2.drop("n",axis=1).reset_index(drop=True)
    >>> df2
      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8
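
For reference, the two steps above can be put together as a self-contained script. This is a sketch: the question's original frame is truncated, so the sample `df` below is an assumption reconstructed from the output shown in this answer.

```python
import numpy as np
import pandas as pd

# Assumed sample frame, reconstructed from the output shown in the answer
df = pd.DataFrame({"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]})

# Repeat each index label n times, then select those labels
repeated_labels = np.repeat(df.index.values, df["n"].to_numpy())
df2 = df.loc[repeated_labels]

# Clean up: drop the helper column and renumber the rows from 0
df2 = df2.drop(columns="n").reset_index(drop=True)
print(df2)
```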
    

    Note that if the index may contain duplicate labels, a label-based lookup with .loc can match more than one row per label, so you could use .iloc instead:

    In [86]: df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)
    Out[86]: 
      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8
    

    which uses the positions, and not the index labels.
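
A small sketch of why this matters, using a hypothetical frame whose index contains a duplicate label (the index values here are invented for illustration and are not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the label 0 appears twice in the index
df = pd.DataFrame(
    {"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]},
    index=[0, 0, 1],
)
counts = df["n"].to_numpy()

# Position-based: row i is repeated counts[i] times, whatever its label is
positions = np.repeat(np.arange(len(df)), counts)
by_position = df.iloc[positions].drop(columns="n").reset_index(drop=True)

# Label-based: a repeated label 0 may match both rows that carry that label
by_label = df.loc[np.repeat(df.index.values, counts)]

print(len(by_position))  # 6, the sum of the n column
print(len(by_label))     # compare with the line above
```

If the two lengths differ, the label-based version has picked up extra rows, which is the case the .iloc form avoids.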
