Pandas column of lists, create a row for each list element

有刺的猬 2020-11-22 06:59

I have a dataframe where some cells contain lists of multiple values. Rather than storing multiple values in a cell, I'd like to expand the dataframe so that each item in t

10 Answers
  •  眼角桃花
    2020-11-22 07:47

    Also very late, but here is an answer from Karvy1 that worked well for me. It is useful if your pandas version is older than 0.25: https://stackoverflow.com/a/52511166/10740287

    For the example above you may write:

    data = [(row.subject, row.trial_num, sample) for row in df.itertuples() for sample in row.samples]
    data = pd.DataFrame(data, columns=['subject', 'trial_num', 'samples'])
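
    To try the comprehension on its own, here is a minimal self-contained sketch. The values in `df` are made up for illustration; only the column names `subject`, `trial_num` and `samples` come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
    "samples": [[0.5, 0.1], [1.2], [0.3, 0.7, 0.9]],
})

# One output tuple per (row, list element) pair.
rows = [
    (row.subject, row.trial_num, sample)
    for row in df.itertuples()
    for sample in row.samples
]
data = pd.DataFrame(rows, columns=["subject", "trial_num", "samples"])
print(data)
```

    Note that a row whose `samples` list is empty produces no output rows with this approach, so it disappears from the result.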
    

    Speed test:

    %timeit data = pd.DataFrame([(row.subject, row.trial_num, sample) for row in df.itertuples() for sample in row.samples], columns=['subject', 'trial_num', 'samples'])
    

    1.33 ms ± 74.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit data = df.set_index(['subject', 'trial_num'])['samples'].apply(pd.Series).stack().reset_index()
    

    4.9 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %timeit data = pd.DataFrame({col: np.repeat(df[col].values, df['samples'].str.len()) for col in df.columns.drop('samples')}).assign(**{'samples': np.concatenate(df['samples'].values)})
    

    1.38 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
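
    For comparison, if you do have pandas 0.25 or later, the built-in `DataFrame.explode` should cover the same case. It was not part of the timings above. The sketch below uses made-up data with the same column names.

```python
import pandas as pd

# Hypothetical sample data with the same column names as above.
df = pd.DataFrame({
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
    "samples": [[0.5, 0.1], [1.2], [0.3, 0.7, 0.9]],
})

# explode repeats the original index for each list element,
# so reset it to get a clean 0..n-1 index.
exploded = df.explode("samples").reset_index(drop=True)
print(exploded)
```

    The exploded `samples` column comes back with object dtype, so you may want to cast it (for example with `astype(float)`) before doing numeric work.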
