Pandas split column of lists into multiple columns

爱一瞬间的悲伤 2020-11-21 06:28

I have a pandas DataFrame with one column:

import pandas as pd

df = pd.DataFrame(
    data={
        "teams": [
            


        
8 Answers
  •  萌比男神i
    2020-11-21 07:04

    You can pass the lists returned by to_list() (or its alias tolist()) to the DataFrame constructor:

    import pandas as pd
    
    d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
                    ['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
    df2 = pd.DataFrame(d1)
    print (df2)
           teams
    0  [SF, NYG]
    1  [SF, NYG]
    2  [SF, NYG]
    3  [SF, NYG]
    4  [SF, NYG]
    5  [SF, NYG]
    6  [SF, NYG]
    

    df2[['team1','team2']] = pd.DataFrame(df2.teams.tolist(), index= df2.index)
    print (df2)
           teams team1 team2
    0  [SF, NYG]    SF   NYG
    1  [SF, NYG]    SF   NYG
    2  [SF, NYG]    SF   NYG
    3  [SF, NYG]    SF   NYG
    4  [SF, NYG]    SF   NYG
    5  [SF, NYG]    SF   NYG
    6  [SF, NYG]    SF   NYG
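
The `index=df2.index` argument above matters once the frame no longer has the default 0..n-1 index, because column assignment aligns on index labels. The following is a minimal sketch of that point; the frame `df`, its labels 10 to 12 and the extra team names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a non-default index (labels 10, 11, 12).
df = pd.DataFrame(
    {"teams": [["SF", "NYG"], ["LA", "DAL"], ["GB", "CHI"]]},
    index=[10, 11, 12],
)

# Reusing df.index keeps each split row attached to its original label.
split = pd.DataFrame(
    df["teams"].tolist(), index=df.index, columns=["team1", "team2"]
)
df[["team1", "team2"]] = split

# Without index=, the new frame is labelled 0, 1, 2, which does not match df.
unaligned = pd.DataFrame(df["teams"].tolist(), columns=["team1", "team2"])

print(df)
print(unaligned.index.tolist())
```

Assigning `unaligned` back to `df` would look up labels 10, 11 and 12 in a frame that only has 0, 1 and 2, so the new columns may end up filled with missing values instead of team names.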
    

    And to build a new DataFrame:

    df3 = pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
    print (df3)
      team1 team2
    0    SF   NYG
    1    SF   NYG
    2    SF   NYG
    3    SF   NYG
    4    SF   NYG
    5    SF   NYG
    6    SF   NYG
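
The example above hard-codes two column names, which only works when every list has exactly two items. As a hedged sketch, assuming some rows may be shorter than others, the column names can be derived from the resulting width instead. The series `s` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical data: the second row lists only one team.
s = pd.Series([["SF", "NYG"], ["LA"], ["GB", "CHI"]], name="teams")

# The constructor pads shorter rows with missing values.
wide = pd.DataFrame(s.tolist(), index=s.index)

# Name the columns from the actual width rather than a fixed list.
wide.columns = [f"team{i + 1}" for i in range(wide.shape[1])]

print(wide)
```

Passing a fixed `columns=` list that is shorter or longer than the widest row raises an error, so generating the names afterwards is the safer choice when the list lengths are not known in advance.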
    

    By comparison, the solution with apply(pd.Series) is very slow:

    #7k rows
    df2 = pd.concat([df2]*1000).reset_index(drop=True)
    
    In [121]: %timeit df2['teams'].apply(pd.Series)
    1.79 s ± 52.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [122]: %timeit pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
    1.63 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
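
The `%timeit` lines above only run inside IPython or Jupyter. A rough way to repeat the comparison in a plain script is sketched below with the standard-library `timeit` module. The helper names `split_with_apply` and `split_with_constructor` are made up for this sketch, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Same shape as the benchmark above: 7,000 rows of two-item lists.
df = pd.DataFrame({"teams": [["SF", "NYG"]] * 7000})


def split_with_apply():
    # Builds one Series per row, which is where the time goes.
    return df["teams"].apply(pd.Series)


def split_with_constructor():
    # Hands the whole list of lists to the constructor in one call.
    return pd.DataFrame(
        df["teams"].tolist(), index=df.index, columns=["team1", "team2"]
    )


# One run is enough for the slow variant; average the fast one over 20 runs.
slow = timeit.timeit(split_with_apply, number=1)
fast = timeit.timeit(split_with_constructor, number=20) / 20

print(f"apply(pd.Series):      {slow:.4f} s per call")
print(f"DataFrame constructor: {fast:.4f} s per call")
```

Both helpers return the same values; only the column labels differ, since `apply(pd.Series)` names the columns 0 and 1.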
    
