Convert pandas series of lists to dataframe

执笔经年 2020-12-14 04:16

I have a series made of lists

import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])

and I want a DataFrame in which each list becomes a column.

7 answers
  • 2020-12-14 04:25

    As @Hatshepsut pointed out in the comments, from_items is deprecated as of pandas 0.23 (and was removed in pandas 1.0). The linked documentation suggests using from_dict instead, so the old answer can be modified to:

    pd.DataFrame.from_dict(dict(zip(s.index, s.values)))
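
A minimal sketch of the from_dict variant, assuming all lists have the same length. The names as_columns and as_rows are illustrative only; passing orient="index" is one alternative to transposing the result afterwards.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Each list becomes a column; the Series index supplies the column labels.
as_columns = pd.DataFrame.from_dict(dict(zip(s.index, s.values)))

# Each list becomes a row instead; the Series index supplies the row labels.
as_rows = pd.DataFrame.from_dict(dict(zip(s.index, s.values)), orient="index")

print(as_columns)
print(as_rows)
```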
    

    --------------------------------------------------OLD ANSWER-------------------------------------------------------------

    You can use from_items like this (assuming that your lists are of the same length):

    pd.DataFrame.from_items(zip(s.index, s.values))
    
       0  1
    0  1  4
    1  2  5
    2  3  6
    

    or

    pd.DataFrame.from_items(zip(s.index, s.values)).T
    
       0  1  2
    0  1  2  3
    1  4  5  6
    

    depending on your desired output.

    This can be about twice as fast as using apply (as in @Wen's answer, which, however, also works for lists of different lengths):

    %timeit pd.DataFrame.from_items(zip(s.index, s.values))
    1000 loops, best of 3: 669 µs per loop
    
    %timeit s.apply(lambda x:pd.Series(x)).T
    1000 loops, best of 3: 1.37 ms per loop
    

    and

    %timeit pd.DataFrame.from_items(zip(s.index, s.values)).T
    1000 loops, best of 3: 919 µs per loop
    
    %timeit s.apply(lambda x:pd.Series(x))
    1000 loops, best of 3: 1.26 ms per loop
    

    @Hatshepsut's answer is also quite fast (and also works for lists of different lengths):

    %timeit pd.DataFrame(item for item in s)
    1000 loops, best of 3: 636 µs per loop
    

    and

    %timeit pd.DataFrame(item for item in s).T
    1000 loops, best of 3: 884 µs per loop
    

    The fastest solution seems to be @Abdou's answer (tested on Python 2; it also works for lists of different lengths; in Python 3, use itertools.zip_longest instead of izip_longest):

    %timeit pd.DataFrame.from_records(izip_longest(*s.values))
    1000 loops, best of 3: 529 µs per loop
    

    An additional option:

    pd.DataFrame(dict(zip(s.index, s.values)))
    
       0  1
    0  1  4
    1  2  5
    2  3  6
    
  • 2020-12-14 04:25

    If the series is very long (more than 1 million rows), you can use:

    s = pd.Series([[1, 2, 3], [4, 5, 6]])
    pd.DataFrame(s.tolist())
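
As a small sketch of the same idea, the example below assumes a Series with a non-default index (the labels "a" and "b" are made up for illustration). Passing index=s.index keeps the original labels, which a bare pd.DataFrame(s.tolist()) would replace with 0, 1, ...

```python
import pandas as pd

# Hypothetical labels, chosen only to show that the index is preserved.
s = pd.Series([[1, 2, 3], [4, 5, 6]], index=["a", "b"])

# One row per list; reuse the Series index as the row labels.
df = pd.DataFrame(s.tolist(), index=s.index)

print(df)
```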
    
  • 2020-12-14 04:27

    Note that the from_items() method in the accepted answer is deprecated in recent versions of pandas, and the from_dict() method should be used instead. Here is how:

    pd.DataFrame.from_dict(dict(zip(s.index, s.values)))
    
    ## OR  
    
    pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T
    

    Also note that from_dict() is the fastest approach so far:

    %timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values)))
    376 µs ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    ## OR
    
    %timeit pd.DataFrame.from_dict(dict(zip(s.index, s.values))).T
    487 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
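
Timings like these depend on the pandas version, the machine, and the size of the Series, so it may be worth re-measuring locally. The following is a rough sketch using the standard timeit module outside IPython; the candidates dictionary and the repetition count of 200 are arbitrary choices, not part of any answer here.

```python
import timeit

import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Conversions taken from the answers in this thread. Note that from_dict
# yields one column per list, while the other two yield one row per list.
candidates = {
    "from_dict": lambda: pd.DataFrame.from_dict(dict(zip(s.index, s.values))),
    "tolist": lambda: pd.DataFrame(s.tolist()),
    "apply": lambda: s.apply(pd.Series),
}

# Average seconds per call over 200 calls each.
timings = {
    name: timeit.timeit(fn, number=200) / 200 for name, fn in candidates.items()
}

# Print from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e6:.0f} us per call")
```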
    
  • 2020-12-14 04:39

    pd.DataFrame.from_records should also work using itertools.zip_longest:

    from itertools import zip_longest
    
    pd.DataFrame.from_records(zip_longest(*s.values))
    
    #    0  1
    # 0  1  4
    # 1  2  5
    # 2  3  6
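
To illustrate why zip_longest helps with lists of different lengths, here is a minimal sketch with a deliberately ragged Series (the name ragged is illustrative). Shorter lists are padded with a missing value; the iterator is wrapped in list() only to make the input an explicit sequence of tuples.

```python
from itertools import zip_longest

import pandas as pd

# The second list is one element shorter than the first.
ragged = pd.Series([[1, 2, 3], [4, 5]])

# zip_longest pads the shorter list with None, so each list becomes a column.
df = pd.DataFrame.from_records(list(zip_longest(*ragged.values)))

print(df)
```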
    
  • 2020-12-14 04:40

    Try:

    import numpy as np, pandas as pd
    s = pd.Series([[1, 2, 3], [4, 5, 6]])
    pd.DataFrame(np.vstack(s))
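
A short sketch of the np.vstack approach and its main limitation: it assumes all lists have the same length. The flag stacked_ragged is a hypothetical name used only to record whether stacking a ragged Series succeeded.

```python
import numpy as np
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Stack the lists into a 2-D array, one row per list, keeping the index.
df = pd.DataFrame(np.vstack(s.tolist()), index=s.index)
print(df)

# With lists of different lengths the rows cannot be stacked.
ragged = pd.Series([[1, 2, 3], [4, 5]])
try:
    np.vstack(ragged.tolist())
    stacked_ragged = True
except ValueError:
    stacked_ragged = False

print(stacked_ragged)
```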
    
  • 2020-12-14 04:45

    You may be looking for

    s.apply(lambda x:pd.Series(x))
       0  1  2
    0  1  2  3
    1  4  5  6
    

    Or

    s.apply(lambda x:pd.Series(x)).T
       0  1
    0  1  4
    1  2  5
    2  3  6
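
As noted elsewhere in this thread, the apply approach also works for lists of different lengths. A minimal sketch, assuming a ragged Series (the name ragged is illustrative); positions missing from the shorter list are filled with NaN:

```python
import pandas as pd

# The second list is one element shorter than the first.
ragged = pd.Series([[1, 2, 3], [4, 5]])

# Each list is expanded into a row; missing positions become NaN.
df = ragged.apply(pd.Series)

print(df)
```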
    