efficient concatenation of lists in pandas series

浪子不回头ぞ 提交于 2020-02-28 06:57:09

问题


I have the following series:

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])
>>> s
0    [a, b]
1    [c, d]
2    [f, g]
dtype: object

what is the easiest - preferably vectorized - way to concatenate all lists in the series, so that I get:

l = ['a', 'b', 'c', 'd', 'f', 'g']

Thanks!


回答1:


A nested list comprehension should be much faster.

>>> [element for list_ in s for element in list_]
    ['a', 'b', 'c', 'd', 'f', 'g']

>>> %timeit -n 100000 [element for list_ in s for element in list_]
100000 loops, best of 3: 5.2 µs per loop

>>> %timeit -n 100000 s.sum()
100000 loops, best of 3: 50.7 µs per loop

Directly accessing the values of the list is even faster.

>>> %timeit -n 100000 [element for list_ in s.values for element in list_]
100000 loops, best of 3: 2.77 µs per loop



回答2:


I'm not timing or testing these options, but there's the new pandas method explode, and also numpy.concatenate.



来源:https://stackoverflow.com/questions/33556050/efficient-concatenation-of-lists-in-pandas-series

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!