Split (explode) pandas dataframe string entry to separate rows

一向 2020-11-21 05:03

I have a pandas dataframe in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as

22 answers
  •  没有蜡笔的小新
    2020-11-21 05:31

    I kept running out of memory while trying various ways to explode my lists, so I prepared some benchmarks to help me decide which answers to upvote. I tested five scenarios with varying ratios of list length to number of lists. The results are below:

    Time: (less is better)

    Peak memory usage: (less is better)

    Conclusions:

    • @MaxU's answer (update 2), codename concatenate, offers the best speed in almost every case while keeping peak memory usage low,
    • see @DMulligan's answer (codename stack) if you need to process lots of rows with relatively small lists and can afford increased peak memory,
    • @Chang's accepted answer works well for data frames that have few rows but very large lists.
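
    As a rough illustration of the concatenate idea from the first bullet, the sketch below repeats every non-list column by the length of each row's list and flattens the lists with NumPy. It is not the code from @MaxU's answer or from the gist: the function name `explode_concatenate` and the sample columns `id` and `tags` are hypothetical, and it assumes the frame has at least one row and that every cell in the list column holds a non-empty list.

```python
import numpy as np
import pandas as pd


def explode_concatenate(df, lst_col):
    """Hypothetical helper: one output row per element of df[lst_col]."""
    # Number of elements in each row's list.
    lens = df[lst_col].map(len).to_numpy()
    # Repeat every other column's values once per list element.
    out = {
        col: np.repeat(df[col].to_numpy(), lens)
        for col in df.columns
        if col != lst_col
    }
    # Flatten all lists into a single 1-D array.
    out[lst_col] = np.concatenate(df[lst_col].tolist())
    # Restore the original column order.
    return pd.DataFrame(out)[list(df.columns)]


# Sample data (assumed for illustration only).
df = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})
result = explode_concatenate(df, "tags")
print(result)
```

    The repeat and concatenate steps operate on whole arrays instead of looping over rows in Python, which is one plausible reason an approach of this kind could be fast while keeping intermediate copies small.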

    Full details (functions and benchmarking code) are in this GitHub gist. Note that the benchmark problem was simplified and did not include splitting the strings into lists, a step that most solutions performed in a similar fashion.
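
    For completeness, here is a minimal sketch of the splitting step that the benchmark left out, followed by the built-in `DataFrame.explode` (available from pandas 0.25). This is not one of the benchmarked solutions. The column names `var1` and `var2` and the sample values are assumptions for illustration.

```python
import pandas as pd

# Sample frame: var1 holds comma-separated values (assumed data).
df = pd.DataFrame({"var1": ["a,b,c", "d,e"], "var2": [1, 2]})

exploded = (
    df.assign(var1=df["var1"].str.split(","))  # strings -> lists
      .explode("var1")                          # one row per list element
      .reset_index(drop=True)                   # discard the repeated index
)
print(exploded)
```

    `explode` keeps the original index label on every generated row, so the `reset_index(drop=True)` call matters only if a fresh 0..n-1 index is wanted.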
