I have a pandas DataFrame in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as
I have been struggling with out-of-memory errors when using various ways to explode my lists, so I prepared some benchmarks to help me decide which answers to upvote. I tested five scenarios with varying ratios of list length to number of lists. Sharing the results below:
Time: (less is better)
Peak memory usage: (less is better)
Conclusions:
Full details (functions and benchmarking code) are in this GitHub gist. Please note that the benchmark problem was simplified and did not include splitting the strings into lists, a step that most solutions performed in a similar fashion.
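For readers who want to try this on their own data, here is a minimal sketch of the splitting step the benchmark left out, followed by an explode and a rough measurement of time and peak memory. It is not one of the benchmarked functions and not the code from the gist. It assumes pandas 1.1 or newer (for `DataFrame.explode` with `ignore_index`), and the names `split_and_explode`, `measure`, and the sample columns `id` and `tags` are hypothetical. Using `tracemalloc` for peak memory is also an assumption; the benchmark may have measured memory differently.

```python
import time
import tracemalloc

import pandas as pd


def split_and_explode(df, column, sep=","):
    """Return a copy of df with one row per separated value in `column`."""
    out = df.copy()
    # Turn each delimited string into a list of strings.
    out[column] = out[column].str.split(sep)
    # Emit one row per list element; other columns are repeated.
    return out.explode(column, ignore_index=True)


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical sample data, not taken from the benchmark.
df = pd.DataFrame({"id": [1, 2], "tags": ["a,b,c", "d"]})
exploded, seconds, peak_bytes = measure(split_and_explode, df, "tags")
print(exploded)
print(f"{seconds:.6f} s, peak {peak_bytes} bytes")
```

A single run like this is noisy, and tracing allocations slows the code down, so for a real comparison I would repeat each measurement several times and time it separately from the memory trace.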