I have a pandas dataframe
in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as
After painful experimentation to find something faster than the accepted answer, I got this to work. It ran around 100x faster on the dataset I tried it on.
If someone knows a way to make this more elegant, by all means please modify my code. I couldn't find a way that works without setting the other columns you want to keep as the index and then resetting the index and re-naming the columns, but I'd imagine there's something else that works.
b = DataFrame(a.var1.str.split(',').tolist(), index=a.var2).stack()
b = b.reset_index()[[0, 'var2']] # var1 variable is currently labeled 0
b.columns = ['var1', 'var2'] # renaming var1