Let's say my DataFrame df is created like this:
df = pd.DataFrame({"title" : ["Robin Hood", "Madagaskar"],
"genres" :
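You can use np.repeat with np.concatenate for flattening: np.repeat repeats each title once per genre, and np.concatenate joins the per-row genre lists into one flat array.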
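import numpy as np
import pandas as pd

# split each genres string on a comma followed by optional whitespace
splitted = df['genres'].str.split(r',\s*')
# number of genres in each row
l = splitted.str.len()
df1 = pd.DataFrame({'title': np.repeat(df['title'].values, l),
                    'genres': np.concatenate(splitted.values)}, columns=['title','genres'])
print(df1)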
        title     genres
0  Robin Hood     Action
1  Robin Hood  Adventure
2  Madagaskar     Family
3  Madagaskar  Animation
4  Madagaskar     Comedy
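For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same np.repeat / np.concatenate idea. The sample genre strings are an assumption, inferred from the printed output above rather than taken from the original df definition, and the name counts is only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the genre strings are inferred from the printed output.
df = pd.DataFrame({
    "title": ["Robin Hood", "Madagaskar"],
    "genres": ["Action, Adventure", "Family, Animation, Comedy"],
})

# Split each genres string on a comma followed by optional whitespace.
splitted = df["genres"].str.split(r",\s*")
# Number of genres per row, i.e. how many times each title must be repeated.
counts = splitted.str.len().to_numpy()

df1 = pd.DataFrame({
    # Repeat every title once per genre it has.
    "title": np.repeat(df["title"].to_numpy(), counts),
    # Flatten the per-row lists of genres into one 1-D array.
    "genres": np.concatenate(splitted.to_numpy()),
})
print(df1)
```

The two arrays line up because np.repeat and np.concatenate both walk the rows in the same order, so each repeated title lands next to one of its own genres.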
Timings:
df = pd.concat([df]*100000).reset_index(drop=True)
In [95]: %%timeit
    ...: splitted = df['genres'].str.split(r',\s*')
    ...: l = splitted.str.len()
    ...:
    ...: df1 = pd.DataFrame({'title': np.repeat(df['title'].values, l),
    ...:                     'genres': np.concatenate(splitted.values)}, columns=['title','genres'])
    ...:
1 loop, best of 3: 709 ms per loop
In [96]: %timeit (df.set_index('title')['genres'].str.split(r',\s*', expand=True).stack().reset_index(name='genre').drop('level_1', axis=1))
1 loop, best of 3: 750 ms per loop
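The one-liner timed in In [96] is dense, so here it is unpacked step by step as a rough sketch. The sample data is again an assumption inferred from the printed output, the intermediate names wide and long are only illustrative, and the explicit dropna() is an addition on my part so that padded missing cells are removed regardless of how the installed pandas version's stack() treats them.

```python
import pandas as pd

# Assumed sample data: the genre strings are inferred from the printed output.
df = pd.DataFrame({
    "title": ["Robin Hood", "Madagaskar"],
    "genres": ["Action, Adventure", "Family, Animation, Comedy"],
})

# One column per genre position; shorter rows are padded with missing values.
wide = df.set_index("title")["genres"].str.split(r",\s*", expand=True)

# Move the genre columns into the index, giving a Series indexed by
# (title, column number), then drop the padding cells.
long = wide.stack().dropna()

# Turn the index back into columns and discard the column-number level.
df2 = long.reset_index(name="genre").drop(columns="level_1")
print(df2)
```

Note that this variant names the resulting column genre, as in the timed expression, whereas the np.repeat version keeps the name genres.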