How to split/expand a string value into several pandas DataFrame rows?

盖世英雄少女心 2020-11-27 22:30

Let's say my DataFrame df is created like this:

df = pd.DataFrame({"title" : ["Robin Hood", "Madagaskar"],
                  "genres" :

3 Answers
  •  忘掉有多难
    2020-11-27 23:11

    You can use np.repeat to repeat each title as many times as it has genres, and numpy.concatenate to flatten the split genre lists into a single column.
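To see what the two NumPy calls contribute on their own, here is a minimal sketch on hand-written arrays. The titles and genre lists are illustrative values taken from the printed result in this answer, not from the (truncated) question. np.repeat accepts one repeat count per element, and np.concatenate joins a sequence of lists end to end, so both outputs have the same length and line up row by row.

```python
import numpy as np

# Illustrative values only (taken from the printed result in the answer).
titles = np.array(["Robin Hood", "Madagaskar"])
genre_lists = [["Action", "Adventure"], ["Family", "Animation", "Comedy"]]

# Repeat each title as many times as it has genres: counts are [2, 3].
counts = [len(g) for g in genre_lists]
repeated = np.repeat(titles, counts)

# Join the per-row lists end to end into one flat array of genres.
flat = np.concatenate(genre_lists)

print(repeated.tolist())  # ['Robin Hood', 'Robin Hood', 'Madagaskar', 'Madagaskar', 'Madagaskar']
print(flat.tolist())      # ['Action', 'Adventure', 'Family', 'Animation', 'Comedy']
```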

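    import numpy as np
    import pandas as pd

    # split each genres string on a comma plus optional whitespace
    splitted = df['genres'].str.split(r',\s*')
    l = splitted.str.len()  # number of genres in each original row

    # repeat every title once per genre; flatten the genre lists in the same order
    df1 = pd.DataFrame({'title': np.repeat(df['title'].values, l),
                        'genres': np.concatenate(splitted.values)}, columns=['title','genres'])
    print(df1)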
            title      genres
    0  Robin Hood      Action
    1  Robin Hood   Adventure
    2  Madagaskar      Family
    3  Madagaskar   Animation
    4  Madagaskar      Comedy
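On pandas 0.25 or newer, DataFrame.explode is another way to get the same rows. This is a different technique from the np.repeat approach above, swapped in here as an alternative. The sketch is self-contained; because the question's DataFrame literal is cut off, the genre strings are an assumption reconstructed from the printed output above.

```python
import pandas as pd

# ASSUMPTION: the question's DataFrame literal is cut off; these genre strings
# are reconstructed from the printed output above.
df = pd.DataFrame({"title": ["Robin Hood", "Madagaskar"],
                   "genres": ["Action, Adventure", "Family, Animation, Comedy"]})

# Turn each genres string into a list (comma plus optional whitespace) ...
lists = df["genres"].str.split(r",\s*")
# ... then give every list element its own row. explode repeats the original
# index (0, 0, 1, 1, 1), so reset_index renumbers the rows 0..4.
exploded = df.assign(genres=lists).explode("genres").reset_index(drop=True)
print(exploded)
```

explode leaves the other columns untouched and only duplicates them alongside each list element, so it scales to frames with more columns than title without listing them explicitly.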
    

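    Timings, comparing against a set_index + str.split(expand=True) + stack one-liner on a DataFrame enlarged to 200,000 rows: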

    df = pd.concat([df]*100000).reset_index(drop=True)
    
    In [95]: %%timeit
        ...: splitted = df['genres'].str.split(r',\s*')
        ...: l = splitted.str.len()
        ...: 
        ...: df1 = pd.DataFrame({'title': np.repeat(df['title'].values, l),
        ...:                      'genres':np.concatenate(splitted.values)}, columns=['title','genres'])
        ...: 
        ...: 
    1 loop, best of 3: 709 ms per loop
    
    In [96]: %timeit (df.set_index('title')['genres'].str.split(r',\s*', expand=True).stack().reset_index(name='genre').drop(columns='level_1'))
    1 loop, best of 3: 750 ms per loop
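Timings like these depend heavily on the pandas and NumPy versions, so re-running the comparison locally may be more useful than the numbers themselves. The sketch below wraps the two approaches timed above, plus DataFrame.explode (pandas 0.25+), in hypothetical helper functions and builds the 200,000-row frame the same way; the genre strings are again reconstructed from the printed output. It prints the best of three single runs per approach and implies no particular result.

```python
import timeit

import numpy as np
import pandas as pd


def repeat_concat(df):
    """Hypothetical helper: np.repeat titles, np.concatenate the split genres."""
    split = df["genres"].str.split(r",\s*")
    return pd.DataFrame({"title": np.repeat(df["title"].to_numpy(), split.str.len()),
                         "genres": np.concatenate(split.to_numpy())})


def split_stack(df):
    """Hypothetical helper: set_index + str.split(expand=True) + stack."""
    return (df.set_index("title")["genres"]
              .str.split(r",\s*", expand=True)
              .stack()
              # the newer stack implementation (future_stack=True, pandas >= 2.1)
              # keeps the NA padding from expand=True; dropna() removes it and
              # is a no-op on the older implementation
              .dropna()
              .reset_index(name="genre")
              .drop(columns="level_1"))


def explode_rows(df):
    """Hypothetical helper: DataFrame.explode (pandas >= 0.25)."""
    return (df.assign(genres=df["genres"].str.split(r",\s*"))
              .explode("genres")
              .reset_index(drop=True))


# ASSUMPTION: genre strings reconstructed from the printed output above.
small = pd.DataFrame({"title": ["Robin Hood", "Madagaskar"],
                      "genres": ["Action, Adventure", "Family, Animation, Comedy"]})
big = pd.concat([small] * 100000).reset_index(drop=True)

for fn in (repeat_concat, split_stack, explode_rows):
    # the lambda is called immediately inside timeit.repeat, so binding fn per loop is safe
    seconds = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.3f} s")
```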
    
