How to create a bag of words from a pandas dataframe

走了就别回头了 2020-12-19 07:09

Here's my dataframe:

   CATEGORY     BRAND
0    Noodle  Anak Mas
1    Noodle  Anak Mas
2    Noodle   Indomie
3    Noodle   Indomie
4    Noodle   Indomie
23   Noodle   Indomie


        
1 answer
  • 2020-12-19 07:49

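    If I understand correctly, use one of the options below (each assumes that collections and pandas, as pd, are already imported).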

    Option 1] NumPy flatten and split

    In [2535]: collections.Counter([y for x in df.values.flatten() for y in x.split()])
    Out[2535]:
    Counter({'3': 2,
             'Anak': 2,
             'Cap': 2,
             'Indomie': 4,
             'Mas': 2,
             'Mi': 2,
             'Mie': 2,
             'Noodle': 10,
             'Pop': 2,
             'Telor': 2})
    

    Option 2] Use value_counts()

    In [2536]: pd.Series([y for x in df.values.flatten() for y in x.split()]).value_counts()
    Out[2536]:
    Noodle     10
    Indomie     4
    Mie         2
    Pop         2
    Anak        2
    Mi          2
    Cap         2
    Telor       2
    Mas         2
    3           2
    dtype: int64
    

    Option 3] Use stack and value_counts()

    In [2582]: df.apply(lambda x: x.str.split(expand=True).stack()).stack().value_counts()
    Out[2582]:
    Noodle     10
    Indomie     4
    Mie         2
    Pop         2
    Anak        2
    Mi          2
    Cap         2
    Telor       2
    Mas         2
    3           2
    dtype: int64
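
On pandas 0.25 or newer, a similar result can probably be reached with Series.explode instead of the apply/stack combination. This is an alternative sketch, not the method used in Option 3; the frame below copies the data from the question but uses a default index, which is an assumption made for brevity.

```python
import pandas as pd

# Same cell values as the question's frame; default RangeIndex (assumption).
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": ["Anak Mas"] * 2
        + ["Indomie"] * 4
        + ["Mi Telor Cap 3"] * 2
        + ["Pop Mie"] * 2,
    }
)

counts = (
    df.stack()        # all cells as one long Series
    .str.split()      # each cell becomes a list of words
    .explode()        # one row per word
    .value_counts()   # word frequencies, most common first
)

print(counts)
```

The index of counts holds the words and the values hold their frequencies, so counts["Noodle"] looks up a single word.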
    

    Details

    In [2516]: df
    Out[2516]:
       CATEGORY           BRAND
    0    Noodle        Anak Mas
    1    Noodle        Anak Mas
    2    Noodle         Indomie
    3    Noodle         Indomie
    4    Noodle         Indomie
    23   Noodle         Indomie
    24   Noodle  Mi Telor Cap 3
    25   Noodle  Mi Telor Cap 3
    26   Noodle         Pop Mie
    27   Noodle         Pop Mie
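
For anyone who wants to reproduce the session, here is a minimal script that rebuilds the frame above and runs Options 1 and 2 outside IPython. The variable names tokens, bag and counts are illustrative and not part of the original answer.

```python
import collections

import pandas as pd

# Rebuild the frame shown under "Details", keeping its index labels.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": ["Anak Mas"] * 2
        + ["Indomie"] * 4
        + ["Mi Telor Cap 3"] * 2
        + ["Pop Mie"] * 2,
    },
    index=[0, 1, 2, 3, 4, 23, 24, 25, 26, 27],
)

# Flatten every cell into one sequence and split each cell on whitespace.
tokens = [word for cell in df.values.flatten() for word in cell.split()]

# Option 1: count the tokens with collections.Counter.
bag = collections.Counter(tokens)

# Option 2: count the same tokens with pandas.
counts = pd.Series(tokens).value_counts()

print(bag)
print(counts)
```

Both options assume every cell is a string; a missing value (NaN) has no split method, so such cells would need to be dropped or filled first.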
    