Counting the frequency of words in a pandas DataFrame

天命终不由人 2020-12-04 18:01

I have a table like the one below:

      URN                   Firm_Name
0  104472               R.X. Yah & Co
1  104873        Big Building Society
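2  109986          St James's Society
3  114058  The Kensington Society Ltd
4  113438      MMV Oil Associates Ltd

How can I count the frequency of the words in the Firm_Name column?
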
3 Answers
  •  日久生厌
    2020-12-04 18:29

    IIUC, split each Firm_Name into words and count them with value_counts():

    In [3361]: df.Firm_Name.str.split(expand=True).stack().value_counts()
    Out[3361]:
    Society       3
    Ltd           2
    James's       1
    R.X.          1
    Yah           1
    Associates    1
    St            1
    Kensington    1
    MMV           1
    Big           1
    &             1
    The           1
    Co            1
    Oil           1
    Building      1
    dtype: int64
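
    A self-contained version of that call, as a sketch you can paste into a script. It assumes the data is exactly the five rows shown under Details at the end of this answer; the names `wide` and `counts` are only for illustration.

```python
import pandas as pd

# Sample data: assumed to be the five rows shown under "Details" in this answer
df = pd.DataFrame({
    "URN": [104472, 104873, 109986, 114058, 113438],
    "Firm_Name": [
        "R.X. Yah & Co",
        "Big Building Society",
        "St James's Society",
        "The Kensington Society Ltd",
        "MMV Oil Associates Ltd",
    ],
})

# str.split(expand=True) puts each word in its own column,
# padding shorter names with missing values
wide = df.Firm_Name.str.split(expand=True)

# stack() reshapes the wide frame into one long Series of words;
# value_counts() tallies each distinct word (missing padding is not counted)
counts = wide.stack().value_counts()
print(counts)
```

    On pandas 2.1+ the legacy stack() may emit a FutureWarning about its new implementation; the counts come out the same either way, because value_counts() skips missing values by default.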
    

    Or,

    import numpy as np

    pd.Series(np.concatenate([x.split() for x in df.Firm_Name])).value_counts()
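
    The np.concatenate version needs NumPy imported as np. One thing to watch: x.split() raises AttributeError when a name is missing (NaN or None), so this sketch drops missing names first. The trailing None in the sample is a hypothetical addition to the Details data, there only to show the guard working.

```python
import numpy as np
import pandas as pd

# The five "Details" names plus one HYPOTHETICAL missing value
df = pd.DataFrame({
    "Firm_Name": [
        "R.X. Yah & Co",
        "Big Building Society",
        "St James's Society",
        "The Kensington Society Ltd",
        "MMV Oil Associates Ltd",
        None,
    ]
})

# Drop missing names first: None/NaN has no .split() method
word_lists = [name.split() for name in df.Firm_Name.dropna()]

# np.concatenate flattens the per-name word lists into one 1-D array of words
counts = pd.Series(np.concatenate(word_lists)).value_counts()
print(counts.head())
```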
    

    Or,

    pd.Series(' '.join(df.Firm_Name).split()).value_counts()
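
    The ' '.join one-liner glues every name into one string and lets Python's str.split() (no arguments, so it splits on any run of whitespace) break it back into words. To sanity-check it, a standard-library collections.Counter over the same words gives an independent tally; that Counter comparison is an extra check, not part of the answer. Like the np.concatenate version, this assumes no missing names, since ' '.join raises TypeError on a NaN.

```python
from collections import Counter

import pandas as pd

# The five names shown under "Details"
names = pd.Series([
    "R.X. Yah & Co",
    "Big Building Society",
    "St James's Society",
    "The Kensington Society Ltd",
    "MMV Oil Associates Ltd",
])

# One long string, split on whitespace runs, then tallied by pandas
counts = pd.Series(" ".join(names).split()).value_counts()

# Independent tally with the standard library over the same words
reference = Counter(word for name in names for word in name.split())

# Both tallies should agree word for word
same = counts.to_dict() == dict(reference)
print(same)
```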
    

    For the top N words, for example N = 3:

    In [3379]: pd.Series(' '.join(df.Firm_Name).split()).value_counts()[:3]
    Out[3379]:
    Society    3
    Ltd        2
    James's    1
    dtype: int64
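
    value_counts() already sorts by count in descending order, so [:3] or .head(3) gives the top three. Series.nlargest(3) is an alternative that picks the largest counts itself instead of relying on that order. Either way, which count-1 word lands in third place is a tie-break, so James's showing up there has no special meaning. A small sketch on the Details names:

```python
import pandas as pd

# The five names shown under "Details"
names = pd.Series([
    "R.X. Yah & Co",
    "Big Building Society",
    "St James's Society",
    "The Kensington Society Ltd",
    "MMV Oil Associates Ltd",
])

counts = pd.Series(" ".join(names).split()).value_counts()

# value_counts() is sorted by count (descending), so the first rows are the top N
top3 = counts.head(3)

# nlargest() selects the N largest counts itself; ties at the cut-off
# go to the earlier row (keep="first" is the default)
top3_alt = counts.nlargest(3)

print(top3)
print(top3_alt)
```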
    

    Details (the sample data used above):

    In [3380]: df
    Out[3380]:
          URN                   Firm_Name
    0  104472               R.X. Yah & Co
    1  104873        Big Building Society
    2  109986          St James's Society
    3  114058  The Kensington Society Ltd
    4  113438      MMV Oil Associates Ltd
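
    If you want to rebuild that frame and try another route, Series.explode() (pandas 0.25+) can replace the expand=True/stack() combination and skips the wide intermediate frame. This is a different technique from the three above, shown only as a sketch on the same data.

```python
import pandas as pd

# Rebuild the "Details" frame
df = pd.DataFrame({
    "URN": [104472, 104873, 109986, 114058, 113438],
    "Firm_Name": [
        "R.X. Yah & Co",
        "Big Building Society",
        "St James's Society",
        "The Kensington Society Ltd",
        "MMV Oil Associates Ltd",
    ],
})

# str.split() without expand gives one list of words per row;
# explode() then emits one row per word (a missing name would stay a single NaN row)
words = df["Firm_Name"].str.split().explode()

# value_counts() ignores NaN by default, so a missing name would not be counted
counts = words.value_counts()
print(counts)
```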
    
