Convert pandas DataFrame column of comma separated strings to one-hot encoded

生来不讨喜 2020-12-09 11:45

I have a large DataFrame (‘data’) made up of one column. Each row in the column holds a string, and each string is a list of comma-separated categories. I wish to one h

2 answers
  •  无人及你
    2020-12-09 12:38

    Note that you're not dealing with OHEs (one-hot encodings): a row here can contain several categories at once, or none.
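
    The sketch below illustrates that point. Both Series (`single` and `multi`) are made up for illustration and are not part of the question; the only claim is that one-hot rows sum to exactly 1, while multi-label rows may not.

```python
import pandas as pd

# Hypothetical single-label data: exactly one category per row.
single = pd.Series(["A", "C", "B"])
one_hot = pd.get_dummies(single, dtype=int)
print(one_hot.sum(axis=1).tolist())    # each row contains exactly one 1

# Hypothetical multi-label data: several categories per row, or none.
# Series.str.get_dummies splits on '|' by default.
multi = pd.Series(["A|B|C", "C|B", ""])
multi_hot = multi.str.get_dummies()
print(multi_hot.sum(axis=1).tolist())  # row sums differ from 1
```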

    str.split + stack + get_dummies + sum

    import pandas as pd
    df = pd.DataFrame(data)
    df
    
          mesh
    0  A, B, C
    1      C,B
    2         
    
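    # Split on commas (ignoring surrounding whitespace), stack into one long
    # Series, build indicator columns, then sum them back per original row.
    (df.mesh.str.split(r'\s*,\s*', expand=True)
       .stack()
       .str.get_dummies()
       .groupby(level=0).sum())  # .sum(level=0) was removed in pandas 2.0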
    
       A  B  C
    0  1  1  1
    1  0  1  1
    2  0  0  0
    

    apply + value_counts

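    (df.mesh.str.split(r'\s*,\s*', expand=True)
       .apply(pd.Series.value_counts, axis=1)
       .iloc[:, 1:]   # drop the first column, the '' label from the blank row
       .fillna(0)
       .astype(int))  # instead of fillna(0, downcast='infer'), deprecated in recent pandas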
    
       A  B  C
    0  1  1  1
    1  0  1  1
    2  0  0  0
    

    pd.crosstab

    x = df.mesh.str.split(r'\s*,\s*', expand=True).stack()
    # Rows: original row number; columns: category. The first column ('') is dropped.
    pd.crosstab(x.index.get_level_values(0), x.values).iloc[:, 1:]
    
    col_0  A  B  C
    row_0         
    0      1  1  1
    1      0  1  1
    2      0  0  0
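
    As a sanity check, the three approaches can be run side by side. This is a minimal sketch, assuming pandas 1.x or 2.x and a sample frame reconstructed from the printed `mesh` column above (the real `data` from the question is not shown); the names `parts`, `stacked`, `a`, `b`, `c` and `expected` are illustrative only.

```python
import pandas as pd

# Sample frame reconstructed from the printed output (an assumption).
df = pd.DataFrame({"mesh": ["A, B, C", "C,B", ""]})

# Shared first step: split on commas, ignoring surrounding whitespace.
parts = df["mesh"].str.split(r"\s*,\s*", expand=True)
stacked = parts.stack()

# 1. stack + get_dummies, summed back per original row.
a = stacked.str.get_dummies().groupby(level=0).sum()

# 2. value_counts per row; the first column is the '' label of the blank row.
b = (parts.apply(pd.Series.value_counts, axis=1)
          .iloc[:, 1:]
          .fillna(0)
          .astype(int))

# 3. crosstab of original row number against category.
c = pd.crosstab(stacked.index.get_level_values(0), stacked.values).iloc[:, 1:]

expected = [[1, 1, 1], [0, 1, 1], [0, 0, 0]]
for result in (a, b, c):
    assert list(result.columns) == ["A", "B", "C"]
    assert result.to_numpy().tolist() == expected
```

    All three yield the same indicator matrix on this sample; they differ only in index and column names (crosstab adds `row_0` and `col_0`).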
    
