Create dummies from column with multiple values in pandas

说谎 2020-12-04 10:35

I am looking for a Pythonic way to handle the following problem.

The pandas.get_dummies() method is great for creating dummies from a categorical column

4 Answers

再見小時候 (OP)

2020-12-04 11:32

You can generate the dummies DataFrame from your raw data, pick out the dummy columns that contain a given atom (a label with no '*' in it), and then store the summed matches back in that atom's column.

df
Out[28]: 
  label
0     A
1     B
2     C
3     D
4   A*C
5   C*D

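import pandas as pd

# One indicator column per distinct label, including combined ones such as 'A*C'
dummies = pd.get_dummies(df['label'])

# Atoms are the labels that contain no '*' separator
atom_col = [c for c in dummies.columns if '*' not in c]

for col in atom_col:
    # Sum the dummy columns whose '*'-separated parts include this atom
    df[col] = dummies[[c for c in dummies.columns if col in c.split('*')]].sum(axis=1)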

df
Out[32]: 
  label  A  B  C  D
0     A  1  0  0  0
1     B  0  1  0  0
2     C  0  0  1  0
3     D  0  0  0  1
4   A*C  1  0  1  0
5   C*D  0  0  1  1
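
As a possible alternative to the loop, pandas can split on the separator itself through Series.str.get_dummies. The snippet below is only a sketch: it rebuilds the sample frame shown above, and the names indicators and result are made up for illustration.

```python
import pandas as pd

# Same sample labels as in the answer above
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Split each label on '*' and build one 0/1 indicator column per distinct part
indicators = df["label"].str.get_dummies(sep="*")

# Attach the indicator columns next to the original label column
result = df.join(indicators)
print(result)
```

One difference from the loop: the atoms here come from splitting every label, so an atom that only ever shows up inside a combined label would still get its own column.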
