Pandas: Convert lists within a single column to multiple columns

心已入冬 submitted on 2019-12-07 04:37:37

Question


I have a dataframe that includes columns with multiple attributes separated by commas:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

   id   labels
0   1   a,b,c
1   2   c,a
2   3   d,a,b

(I know this isn't an ideal situation, but the data originates from an external source.) I want to turn the multi-attribute columns into multiple columns, one for each label, so that I can treat them as categorical variables. Desired output:

   id     a      b      c      d
0   1  True   True   True  False
1   2  True  False   True  False
2   3  True   True  False   True

I can get the set of all possible attributes ([a,b,c,d]) fairly easily, but cannot figure out a way to determine whether a given row has a particular attribute without row-by-row iteration for each attribute. Is there a better way to do this?


Answer 1:


You can use str.get_dummies to build one indicator column per label, cast the resulting 1s and 0s to booleans with astype, and finally concat the id column back on:

print(df['labels'].str.get_dummies(sep=',').astype(bool))
      a      b      c      d
0  True   True   True  False
1  True  False   True  False
2  True   True  False   True

print(pd.concat([df.id, df['labels'].str.get_dummies(sep=',').astype(bool)], axis=1))

   id     a      b      c      d
0   1  True   True   True  False
1   2  True  False   True  False
2   3  True   True  False   True
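For reference, here is a self-contained Python 3 sketch of the same steps. It assumes a pandas version that provides Series.explode (0.25 or later). The second half swaps in a different technique that the original answer does not use: split with str.split, expand with explode, and count with pd.crosstab. The variable names dummies, result, exploded and alt are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

# Split each comma-separated string and build one 0/1 indicator column per
# distinct label (columns come out sorted), then cast the indicators to bool.
dummies = df['labels'].str.get_dummies(sep=',').astype(bool)

# Re-attach the id column; both frames share df's index, so concat aligns rows.
result = pd.concat([df['id'], dummies], axis=1)
print(result)

# Alternative (not from the original answer, hypothetical variable names):
# turn each string into a list, explode to one row per (id, label) pair,
# then cross-tabulate the pairs back into a wide id-by-label table.
exploded = df.assign(label=df['labels'].str.split(',')).explode('label')
alt = pd.crosstab(exploded['id'], exploded['label']).astype(bool)
print(alt)
```

The crosstab variant indexes its result by id, so it assumes the id values are unique; the get_dummies route keeps the original row index and needs no such assumption.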


Source: https://stackoverflow.com/questions/37262437/pandas-convert-lists-within-a-single-column-to-multiple-columns
