Calculate correlation between columns of strings

醉酒当歌 提交于 2019-12-10 23:31:43

问题


I've got a df that contains the columns profession and media. I would like to calculate the correlation between those two columns.

Is there a short hack of calculating the correlation of columns of strings? Or do I have transform each profession and media to a number and then calculate the correlation with .corr()?

I found a similar question (Is there a way to get correlation with string data and a numerical value in pandas?) but I would like to check the string, not each word within the string.

df

  profession        media      

0 media lawyer      print
1 student           online
2 student           print
3 professor         online
4 media lawyer      online

回答1:


You can convert datatype to categorical and then do it

df['profession']=df['profession'].astype('category').cat.codes
df['media']=df['media'].astype('category').cat.codes
df.corr()


来源:https://stackoverflow.com/questions/51241575/calculate-correlation-between-columns-of-strings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!