Python, count frequency of occurrence for value in another column

元气小坏坏 提交于 2019-12-11 07:16:57

问题


So I've been scouring stackoverflow for solutions to similar problems and keep hitting walls. I am new to python and using pandas/python for ETL so forgive me if I am not describing my situation adequately.

I have two dataframes df1 looks like:

    Subscriber Key  OtherID  AnotherID
1     'abc'           '12'    '23'
2     'bcd'           '45'    '56'
3     'abc'           '12'    '23'
4     'abc'           '12'    '23'
5     'cde'           '78'    '90'
6     'bcd'           '45'    '56'

df2 looks like:

    Subscriber Key  OtherID  AnotherID
1     'abc'           '12'    '23'
2     'bcd'           '45'    '56'
3     'cde'           '78'    '90'

I am trying to return a count the number of times SubscriberKey: 'abc' occurs in the dataframe. After finding the values, I would like to append the count to another dataframe (df2) which is my first dataframe deduplicated.

It would look like this:

    Subscriber Key  OtherID  AnotherID Total Instances
1     'abc'           '12'    '23'           '3'
2     'bcd'           '45'    '56'           '1'
3     'cde'           '78'    '90'           '1'

So what I did was try use this line:

    df1.groupby(['SubscriberKey']).size()

The reason I only used 'SubscriberKey' was because some rows only had that column filled out with 'OtherID' and 'AnotherID' blank.

I have also tried Series.value_count(). When I try using groupby and size() and set the value of df2['Total Instances'] to the count of occurrences, it appears that the values do not line up correctly.

For example new table looks like this:

    Subscriber Key  OtherID  AnotherID Total Instances
1     'abc'           '12'    '23'           '1'
2     'bcd'           '45'    '56'           '3'
3     'cde'           '78'    '90'           '2'

So my original thought was maybe when doing groupby, the function sorts my output automatically. I tried to check by saving the groupby'd table as a csv and realized it only prints out the count column and not the associated subscriberkey column with it.

Anyhow, does anybody have any input as to how I can achieve this? To reiterate, I wanted to essentially just add a column to df2 that returns total # of occurrences or instances within df1.

Thanks!


回答1:


You can try:

df2['Total Instances'] = df2['Subscriber Key'].map(df1['Subscriber Key'].value_counts())


来源:https://stackoverflow.com/questions/46394556/python-count-frequency-of-occurrence-for-value-in-another-column

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!