Find occurrence of a 'string' in a subgroup column and mark maingroup based on its occurrence

☆樱花仙子☆ submitted on 2020-01-06 07:10:45

Question


I have data which looks like this:

Group   string
 A     Hello
 A     SearchListing
 A     GoSearch
 A     pen
 A     Hello
 B     Real-Estate
 B     Access
 B     Denied
 B     Group
 B     Group
 C     Glance
 C     NoSearch
 C     Home

and so on
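To make the sample reproducible, the rows shown above can be loaded into a DataFrame. This is a minimal sketch that assumes pandas and uses the variable name `df`, as the answers do; only the thirteen rows listed here are included, not the rest of the data.

```python
import pandas as pd

# Sample rows copied from the table above; the real data continues beyond this.
df = pd.DataFrame({
    "Group": ["A"] * 5 + ["B"] * 5 + ["C"] * 3,
    "string": [
        "Hello", "SearchListing", "GoSearch", "pen", "Hello",
        "Real-Estate", "Access", "Denied", "Group", "Group",
        "Glance", "NoSearch", "Home",
    ],
})
print(df)
```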

I want to find all groups that have the phrase "search" in their strings and mark them as 0/1. At the same time, I want to aggregate per group: the number of unique strings, the total number of strings, and how many times "search" was encountered in that group. The end result I want looks something like this:

Group   containsSearch  TotalStrings  UniqueStrings  NoOfTimesSearch
 A           1              5             4              2
 B           0              5             4              0
 C           1              3             3              1 

I can aggregate using a simple groupby, but I am having trouble marking each group as 0/1 based on the presence of "search" and counting how many times it occurs.


Answer 1:


Let's try:

import pandas as pd

# 1 if any string in the group contains "search" (case-insensitive), else 0
l1 = lambda x: int(x.str.lower().str.contains('search').any())
l1.__name__ = 'containsSearch'
# number of strings in the group that contain "search"
l2 = lambda x: int(x.str.lower().str.contains('search').sum())
l2.__name__ = 'NoOfTimesSearch'

df.groupby('Group')['string'].agg(['count','nunique',l1,l2]).reset_index()

Output:

  Group  count  nunique  containsSearch  NoOfTimesSearch
0     A      5        4               1                2
1     B      5        4               0                0
2     C      3        3               1                1

Or, using named functions (thanks, @W-B):

def containsSearch(x):
    # 1 if any string in the group contains "search" (case-insensitive), else 0
    return int(x.str.lower().str.contains('search').any())

def NoOfTimesSearch(x):
    # number of strings in the group that contain "search"
    return int(x.str.lower().str.contains('search').sum())


df.groupby('Group')['string'].agg(['count', 'nunique',
                                   containsSearch, NoOfTimesSearch]).reset_index()

Output:

  Group  count  nunique  containsSearch  NoOfTimesSearch
0     A      5        4               1                2
1     B      5        4               0                0
2     C      3        3               1                1
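If the exact column names from the question are wanted, keyword-based named aggregation (available in pandas 0.25 and later) can set them directly. The following is a sketch under that version assumption; the helper names `contains_search` and `count_search` are illustrative and not part of the original answer, and `na=False` is an added choice that treats missing strings as non-matches.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A"] * 5 + ["B"] * 5 + ["C"] * 3,
    "string": [
        "Hello", "SearchListing", "GoSearch", "pen", "Hello",
        "Real-Estate", "Access", "Denied", "Group", "Group",
        "Glance", "NoSearch", "Home",
    ],
})

def contains_search(s):
    # 1 if any string in the group contains "search" (case-insensitive), else 0
    return int(s.str.contains("search", case=False, na=False).any())

def count_search(s):
    # number of strings in the group that contain "search"
    return int(s.str.contains("search", case=False, na=False).sum())

# Each keyword becomes an output column; the value is the aggregation to apply.
result = (
    df.groupby("Group")["string"]
      .agg(
          containsSearch=contains_search,
          TotalStrings="count",
          UniqueStrings="nunique",
          NoOfTimesSearch=count_search,
      )
      .reset_index()
)
print(result)
```

This avoids patching `__name__` on lambdas, since the output column names come from the keywords rather than from the function names.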



Answer 2:


If you want to create a function:

import pandas as pd

def my_agg(x):
    # Boolean mask: True where the string contains "search" (case-insensitive)
    has_search = x['string'].str.lower().str.contains('search')
    names = {
        'containsSearch': int(has_search.any()),
        'TotalStrings': x['string'].count(),
        'UniqueStrings': x['string'].nunique(),
        'NoOfTimesSearch': int(has_search.sum()),
    }
    return pd.Series(names)

df.groupby('Group').apply(my_agg)

       containsSearch  TotalStrings  UniqueStrings  NoOfTimesSearch
Group                                                              
A                   1             5              4                2
B                   0             5              4                0
C                   1             3              3                1
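Calling a Python function once per group through `apply` may be slow on large frames. One possible alternative, sketched here, flags each row once and then reduces that flag with built-in aggregations. The helper column name `hit` is hypothetical, and the tuple form of named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A"] * 5 + ["B"] * 5 + ["C"] * 3,
    "string": [
        "Hello", "SearchListing", "GoSearch", "pen", "Hello",
        "Real-Estate", "Access", "Denied", "Group", "Group",
        "Glance", "NoSearch", "Home",
    ],
})

# Flag each row: 1 if the string contains "search" (case-insensitive), else 0.
flagged = df.assign(
    hit=df["string"].str.contains("search", case=False, na=False).astype(int)
)

# max of the flag marks the group as 0/1; sum of the flag counts the matches.
summary = flagged.groupby("Group").agg(
    containsSearch=("hit", "max"),
    TotalStrings=("string", "count"),
    UniqueStrings=("string", "nunique"),
    NoOfTimesSearch=("hit", "sum"),
)
print(summary)
```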


Source: https://stackoverflow.com/questions/54406962/find-occurrence-of-a-string-in-a-subgroup-column-and-mark-maingroup-based-on-i
