How to split values in a DataFrame column and add them to a new column based on a condition in pandas

Posted by ╄→гoц情女王★ on 2019-12-12 21:04:42

Question


I have a DataFrame df:

name                        Value
Sri is a cricketer          Sri,is
Ram player                  Ram
Ravi is a singer            is
cricket and foot is ball    and,is,foot

and a list:

my_list=["is", "foot"]
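For anyone who wants to reproduce the problem, the sample data above could be built roughly like this. It is a minimal sketch that copies the rows from the question as shown and assumes nothing beyond pandas being installed.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": [
        "Sri is a cricketer",
        "Ram player",
        "Ravi is a singer",
        "cricket and foot is ball",
    ],
    "Value": ["Sri,is", "Ram", "is", "and,is,foot"],
})

# Values that should be moved into the new column
my_list = ["is", "foot"]

print(df)
```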

I am trying to split df["Value"] on commas and move each value that appears in my_list into a new column. My expected output is:

name                      Value  my_list
Sri is a cricketer        Sri    is
Ram player                Ram
Ravi is a singer                 is
cricket and foot is ball  and    is,foot

Please help me achieve this. Thanks in advance.


Answer 1:


Use str.findall with str.join:

my_list=["is", "foot"]
df['my_list'] = df['Value'].str.findall('(' + '|'.join(my_list) + ')').str.join(',')
print (df)
                       name        Value  my_list
0        Sri is a cricketer       Sri,is       is
1                Ram player          Ram         
2          Ravi is a singer           is       is
3  cricket and foot is ball  and,is,foot  is,foot
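One caveat with the plain alternation pattern is that it matches substrings, so a value such as "this" would also yield "is". The sketch below is one possible way to guard against that: it escapes each entry with re.escape and wraps the alternation in \b word boundaries. The extra row "this,football" is a hypothetical value added only to show the difference, and \b may not behave as wanted for entries that begin or end with non-word characters.

```python
import re
import pandas as pd

# The last row is a hypothetical value, not from the question
df = pd.DataFrame({"Value": ["Sri,is", "Ram", "is", "and,is,foot", "this,football"]})
my_list = ["is", "foot"]

# Escape each entry, use a non-capturing group, and require word
# boundaries so that only whole tokens match
pattern = r"\b(?:" + "|".join(map(re.escape, my_list)) + r")\b"

df["my_list"] = df["Value"].str.findall(pattern).str.join(",")
print(df)
```

With this pattern the hypothetical row produces an empty string, whereas the unanchored pattern would pick "is" out of "this" and "foot" out of "football".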

Another solution is to split each value and take the set intersection with my_list (note that sets do not preserve the original order of the values):

my_list=["is", "foot"]
df['my_list']=df['Value'].str.split(',').apply(lambda x: set(x) & set(my_list)).str.join(',')
print (df)
                       name        Value  my_list
0        Sri is a cricketer       Sri,is       is
1                Ram player          Ram         
2          Ravi is a singer           is       is
3  cricket and foot is ball  and,is,foot  is,foot

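Finally, to also remove the matched values from the Value column:

# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['Value'] = (df['Value'].str.replace('(' + '|,'.join(my_list) + ')', '', regex=True)
                          .str.replace('[,]{2,}', ',', regex=True)
                          .str.strip(','))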
print (df)
                       name Value  my_list
0        Sri is a cricketer   Sri       is
1                Ram player   Ram         
2          Ravi is a singer             is
3  cricket and foot is ball   and  is,foot

Or, using set intersection and set difference on the split values:

my_list=["is", "foot"]

s1 = df['Value'].str.split(',')

df['my_list'] = s1.apply(lambda x: set(x) & set(my_list)).str.join(',')
df['Value'] = s1.apply(lambda x: set(x) - set(my_list)).str.join(',')
print (df)

                       name Value  my_list
0        Sri is a cricketer   Sri       is
1                Ram player   Ram         
2          Ravi is a singer             is
3  cricket and foot is ball   and  is,foot
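Because sets are unordered, the joined strings from the set-based version are not guaranteed to keep the order in which the values appeared. If order matters, a list-based split is one alternative. The following is a sketch only: partition is a hypothetical helper name, not part of pandas, and the sample frame repeats the data from the question.

```python
import pandas as pd

def partition(value, wanted):
    """Hypothetical helper: split a comma-separated string into
    (tokens not in wanted, tokens in wanted), keeping original order."""
    tokens = value.split(",")
    kept = [t for t in tokens if t in wanted]
    rest = [t for t in tokens if t not in wanted]
    return ",".join(rest), ",".join(kept)

df = pd.DataFrame({"Value": ["Sri,is", "Ram", "is", "and,is,foot"]})
my_list = ["is", "foot"]
wanted = set(my_list)  # a set gives fast membership tests

# Compute both pieces for every row first, then assign the two columns
pairs = [partition(v, wanted) for v in df["Value"]]
df["Value"], df["my_list"] = (list(col) for col in zip(*pairs))
print(df)
```

Matching here is done on whole tokens, so a value like "this" would stay in Value untouched.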


Source: https://stackoverflow.com/questions/47449545/how-to-split-values-in-a-datacolumn-and-adding-it-to-a-new-column-with-a-conditi
