Pandas - check if a string column contains a pair of strings

余生长醉 submitted on 2020-01-24 01:07:54

Question


Let's say I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'consumption':['squirrel eats apple', 'monkey eats apple', 
                                  'monkey eats banana', 'badger eats banana'], 
                   'food':['apple', 'apple', 'banana', 'banana'], 
                   'creature':['squirrel', 'badger', 'monkey', 'elephant']})

           consumption  creature    food
0  squirrel eats apple  squirrel   apple
1    monkey eats apple    badger   apple
2   monkey eats banana    monkey  banana
3   badger eats banana  elephant  banana

I want to find rows where the 'creature' and 'food' values both occur in that row's 'consumption' string. For example, if apple and squirrel occur together, the result should be True, but if apple occurs with elephant it should be False. Similarly, if monkey and banana occur together the result is True, but monkey with apple would be False.

The approach I was trying was something like:

creature_list = list(df['creature'])
creature_list = '|'.join(map(str, creature_list))

food_list = list(df['food'])
food_list = '|'.join(map(str, food_list))

np.where((df['consumption'].str.contains('('+creature_list+')', case = False)) 
          & (df['consumption'].str.contains('('+food_list+')', case = False)), 1, 0)

But this doesn't work since I get True in all instances.

How can I check for string pairs?


Answer 1:


Here's one possible way:

def match_consumption(r):
    # True only when both the row's creature and its food appear in its consumption text
    return (r['creature'] in r['consumption']) and (r['food'] in r['consumption'])

df['match'] = df.apply(match_consumption, axis=1)
df

           consumption  creature    food  match
0  squirrel eats apple  squirrel   apple   True
1    monkey eats apple    badger   apple  False
2   monkey eats banana    monkey  banana   True
3   badger eats banana  elephant  banana  False
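
The same row-wise check can also be written as a list comprehension over the three columns, which tends to avoid the per-row overhead of DataFrame.apply on larger frames. This is only a minimal sketch, assuming the sample data from the question and reusing the 'match' column name from above:

```python
import pandas as pd

df = pd.DataFrame({
    'consumption': ['squirrel eats apple', 'monkey eats apple',
                    'monkey eats banana', 'badger eats banana'],
    'food': ['apple', 'apple', 'banana', 'banana'],
    'creature': ['squirrel', 'badger', 'monkey', 'elephant'],
})

# Walk the three columns in parallel and test both substrings for each row.
df['match'] = [
    (creature in text) and (food in text)
    for text, creature, food in zip(df['consumption'], df['creature'], df['food'])
]
print(df)
```

Like the function above, this is a plain case-sensitive substring test, so it assumes the values contain no missing data.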



Answer 2:


Is checking for string equality too simple? You can test if the string <creature> eats <food> equals the respective value in the consumption column:

(df.consumption == df.creature + " eats " + df.food)
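
As a quick check, here is that comparison run on the question's sample data. It relies on an assumption worth stating: every 'consumption' value must follow the exact pattern "<creature> eats <food>", so a different verb or extra whitespace would make the row False.

```python
import pandas as pd

df = pd.DataFrame({
    'consumption': ['squirrel eats apple', 'monkey eats apple',
                    'monkey eats banana', 'badger eats banana'],
    'food': ['apple', 'apple', 'banana', 'banana'],
    'creature': ['squirrel', 'badger', 'monkey', 'elephant'],
})

# Build the expected sentence for each row and compare it element-wise.
df['match'] = df['consumption'] == df['creature'] + ' eats ' + df['food']
print(df['match'].tolist())
```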



Answer 3:


I'm sure there is a better way to do this, but this is one way.

import pandas as pd
import re

df = pd.DataFrame({'consumption':['squirrel eats apple', 'monkey eats apple', 'monkey eats banana', 'badger eats banana'], 'food':['apple', 'apple', 'banana', 'banana'], 'creature':['squirrel', 'badger', 'monkey', 'elephant']})

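test = []
for i in range(len(df.consumption)):
    # re.search returns a match object (truthy) or None, so bool() gives True/False
    found_creature = bool(re.search(df.creature[i], df.consumption[i]))
    found_food = bool(re.search(df.food[i], df.consumption[i]))
    test.append(found_creature and found_food)
df['test'] = test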
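
One caveat with passing column values straight to re.search is that they are interpreted as regular expressions, so a value containing characters such as '.' or '+' would act as a pattern rather than literal text. A plain search also matches inside longer words (for example 'apple' inside 'pineapple'). The variant below is a sketch that assumes whole-word, literal matches are what is wanted; the helper name contains_word is hypothetical and not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'consumption': ['squirrel eats apple', 'monkey eats apple',
                    'monkey eats banana', 'badger eats banana'],
    'food': ['apple', 'apple', 'banana', 'banana'],
    'creature': ['squirrel', 'badger', 'monkey', 'elephant'],
})

def contains_word(word, text):
    """Return True if `word` occurs in `text` as a literal whole word."""
    # re.escape neutralises regex metacharacters; \b anchors at word boundaries.
    return re.search(r'\b' + re.escape(word) + r'\b', text) is not None

df['test'] = [
    contains_word(creature, text) and contains_word(food, text)
    for creature, food, text in zip(df['creature'], df['food'], df['consumption'])
]
print(df)
```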


Source: https://stackoverflow.com/questions/43442591/pandas-check-if-a-string-column-contains-a-pair-of-strings
