In pandas, how do I delete rows from a DataFrame based on another DataFrame?

后悔当初 2020-12-08 15:02

I have two DataFrames, one named USERS and another named EXCLUDE. Both of them have a field named "email".

Basically, I want to remove every row in USERS that has a

1 Answer
  • 2020-12-08 15:25

    You can use boolean indexing with a condition built by isin. The ~ operator inverts the boolean Series, so the rows that remain are the ones whose email is not in EXCLUDE:

    import pandas as pd
    
    USERS = pd.DataFrame({'email':['a@g.com','b@g.com','b@g.com','c@g.com','d@g.com']})
    print (USERS)
         email
    0  a@g.com
    1  b@g.com
    2  b@g.com
    3  c@g.com
    4  d@g.com
    
    EXCLUDE = pd.DataFrame({'email':['a@g.com','d@g.com']})
    print (EXCLUDE)
         email
    0  a@g.com
    1  d@g.com
    
    print (USERS.email.isin(EXCLUDE.email))
    0     True
    1    False
    2    False
    3    False
    4     True
    Name: email, dtype: bool
    
    print (~USERS.email.isin(EXCLUDE.email))
    0    False
    1     True
    2     True
    3     True
    4    False
    Name: email, dtype: bool
    
    print (USERS[~USERS.email.isin(EXCLUDE.email)])
         email
    1  b@g.com
    2  b@g.com
    3  c@g.com
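
The steps above can be wrapped in a small helper. This is only a sketch: the function name drop_matching_rows and its key parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def drop_matching_rows(df, exclude, key="email"):
    """Return the rows of df whose `key` value does not appear in exclude[key].

    Hypothetical helper, not a pandas API: it only combines isin and ~.
    """
    # True for every row of df whose key value is present in exclude
    mask = df[key].isin(exclude[key])
    # ~ inverts the mask, so only the non-matching rows are kept
    return df[~mask]


USERS = pd.DataFrame({'email': ['a@g.com', 'b@g.com', 'b@g.com', 'c@g.com', 'd@g.com']})
EXCLUDE = pd.DataFrame({'email': ['a@g.com', 'd@g.com']})

result = drop_matching_rows(USERS, EXCLUDE)
print(result)
```

The original index labels (1, 2, 3) are preserved. If a fresh 0-based index is wanted, reset_index(drop=True) can be chained onto the result.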
    

    Another solution uses merge with indicator=True, which adds a _merge column telling where each row came from:

    df = pd.merge(USERS, EXCLUDE, how='outer', indicator=True)
    print (df)
         email     _merge
    0  a@g.com       both
    1  b@g.com  left_only
    2  b@g.com  left_only
    3  c@g.com  left_only
    4  d@g.com       both
    
    print (df.loc[df._merge == 'left_only', ['email']])
         email
    1  b@g.com
    2  b@g.com
    3  c@g.com
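
A possible variation on the merge approach, offered as a sketch: a left merge is enough to find the left_only rows, and dropping duplicate keys from EXCLUDE first keeps the merge from repeating rows of USERS when EXCLUDE lists the same email more than once. The function name anti_join is hypothetical, and the sketch assumes the left frame has no column already named _merge.

```python
import pandas as pd


def anti_join(left, right, on="email"):
    """Return the rows of left whose `on` value has no match in right.

    Hypothetical helper, not a pandas API.
    """
    # Keep one copy of each key so the merge cannot multiply rows of left
    right_keys = right[[on]].drop_duplicates()
    # indicator=True adds a _merge column: 'left_only' or 'both' for a left merge
    merged = left.merge(right_keys, on=on, how="left", indicator=True)
    # Keep unmatched rows and return only the original columns of left
    return merged.loc[merged["_merge"] == "left_only", left.columns]


USERS = pd.DataFrame({'email': ['a@g.com', 'b@g.com', 'b@g.com', 'c@g.com', 'd@g.com']})
# 'a@g.com' is listed twice on purpose to show that duplicates are harmless here
EXCLUDE = pd.DataFrame({'email': ['a@g.com', 'a@g.com', 'd@g.com']})

remaining = anti_join(USERS, EXCLUDE)
print(remaining)
```

For a single key column the isin approach is usually simpler. The merge form becomes more useful when the match has to be made on several columns at once.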
    