How to use pandas isin for multiple columns

谎友^ 2020-12-17 18:04

I want to find the values of col1 and col2 where the col1 and col2 of the first dataframe are

3 answers
  •  星月不相逢
    2020-12-17 18:38

    If you must stick to isin or its negated form ~isin, you can first create a new column that concatenates col1 and col2, then use isin on that column to filter your data. Here is the code:

    import pandas as pd

    # Sample data: df1 holds the (col1, col2) pairs to look for, df2 is the frame to filter
    df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
    df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
    
    # Build one string key per row by concatenating col1 and col2
    df1['indicator'] = df1['col1'].str.cat(df1['col2'])
    df2['indicator'] = df2['col1'].str.cat(df2['col2'])
    
    # Keep the rows of df2 whose key also appears in df1, then drop the helper column
    df2.loc[df2['indicator'].isin(df1['indicator'])].drop(columns=['indicator'])
    

    which gives

    
             col1  col2
    10      pizza   boy
    11      pizza  girl
    16  ice cream   boy
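
    The negated form ~isin mentioned above works the same way. The following is only a sketch on the same sample data: it keeps the keys in standalone Series (key1, key2, mask, matched and unmatched are illustrative names, not part of the original answer) so that no helper column has to be added and dropped again.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10, 17))

# Same concatenation as in the answer, kept as standalone Series
# instead of being stored in an 'indicator' column.
key1 = df1['col1'].str.cat(df1['col2'])
key2 = df2['col1'].str.cat(df2['col2'])

# Boolean mask: True where the (col1, col2) key of df2 also occurs in df1
mask = key2.isin(key1)

matched = df2[mask]      # rows of df2 whose pair is present in df1
unmatched = df2[~mask]   # negated version: rows of df2 whose pair is absent from df1

print(matched)
print(unmatched)
```

    Here matched holds the rows with index 10, 11 and 16, and unmatched holds the remaining rows of df2.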
    

    If you do this, make sure that concatenating the two columns does not create false positives. For example, 123 and 456 in df1 concatenate to the same string as 12 and 3456 in df2, so the rows would match even though their respective columns do not. You can avoid this by passing the sep parameter to str.cat, using a separator that does not occur in the data.

    df1['indicator'] = df1['col1'].str.cat(df1['col2'], sep='$$$')
    df2['indicator'] = df2['col1'].str.cat(df2['col2'], sep='$$$')
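
    To make the false-positive case concrete, here is a small sketch using the 123/456 versus 12/3456 values from the paragraph above, stored as strings. The frames a and b and the other variable names are made up for illustration. The last part shows a separator-free variant that compares (col1, col2) tuples through pd.MultiIndex.from_frame and MultiIndex.isin; that is a different technique from the concatenation used in this answer, included only for comparison.

```python
import pandas as pd

# Hypothetical one-row frames reproducing the collision described above
a = pd.DataFrame({'col1': ['123'], 'col2': ['456']})
b = pd.DataFrame({'col1': ['12'], 'col2': ['3456']})

# Without a separator both rows collapse to the same key '123456'
plain_a = a['col1'].str.cat(a['col2'])
plain_b = b['col1'].str.cat(b['col2'])
false_positive = plain_b.isin(plain_a)

# With a separator the keys differ: '123$$$456' vs '12$$$3456'
sep_a = a['col1'].str.cat(a['col2'], sep='$$$')
sep_b = b['col1'].str.cat(b['col2'], sep='$$$')
with_sep = sep_b.isin(sep_a)

# Separator-free alternative (not the method of this answer):
# compare the (col1, col2) pairs as tuples instead of strings
pairs_a = pd.MultiIndex.from_frame(a[['col1', 'col2']])
pairs_b = pd.MultiIndex.from_frame(b[['col1', 'col2']])
tuple_match = pairs_b.isin(pairs_a)

print(false_positive.tolist())  # [True]  -> spurious match
print(with_sep.tolist())        # [False] -> fixed by the separator
print(tuple_match.tolist())     # [False] -> no string keys involved
```

    The separator only helps if it cannot appear inside the values themselves; the tuple comparison avoids that assumption and also works for non-string columns.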
    
