Pandas - check if a string column in one dataframe contains a pair of strings from another dataframe

情书的邮戳 2020-12-30 18:23

This question is based on another question I asked, where I didn't cover the problem entirely: Pandas - check if a string column contains a pair of strings

This is

2 Answers
  •  既然无缘
    2020-12-30 18:27

    Here is my answer, using a nested list comprehension and zip.
    Note that this checks for substrings in df1.consumption, not whole words.

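    import numpy as np

    # Flatten the relevant columns into plain Python lists
    c = df1.consumption.tolist()
    f = df2.food.tolist()
    a = df2.creature.tolist()

    # check[i, j] is True when row i of df1 contains both the food and
    # the creature from row j of df2 (substring match)
    check = np.array([[fd in cs and cr in cs for fd, cr in zip(f, a)] for cs in c])

    # True for each df1 row that matches at least one df2 pair
    check.any(axis=1)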
    
    array([ True, False,  True, False, False,  True, False,  True, False], dtype=bool)
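
The thread does not show the original `df1` and `df2`, so the following is only a minimal sketch on made-up data (the rows below are hypothetical). It runs the same comprehension and contrasts it with a whole-word variant, which shows what the substring caveat means in practice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the frames from the question are not shown here.
df1 = pd.DataFrame({"consumption": [
    "squirrel eats apple",
    "monkey eats apple",
    "monkey eats banana",
    "badger eats pineapple",
]})
df2 = pd.DataFrame({
    "creature": ["squirrel", "monkey", "badger"],
    "food": ["apple", "banana", "apple"],
})

c = df1["consumption"].tolist()
pairs = list(zip(df2["food"], df2["creature"]))

# Substring test, as in the answer: "apple" also matches inside "pineapple".
check = np.array([[fd in cs and cr in cs for fd, cr in pairs] for cs in c])
substring_match = check.any(axis=1)

# Whole-word variant: split each sentence into a set of words first.
word_match = np.array([
    any(fd in words and cr in words for fd, cr in pairs)
    for words in (set(cs.split()) for cs in c)
])

print(substring_match)
print(word_match)
```

With this data the two variants disagree only on the last row, where "badger eats pineapple" matches the pair (badger, apple) as a substring but not as a whole word.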
    

    This is a pandas version of what @MaxU did. Respect what he did... it is awesome!

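    # One indicator column per word in df1.consumption
    X = df1.consumption.str.get_dummies(' ')
    # Same for each creature/food pair, aligned to X's columns;
    # words that never occur in df2 are filled with 0
    Y = (df2.creature + ' ' + df2.food).str.get_dummies(' ') \
        .reindex(columns=X.columns, fill_value=0)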
    
    # This is where you can see which rows from `df2` (columns)
    # matched with which rows from `df1` (rows) 
    XY = X.dot(Y.T)
    
    print(XY)
    
       0  1  2  3
    0  2  1  0  0
    1  1  1  1  0
    2  0  0  2  1
    3  0  1  1  1
    4  0  0  0  0
    5  1  2  0  0
    6  0  0  0  1
    7  0  0  1  2
    8  1  0  0  0
    
    # return the desired `True`s and `False`s:
    # a count above 1 means both words of a `df2` pair occur in the `df1` row

    XY.gt(1).any(axis=1)
    
    0     True
    1    False
    2     True
    3    False
    4    False
    5     True
    6    False
    7     True
    8    False
    dtype: bool
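
For a self-contained look at the dot-product approach, here is a small sketch on hypothetical data (again, the original frames are not shown in this thread). It assumes every creature and food is a single word, so a count of 2 in `XY` means both words of a pair were found.

```python
import pandas as pd

# Hypothetical sample data; the frames from the question are not shown here.
df1 = pd.DataFrame({"consumption": [
    "squirrel eats apple",
    "monkey eats apple",
    "monkey eats banana",
    "badger eats pineapple",
]})
df2 = pd.DataFrame({
    "creature": ["squirrel", "monkey", "badger"],
    "food": ["apple", "banana", "apple"],
})

# Word-indicator matrix for df1: one row per sentence, one column per word.
X = df1["consumption"].str.get_dummies(" ")

# Word-indicator matrix for the df2 pairs, aligned to the same columns.
Y = (
    (df2["creature"] + " " + df2["food"])
    .str.get_dummies(" ")
    .reindex(columns=X.columns, fill_value=0)
)

# XY[i, j] counts the words shared by df1 row i and df2 pair j.
XY = X.dot(Y.T)

# A df1 row matches when some pair contributes both of its words.
result = XY.gt(1).any(axis=1)

print(XY)
print(result)
```

Because `get_dummies` splits on whole words, "pineapple" does not count as "apple" here, unlike in the substring-based comprehension.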
    

    naive testing
