Check if pandas column contains all elements from a list

忘掉有多难 2020-12-09 03:14

I have a df like this:

frame = pd.DataFrame({'a': ['a,b,c', 'a,c,f', 'b,d,f', 'a,z,c']})

And a list of items:

let         


        
7 answers
  •  离开以前
    2020-12-09 03:39

    One way is to split the column values into lists with str.split, then check whether set(letters) is a subset of each resulting list:

    letters_s = set(letters)
    frame[frame.a.str.split(',').map(letters_s.issubset)]
    
         a
    0  a,b,c
    1  a,c,f
    3  a,z,c
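
For reference, here is a self-contained sketch of the same approach. The question's list is cut off in the text above, so `letters = ['a', 'c']` is an assumption, chosen because it reproduces the output shown in this answer.

```python
import pandas as pd

frame = pd.DataFrame({'a': ['a,b,c', 'a,c,f', 'b,d,f', 'a,z,c']})

# Assumption: the original list is truncated in the question;
# ['a', 'c'] is a stand-in that matches the output shown in the answer.
letters = ['a', 'c']

letters_s = set(letters)

# Split each cell on commas into a list of tokens, then keep the rows
# where every required letter appears among those tokens.
# set.issubset accepts any iterable, so the bound method can be mapped directly.
mask = frame['a'].str.split(',').map(letters_s.issubset)
result = frame[mask]

print(result)
```

Because the comparison is made on whole tokens, a cell such as `'ab,c'` would not count as containing `'a'`. Missing values are not handled here: str.split leaves NaN in place, and passing NaN to issubset raises a TypeError, so a column with gaps would probably need a fillna('') first.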
    

    Benchmark:

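    import numpy as np
    import pandas as pd
    import perfplot
    
    # frame and letters are the objects defined in the question
    def serge(frame):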
        contains = [frame['a'].str.contains(i) for i in letters]
        return frame[np.all(contains, axis=0)]
    
    def yatu(frame):
        letters_s = set(letters)
        return frame[frame.a.str.split(',').map(letters_s.issubset)]
    
    def austin(frame):
        # Note: .size > 0 is True when at least one of the letters is present, not necessarily all of them
        mask = frame.a.apply(lambda x: np.intersect1d(x.split(','), letters).size > 0)
        return frame[mask]
    
    def datanovice(frame):
        s = frame['a'].str.split(',').explode().isin(letters).groupby(level=0).cumsum()
        return frame.loc[s[s.ge(2)].index.unique()]
    
    perfplot.show(
        setup=lambda n: pd.concat([frame]*n, axis=0).reset_index(drop=True), 
    
        kernels=[
            lambda df: serge(df),
            lambda df: yatu(df),
            lambda df: df[df['a'].apply(lambda x: np.all([*map(lambda l: l in x, letters)]))],
            lambda df: austin(df),
            lambda df: datanovice(df),
        ],
    
        labels=['serge', 'yatu', 'bruno','austin', 'datanovice'],
        n_range=[2**k for k in range(0, 18)],
        equality_check=lambda x, y: x.equals(y),
        xlabel='N'
    )
    
