Search for “does-not-contain” on a DataFrame in pandas

谎友^ 2020-11-28 02:31

I've done some searching and can't figure out how to filter a dataframe by df["col"].str.contains(word), however I'm wondering if there is a way to do the reverse, i.e. keep only the rows that do not contain the word.

6 Answers
  • 2020-11-28 02:53

    I was having trouble with the not (~) symbol as well, so here's another way from another StackOverflow thread:

    df[df["col"].str.contains('this|that')==False]
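
As a quick check, the sketch below compares the `== False` form with the `~` form on a small made-up frame (the column values are illustrative, not from the question). When the column has no missing values, both should select the same rows.

```python
import pandas as pd

# Hypothetical sample data; "col" holds plain strings with no NaN.
df = pd.DataFrame({"col": ["this one", "that one", "other", "neither"]})

# Boolean mask: True where the value matches the regex "this|that".
mask = df["col"].str.contains("this|that")

by_comparison = df[mask == False]  # the form used in this answer
by_invert = df[~mask]              # the equivalent invert form

print(by_comparison)
```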
    
  • 2020-11-28 02:53

    You can use apply with a lambda to select rows where a column's value is not one of the items in a list. Note that this tests whole-value equality, not substring containment. For your scenario:

    df[df["col"].apply(lambda x: x not in [word1, word2, word3])]
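
To make the difference concrete, here is a minimal sketch with made-up data. The `words` list and the column values are assumptions for illustration. `x not in words` compares whole values, so it behaves like `~isin(...)`; a substring test needs `any(...)` inside the lambda.

```python
import pandas as pd

words = ["apple", "kiwi"]  # hypothetical stand-ins for word1, word2, ...
df = pd.DataFrame({"col": ["apple", "apple pie", "kiwi", "plum"]})

# Whole-value test: drops only rows whose value equals an item in the list.
exact = df[df["col"].apply(lambda x: x not in words)]

# The same whole-value filter written with isin.
exact_isin = df[~df["col"].isin(words)]

# Substring test: drops rows whose value contains any item in the list.
substring = df[df["col"].apply(lambda x: not any(w in x for w in words))]

print(exact)
print(substring)
```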
    
  • 2020-11-28 02:55

    In addition to nanselm2's answer, you can use 0 instead of False:

    df["col"].str.contains(word)==0
    
  • 2020-11-28 03:04

    I had to get rid of the NULL values before using the command recommended in Andy's answer. An example:

    import pandas as pd

    # .ix was removed in pandas 1.0; .loc is the label-based replacement.
    df = pd.DataFrame(index=[0, 1, 2], columns=['first', 'second', 'third'])
    df.loc[:, 'first'] = 'myword'
    df.loc[0, 'second'] = 'myword'
    df.loc[2, 'second'] = 'myword'
    df.loc[1, 'third'] = 'myword'
    df
    
        first   second  third
    0   myword  myword   NaN
    1   myword  NaN      myword 
    2   myword  myword   NaN
    

    Now running the command:

    ~df["second"].str.contains(word)
    

    I get the following error:

    TypeError: bad operand type for unary ~: 'float'
    

    I got rid of the NULL values using dropna() or fillna() first and retried the command with no problem.
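
The options mentioned here can be sketched as follows, on a cut-down version of the frame above. The `na=False` variant comes from another answer in this thread. Which one is right depends on whether rows with missing values should be kept in the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"second": ["myword", np.nan, "myword"]})
word = "myword"

# Option 1: treat missing values as "does not contain", so they are kept.
keep_na = df[~df["second"].str.contains(word, na=False)]

# Option 2: replace missing values with an empty string before matching.
filled = df["second"].fillna("")
fill_na = df[~filled.str.contains(word)]

# Option 3: drop rows with a missing value first, then filter.
no_na = df.dropna(subset=["second"])
drop_na = no_na[~no_na["second"].str.contains(word)]

print(keep_na)
print(drop_na)
```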

  • 2020-11-28 03:07

    The other answers already cover the basic case.

    I am adding a general pattern for searching for multiple words and excluding the matching rows from a DataFrame.

    Here 'word1', 'word2', 'word3', 'word4' = list of patterns to search for

    df = DataFrame

    column_a = a column name from DataFrame df

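    Search_for_These_values = ['word1', 'word2', 'word3', 'word4']

    # Join the words into one regex alternation: 'word1|word2|word3|word4'
    pattern = '|'.join(Search_for_These_values)

    # Keep the rows whose column_a does NOT match any word (case-insensitive)
    result = df.loc[~df['column_a'].str.contains(pattern, case=False)]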
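
One caveat with joining words into a pattern: `str.contains` treats the pattern as a regular expression by default, so words containing characters such as `(`, `+` or `.` may not match literally. A minimal sketch, assuming the words are meant as literal text, escapes each one with `re.escape` first. The data and the `search_for` list below are made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {"column_a": ["Price (USD)", "cost in eur", "C++ notes", "plain text"]}
)
search_for = ["(usd)", "c++"]  # hypothetical words with regex metacharacters

# Escape each word so it is matched literally, then join with '|'.
pattern = "|".join(re.escape(w) for w in search_for)

# na=False treats missing values as non-matching, so they are not dropped.
result = df.loc[~df["column_a"].str.contains(pattern, case=False, na=False)]

print(result)
```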
    
  • 2020-11-28 03:09

    You can use the invert (~) operator (which acts like a not for boolean data):

    new_df = df[~df["col"].str.contains(word)]
    

    Here new_df is the copy returned by the right-hand side.

    contains also accepts a regular expression...


    If the above throws a ValueError, the likely cause is mixed datatypes in the column (for example strings and NaN), so use na=False:

    new_df = df[~df["col"].str.contains(word, na=False)]
    

    Or,

    new_df = df[df["col"].str.contains(word) == False]
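
Because `contains` interprets the word as a regular expression by default, a literal search may need `regex=False`. The sketch below uses made-up data and assumes `word` is meant as plain text; it shows how the two modes can select different rows.

```python
import pandas as pd

df = pd.DataFrame({"col": ["a.b", "axb", None, "zzz"]})
word = "a.b"  # hypothetical search term containing a regex metacharacter

# Default (regex=True): '.' matches any character, so "axb" also matches.
as_regex = df[~df["col"].str.contains(word, na=False)]

# regex=False: the word is matched as a literal substring.
as_literal = df[~df["col"].str.contains(word, na=False, regex=False)]

print(as_regex)
print(as_literal)
```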
    