Filter out rows based on list of strings in Pandas

礼貌的吻别 2020-12-08 19:54

I have a large time series data frame (called df), and the first 5 records look like this:

df

         stn     years_of_data  total_minutes         


        
2 Answers
  • 2020-12-08 20:34

    Use isin to build a boolean mask, then negate it with ~ to keep only the rows whose stn is not in the list:

    cleaned = df[~df['stn'].isin(remove_list)]
    
    In [7]:
    
    remove_list = ['Arbutus','Bayside']
    df[~df['stn'].isin(remove_list)]
    Out[7]:
                              stn  years_of_data  total_minutes  avg_daily  \
    date                                                                     
    1900-01-14  AlberniElementary              4           5745       34.1   
    1900-01-14     AlberniWeather              6           7129       29.5   
    1900-01-14          Arrowview              7          10080       27.6   
    
                TOA_daily  K_daily  
    date                            
    1900-01-14      114.6    0.298  
    1900-01-14      114.6    0.257  
    1900-01-14      114.6    0.241  
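
    For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The frame below is made up for illustration; only the stn column name and the two station names in remove_list come from the thread.

```python
import pandas as pd

# Hypothetical sample data; only the 'stn' column name comes from the question.
df = pd.DataFrame(
    {
        "stn": ["AlberniElementary", "Arbutus", "Bayside", "Arrowview"],
        "years_of_data": [4, 8, 7, 7],
    }
)
remove_list = ["Arbutus", "Bayside"]

# isin() is True for rows whose stn appears in remove_list; ~ inverts the mask,
# so only the rows that are NOT in the list are kept.
cleaned = df[~df["stn"].isin(remove_list)]
print(cleaned["stn"].tolist())
```

    Note that isin only matches whole values exactly, so 'Arbutus' would not remove a row whose stn is 'arbutus' or 'Arbutus2'.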
    
  • 2020-12-08 20:36

    I had a similar question and found this old thread. There are other ways to get the same result. My issue with @EdChum's solution, for my particular application, is that I don't have a list of values that will match exactly. If you have the same issue, .isin isn't the right tool, because it only tests for exact matches.

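    Instead, you can try a few other options, including numpy.where combined with str.contains:

      import numpy

      remove_list = ['ayside', 'rrowview']
      # the joined string 'ayside|rrowview' is treated as a regex by str.contains
      df['flagCol'] = numpy.where(df['stn'].str.contains('|'.join(remove_list)), 1, 0)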
    

    Note that this solution doesn't actually remove the matching rows, just flags them. You can copy/slice/drop as you like.
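
    If you do want to drop the flagged rows afterwards, one way is sketched below. The sample frame and the name kept are made up for illustration.

```python
import numpy
import pandas as pd

# Hypothetical sample data; only the 'stn' column name comes from the question.
df = pd.DataFrame({"stn": ["AlberniWeather", "Bayside", "Arrowview"]})
remove_list = ["ayside", "rrowview"]

# Flag every row whose stn contains any of the fragments (1 = match, 0 = no match).
df["flagCol"] = numpy.where(df["stn"].str.contains("|".join(remove_list)), 1, 0)

# Keep only the unflagged rows, then discard the helper column.
kept = df[df["flagCol"] == 0].drop(columns="flagCol")
print(kept["stn"].tolist())
```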

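    This solution is useful when you can't rely on exact matches. For example, if you don't know whether the station names start with a capital letter and don't want to standardize the text beforehand, fragments such as 'ayside' skip the first letter and match either way. Note that str.contains is case-sensitive by default, so this only covers the characters you leave out of the fragments. numpy.where is usually pretty fast as well, probably not much different from .isin.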
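
    As an alternative to trimming letters off the names, str.contains accepts case=False, which ignores case for the whole pattern. The sketch below is an assumption-laden example, not part of the original answer: the sample frame is made up, and re.escape is added so that any regex metacharacters in a station name are matched literally.

```python
import re

import pandas as pd

# Hypothetical sample data with inconsistent capitalization.
df = pd.DataFrame({"stn": ["AlberniWeather", "BAYSIDE", "arrowview"]})
remove_list = ["bayside", "Arrowview"]

# Escape each name so it is matched literally, then join into one alternation.
pattern = "|".join(re.escape(name) for name in remove_list)

# case=False ignores capitalization; na=False treats missing names as non-matches.
mask = df["stn"].str.contains(pattern, case=False, na=False)
cleaned = df[~mask]
print(cleaned["stn"].tolist())
```

    This removes the rows directly instead of flagging them, so no helper column is needed.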
