Python: Removing Rows on Count condition

攒了一身酷 2020-12-09 10:07

I have a problem filtering a pandas dataframe.

city 
NYC 
NYC 
NYC 
NYC 
SYD 
SYD 
SEL 
SEL
...

df.city.value_counts()

I woul

4 Answers
  •  鱼传尺愫
    2020-12-09 10:37

    This is one way using pd.Series.value_counts.

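    # Number of rows per city (index = city, value = count)
    counts = df['city'].value_counts()

    # Keep only rows whose city does not appear fewer than 5 times
    res = df[~df['city'].isin(counts[counts < 5].index)]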

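    counts is a pd.Series whose index holds the city names and whose values hold the number of rows per city. counts < 5 returns a Boolean series; indexing counts with it (that's what the square brackets do) keeps only the entries below 5. The index of that result is the set of cities that appear fewer than 5 times. isin flags the rows that belong to those cities, and ~ negates the mask so that only the remaining rows are kept.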
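
As a rough walkthrough of the steps just described, the sketch below names each intermediate object. The sample frame is made up for illustration (the question's data is only partly shown), and the names rare_mask and rare_cities are hypothetical helpers that the answer does not use.

```python
import pandas as pd

# Hypothetical sample data; the question's real frame is not shown in full.
df = pd.DataFrame({"city": ["NYC"] * 6 + ["SYD"] * 2 + ["SEL"] * 5})

counts = df["city"].value_counts()     # Series: index = city, value = row count
rare_mask = counts < 5                 # Boolean Series on the same index
rare_cities = counts[rare_mask].index  # cities seen fewer than 5 times

# Keep only the rows whose city is NOT in the rare set.
res = df[~df["city"].isin(rare_cities)]
```

With this made-up data only SYD falls below the threshold, so res keeps the NYC and SEL rows and retains their original index labels.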

    Remember that a series is a mapping between index and value. The index of a series does not necessarily contain unique values, but uniqueness is guaranteed for the output of value_counts.
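
The uniqueness point can be checked directly, and the same filter can also be written without the intermediate index. The sketch below is an alternative formulation using groupby with transform("size"), not the method from the answer above; the sample data and the names group_size and res_alt are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"city": ["NYC"] * 6 + ["SYD"] * 2 + ["SEL"] * 5})

counts = df["city"].value_counts()
index_is_unique = counts.index.is_unique  # one entry per distinct city

# Alternative: broadcast each city's group size back onto every row,
# then keep rows whose group has at least 5 members.
group_size = df.groupby("city")["city"].transform("size")
res_alt = df[group_size >= 5]

# Compare with the value_counts / isin approach.
res = df[~df["city"].isin(counts[counts < 5].index)]
same = res_alt.equals(res)
```

Both versions keep the surviving rows in their original order. The transform variant avoids building the list of rare cities, at the cost of a groupby over the whole column.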
