Python pandas: exclude rows below a certain frequency count

面向向阳花 2020-12-08 05:29

So I have a pandas DataFrame that looks like this:

r vals    positions
1.2       1
1.8       2
2.3       1
1.8       1
2.1       3
2.0       3
1.9       1
..         
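
To reproduce the session in the answer, here is a minimal sketch that builds only the seven rows visible above. The real frame is longer, and the variable name df is an assumption taken from the answer's code.

```python
import pandas as pd

# Only the seven rows shown in the question; the elided rows are not reconstructed.
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# Number of rows per position: 1 occurs 4 times, 3 twice, 2 once.
position_counts = df["positions"].value_counts()
print(position_counts)
```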


        
3 answers
  •  清歌不尽
    2020-12-08 06:11

    On your limited dataset the following works:

    In [125]:
    df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)
    
    Out[125]:
    0    1.2
    2    2.3
    3    1.8
    6    1.9
    Name: r vals, dtype: float64
    

    You can assign the result of this filter and use its index to select rows from your orig df (matching on the 'r vals' values with isin would also keep row 1, since 1.8 occurs at position 2 as well):

    In [129]:
    filtered = df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)
    df.loc[filtered.index]
    
    Out[129]:
       r vals  positions
    0     1.2          1
    2     2.3          1
    3     1.8          1
    6     1.9          1
    

    In your case you just need to change the threshold from 3 to 20.
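
One thing to watch when mapping a filtered Series back onto the frame: selecting by value and selecting by index label are not the same thing. The sketch below rebuilds the seven sample rows (an assumption, since the full data is not shown) and compares the two; the names by_value and by_index are made up for illustration.

```python
import pandas as pd

# Seven sample rows from the question; the real frame is longer.
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# Keep the 'r vals' of positions that occur at least 3 times.
filtered = df.groupby("positions")["r vals"].filter(lambda x: len(x) >= 3)

# Matching on values: 1.8 also occurs at position 2, so row 1 slips through.
by_value = df[df["r vals"].isin(filtered)]
print(list(by_value.index))   # [0, 1, 2, 3, 6]

# Matching on index labels keeps only rows from the surviving groups.
by_index = df.loc[filtered.index]
print(list(by_index.index))   # [0, 2, 3, 6]
```

Selecting by index relies on the filter preserving the original row labels, which it does here.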

    Another approach is to use value_counts to build a Series of counts per position, which you can then use to filter your df:

    In [136]:
    counts = df['positions'].value_counts()
    counts
    
    Out[136]:
    1    4
    3    2
    2    1
    dtype: int64
    
    In [137]:
    counts[counts >= 3]
    
    Out[137]:
    1    4
    dtype: int64
    
    In [135]:
    df[df['positions'].isin(counts[counts >= 3].index)]
    
    Out[135]:
       r vals  positions
    0     1.2          1
    2     2.3          1
    3     1.8          1
    6     1.9          1
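
The value_counts steps above can be wrapped into a small helper so the threshold is a parameter. This is only a sketch: the function name drop_rare and its arguments are hypothetical, and the frame is limited to the seven sample rows from the question.

```python
import pandas as pd


def drop_rare(df, column, min_count):
    """Keep rows whose value in `column` occurs at least `min_count` times."""
    counts = df[column].value_counts()
    # Index of the counts Series holds the distinct values of `column`.
    frequent = counts[counts >= min_count].index
    return df[df[column].isin(frequent)]


# Seven sample rows from the question; the real frame is longer.
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

result = drop_rare(df, "positions", 3)
print(result)
```

With the threshold set to 3 this keeps only the rows at position 1. On the full data you would pass 20 instead.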
    

    EDIT

    If you want the filtered DataFrame back rather than a Series, you can call filter directly on the DataFrame groupby object:

    In [139]:
    filtered = df.groupby('positions').filter(lambda x: len(x) >= 3)
    filtered
    
    Out[139]:
       r vals  positions
    0     1.2          1
    2     2.3          1
    3     1.8          1
    6     1.9          1
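
A variation that is not part of the answer above: instead of filter with a lambda, transform('size') gives every row the size of its group, and the comparison becomes an ordinary boolean mask. This avoids calling a Python function once per group, which may help when there are many distinct positions, though that is not measured here. The sample frame and the name min_count are assumptions.

```python
import pandas as pd

# Seven sample rows from the question; the real frame is longer.
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

min_count = 3  # hypothetical threshold; the answer suggests 20 for the real data

# One value per row: the number of rows sharing that row's position.
group_sizes = df.groupby("positions")["positions"].transform("size")

# Boolean mask aligned with df's index.
result = df[group_sizes >= min_count]
print(result)
```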
    
