How to use pandas filter with IQR?

迷失自我 2020-12-28 13:17

Is there a built-in way to filter a column by IQR (i.e. keep values between Q1 - 1.5 IQR and Q3 + 1.5 IQR)? Also, are there any other generalized filtering approaches in pandas you would suggest?

6 Answers
  •  独厮守ぢ
    2020-12-28 13:57

    Another approach uses Series.clip:

    q = s.quantile([.25, .75])
    s = s[~s.clip(*q).isin(q)]
    

    Here are the details:

    import numpy as np
    import pandas as pd

    s = pd.Series(np.random.randn(100))
    q = s.quantile([.25, .75])  # calculate lower and upper bounds
    s = s.clip(*q)  # assigns values outside boundary to boundary values
    s = s[~s.isin(q)]  # take only observations within bounds
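
    To make the three steps concrete, here is a small worked sketch on a hand-picked Series (the values and the names clipped and kept are only for illustration, not from the original answer):

```python
import pandas as pd

# Nine evenly spaced values, so the quartiles fall exactly on data points.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

q = s.quantile([.25, .75])     # Q1 = 3.0, Q3 = 7.0
clipped = s.clip(*q)           # values below 3 become 3, values above 7 become 7
kept = s[~clipped.isin(q)]     # drop every value that now equals a bound

print(kept.tolist())           # [4.0, 5.0, 6.0]
```

    Because clipping maps every out-of-range value onto a bound, the isin test removes both the outside values and any original value that was exactly equal to a bound (here 3.0 and 7.0).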
    

    Using it to filter a whole dataframe df is straightforward:

    def iqr(df, colname, bounds=(.25, .75)):
        s = df[colname]
        q = s.quantile(list(bounds))  # lower and upper quantile values
        return df[~s.clip(*q).isin(q)]  # keep rows strictly inside the bounds
    

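    Note: the method excludes the boundaries themselves, so any value equal to a bound is dropped. With the default bounds it keeps values strictly between Q1 and Q3, not between Q1 - 1.5 IQR and Q3 + 1.5 IQR.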
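
    For the fences the question asks about (Q1 - 1.5 IQR and Q3 + 1.5 IQR), a minimal sketch using Series.between might look like the following. The function name filter_by_iqr, the parameter k, and the sample data are hypothetical and not part of the original answer:

```python
import pandas as pd

def filter_by_iqr(df, colname, k=1.5):
    """Keep rows whose `colname` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    s = df[colname]
    q1, q3 = s.quantile([.25, .75])   # first and third quartiles
    iqr = q3 - q1                     # interquartile range
    # between() is inclusive on both ends by default
    return df[s.between(q1 - k * iqr, q3 + k * iqr)]

# Illustrative data: eight ordinary values and one obvious outlier.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]})
filtered = filter_by_iqr(df, "x")
print(filtered["x"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

    Unlike the clip-based version, this keeps values that sit exactly on a fence. Rows where the column is NaN are dropped, since between returns False for missing values.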
