Is there a built-in way to filter a column by IQR (i.e. keep values between Q1 - 1.5*IQR and Q3 + 1.5*IQR)? Also, are there any other generalized filtering approaches in pandas you would suggest?
Another approach uses Series.clip:
q = s.quantile([.25, .75])
s = s[~s.clip(*q).isin(q)]
Here are the details:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(100))
q = s.quantile([.25, .75])  # calculate lower and upper bounds
s = s.clip(*q)  # assigns values outside boundary to boundary values
s = s[~s.isin(q)]  # take only observations within bounds
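To see why the clip trick works, it may help to compare it with a plain boolean mask. The sketch below assumes a seeded random Series (the seed and the names `lower`, `upper`, `via_clip`, and `via_mask` are only for illustration) and checks that both selections agree:

```python
import numpy as np
import pandas as pd

# Seeded data so the comparison is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# Lower and upper bounds taken from the 25% and 75% quantiles.
lower, upper = s.quantile([0.25, 0.75])

# Clip approach: values at or beyond a bound end up equal to that bound,
# so dropping everything equal to a bound keeps only the interior values.
via_clip = s[~s.clip(lower, upper).isin([lower, upper])]

# The same selection written as a strict comparison.
via_mask = s[(s > lower) & (s < upper)]

print(via_clip.equals(via_mask))
```

Both forms keep only values strictly between the two bounds, so which one to use is mostly a matter of taste.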
Using it to filter a whole DataFrame df is straightforward:
def iqr(df, colname, bounds=[.25, .75]):
    s = df[colname]
    q = s.quantile(bounds)  # lower and upper quantile values
    return df[~s.clip(*q).isin(q)]  # keep rows strictly inside the bounds
Note: this method excludes observations that are equal to the boundary values themselves.
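The snippets above use the quantiles themselves as bounds. For the fences mentioned in the question (Q1 - 1.5*IQR and Q3 + 1.5*IQR), one possible sketch uses Series.between. The function name `filter_by_iqr_fences`, the parameter `k`, and the sample data are hypothetical, chosen only for illustration:

```python
import pandas as pd

def filter_by_iqr_fences(df, colname, k=1.5):
    """Keep rows whose value in `colname` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    s = df[colname]
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() includes both endpoints by default
    return df[s.between(lower, upper)]

# Small made-up example: 100 lies far outside the fences and is dropped.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 100]})
filtered = filter_by_iqr_fences(df, "x")
print(filtered["x"].tolist())  # [1, 2, 3, 4, 5]
```

Unlike the clip approach, this keeps values that fall exactly on a fence.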