Is there a built-in way to do filtering on a column by IQR(i.e. values between Q1-1.5IQR and Q3+1.5IQR)? also, any other possible generalized filtering in pandas suggested
This will give you the subset of df which lies in the IQR of column column:
def subset_by_iqr(df, column, whisker_width=1.5):
"""Remove outliers from a dataframe by column, including optional
whiskers, removing rows for which the column value are
less than Q1-1.5IQR or greater than Q3+1.5IQR.
Args:
df (`:obj:pd.DataFrame`): A pandas dataframe to subset
column (str): Name of the column to calculate the subset from.
whisker_width (float): Optional, loosen the IQR filter by a
factor of `whisker_width` * IQR.
Returns:
(`:obj:pd.DataFrame`): Filtered dataframe
"""
# Calculate Q1, Q2 and IQR
q1 = df[column].quantile(0.25)
q3 = df[column].quantile(0.75)
iqr = q3 - q1
# Apply filter with respect to IQR, including optional whiskers
filter = (df[column] >= q1 - whisker_width*iqr) & (df[column] <= q3 + whisker_width*iqr)
return df.loc[filter]
# Example for whiskers = 1.5, as requested by the OP
df_filtered = subset_by_iqr(df, 'column_name', whisker_width=1.5)