how to use pandas filter with IQR?

前端 未结 6 1023
迷失自我
迷失自我 2020-12-28 13:17

Is there a built-in way to do filtering on a column by IQR(i.e. values between Q1-1.5IQR and Q3+1.5IQR)? also, any other possible generalized filtering in pandas suggested

6条回答
  •  没有蜡笔的小新
    2020-12-28 13:54

    This will give you the subset of df which lies in the IQR of column column:

    def subset_by_iqr(df, column, whisker_width=1.5):
        """Remove outliers from a dataframe by column, including optional 
           whiskers, removing rows for which the column value are 
           less than Q1-1.5IQR or greater than Q3+1.5IQR.
        Args:
            df (`:obj:pd.DataFrame`): A pandas dataframe to subset
            column (str): Name of the column to calculate the subset from.
            whisker_width (float): Optional, loosen the IQR filter by a
                                   factor of `whisker_width` * IQR.
        Returns:
            (`:obj:pd.DataFrame`): Filtered dataframe
        """
        # Calculate Q1, Q2 and IQR
        q1 = df[column].quantile(0.25)                 
        q3 = df[column].quantile(0.75)
        iqr = q3 - q1
        # Apply filter with respect to IQR, including optional whiskers
        filter = (df[column] >= q1 - whisker_width*iqr) & (df[column] <= q3 + whisker_width*iqr)
        return df.loc[filter]                                                     
    
    # Example for whiskers = 1.5, as requested by the OP
    df_filtered = subset_by_iqr(df, 'column_name', whisker_width=1.5)
    

提交回复
热议问题