how to use pandas filter with IQR?

前端 未结 6 995
迷失自我
迷失自我 2020-12-28 13:17

Is there a built-in way to do filtering on a column by IQR(i.e. values between Q1-1.5IQR and Q3+1.5IQR)? also, any other possible generalized filtering in pandas suggested

6条回答
  •  天命终不由人
    2020-12-28 14:00

    As far as I know, the most compact notation seems to be brought by the query method.

    # Some test data
    np.random.seed(33454)
    df = (
        # A standard distribution
        pd.DataFrame({'nb': np.random.randint(0, 100, 20)})
            # Adding some outliers
            .append(pd.DataFrame({'nb': np.random.randint(100, 200, 2)}))
            # Reseting the index
            .reset_index(drop=True)
        )
    
    # Computing IQR
    Q1 = df['nb'].quantile(0.25)
    Q3 = df['nb'].quantile(0.75)
    IQR = Q3 - Q1
    
    # Filtering Values between Q1-1.5IQR and Q3+1.5IQR
    filtered = df.query('(@Q1 - 1.5 * @IQR) <= nb <= (@Q3 + 1.5 * @IQR)')
    

    Then we can plot the result to check the difference. We observe that the outlier in the left boxplot (the cross at 183) does not appear anymore in the filtered series.

    # Ploting the result to check the difference
    df.join(filtered, rsuffix='_filtered').boxplot()
    

    Since this answer I've written a post on this topic were you may find more information.

提交回复
热议问题