How do you find the IQR in Numpy?

夕颜 2020-12-08 08:56

Is there a built-in NumPy/SciPy function to find the interquartile range? I can do it pretty easily myself, but mean() exists, which is basically sum/len.

3 Answers
  • 2020-12-08 09:34

    np.percentile accepts a sequence of percentiles, so you are slightly better off computing both quartiles in one call:

    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    

    or

    iqr = np.subtract(*np.percentile(x, [75, 25]))
    

    than making two calls to percentile:

    In [8]: x = np.random.rand(10**6)  # the size must be an integer, not the float 1e6
    
    In [9]: %timeit q75, q25 = np.percentile(x, [75, 25]); iqr = q75 - q25
    10 loops, best of 3: 24.2 ms per loop
    
    In [10]: %timeit iqr = np.subtract(*np.percentile(x, [75, 25]))
    10 loops, best of 3: 24.2 ms per loop
    
    In [11]: %timeit iqr = np.percentile(x, 75) - np.percentile(x, 25)
    10 loops, best of 3: 33.7 ms per loop
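
The single-call approach above can be wrapped in a small helper. This is only a sketch, not part of the original answer: the name iqr and its axis parameter are illustrative choices, and it relies on np.percentile with its default linear interpolation.

```python
import numpy as np


def iqr(x, axis=None):
    """Interquartile range from a single np.percentile call (hypothetical helper)."""
    q75, q25 = np.percentile(x, [75, 25], axis=axis)
    return q75 - q25


data = np.arange(1, 10).reshape(3, 3)  # rows: [1 2 3], [4 5 6], [7 8 9]
overall = iqr(data)             # quartiles taken over all nine values
per_column = iqr(data, axis=0)  # one IQR per column
```

Passing axis through to np.percentile lets the same helper handle flattened and per-column cases without a second code path.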
    
  • 2020-12-08 09:38

    If Jaime's answer works for your case, ignore this one. If not, according to this answer, you can find the exact first and third quartiles by taking the medians of the lower and upper halves of the sorted data:

    samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])
    
    def find_median(sorted_list):
        """Return the median of a sorted list and the index or indices it came from."""
        list_size = len(sorted_list)
        mid = list_size // 2
    
        if list_size % 2 == 0:
            # Even length: average the two middle values.
            indices = [mid - 1, mid]  # -1 because indexing starts from 0
            median = (sorted_list[indices[0]] + sorted_list[indices[1]]) / 2
        else:
            # Odd length: the middle value is the median.
            indices = [mid]
            median = sorted_list[mid]
    
        return median, indices
    
    median, median_indices = find_median(samples)
    # Q1 and Q3 are the medians of the lower and upper halves. For an odd-length
    # list the median itself is excluded; for an even-length list each of the
    # two middle values stays in its own half.
    Q1, Q1_indices = find_median(samples[:median_indices[-1]])
    Q3, Q3_indices = find_median(samples[median_indices[0] + 1:])
    
    IQR = Q3 - Q1
    
    quartiles = [Q1, median, Q3]
    

    Code taken from the referenced answer.
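
To see why the two approaches can disagree, the sketch below runs both on the same sample. It is an illustration rather than part of the referenced answer: it uses statistics.median from the standard library for the halves, and the variable names are made up for this example.

```python
import statistics

import numpy as np

samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])

# Median-of-halves: drop the middle element when the length is odd,
# then take the median of each half.
half = len(samples) // 2
lower = samples[:half]
upper = samples[half + 1:] if len(samples) % 2 else samples[half:]
q1_halves = statistics.median(lower)
q3_halves = statistics.median(upper)
iqr_halves = q3_halves - q1_halves

# np.percentile with its default linear interpolation.
q3_np, q1_np = np.percentile(samples, [75, 25])
iqr_np = q3_np - q1_np

print(q1_halves, q3_halves, iqr_halves)  # 10.0 24.5 14.5
print(q1_np, q3_np, iqr_np)              # 12.0 22.0 10.0
```

Neither result is wrong; they follow different quartile conventions, so pick the one that matches the definition you need.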

  • 2020-12-08 09:44

    There is now an iqr function in scipy.stats. It is available as of scipy 0.18.0. My original intent was to add it to numpy, but it was considered too domain-specific.

    You may be better off just using Jaime's answer, since the scipy code is just an over-complicated version of the same.
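
For reference, a minimal usage sketch of scipy.stats.iqr next to the np.percentile version. It assumes SciPy 0.18.0 or later is installed, and the sample values are arbitrary.

```python
import numpy as np
from scipy import stats

samples = [28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13]

# With default arguments, stats.iqr is the 75th minus the 25th percentile.
iqr_scipy = stats.iqr(samples)

q75, q25 = np.percentile(samples, [75, 25])
iqr_numpy = q75 - q25

print(iqr_scipy, iqr_numpy)  # 10.0 10.0

# The rng argument selects a different percentile range, here 10th to 90th.
spread_10_90 = stats.iqr(samples, rng=(10, 90))
```

With default arguments the two agree, which is consistent with the remark above that the SciPy function does the same computation with extra options.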
