How to calculate 1st and 3rd quartiles?

眼角桃花 2020-12-04 19:14

I have a DataFrame:

    time_diff   avg_trips
0   0.450000    1.0
1   0.483333    1.0
2   0.500000    1.0
3   0.516667    1.0
4   0.533333    2.0
10 Answers
  • 2020-12-04 19:57

    If you want to use plain Python rather than NumPy or pandas, you can use the standard-library statistics module to find the median of the lower and upper halves of the list:

        >>> import statistics as stat
        >>> def quartile(data):
        ...     data = sorted(data)  # sorted copy; works for lists and pandas Series
        ...     half_list = len(data) // 2
        ...     lower_quartile = stat.median(data[:half_list])
        ...     upper_quartile = stat.median(data[-half_list:])
        ...     print("Lower Quartile: " + str(lower_quartile))
        ...     print("Upper Quartile: " + str(upper_quartile))
        ...     print("Interquartile Range: " + str(upper_quartile - lower_quartile))
        ...
        >>> quartile(df.time_diff)
    

    Line 1: import the statistics module under the alias "stat"

    Line 2: define the quartile function

    Line 3: sort the data into ascending order

    Line 4: get the length of half of the list

    Line 5: get the median of the lower half of the list

    Line 6: get the median of the upper half of the list

    Line 7: print the lower quartile

    Line 8: print the upper quartile

    Line 9: print the interquartile range

    Line 10: run the quartile function for the time_diff column of the DataFrame
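
    As a possible alternative to splitting the list by hand, the same standard-library module offers statistics.quantiles (Python 3.8 and later). The sketch below is only an illustration: the list time_diff is copied from the question's column, and the choice between the "inclusive" and "exclusive" methods is an assumption about whether the data should be treated as a whole population or as a sample.

```python
import statistics

# Values copied from the time_diff column in the question.
time_diff = [0.450000, 0.483333, 0.500000, 0.516667, 0.533333]

# n=4 asks for the three cut points that split the data into four groups.
# method="inclusive" treats the data as the whole population, so the
# minimum and maximum count as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(time_diff, n=4, method="inclusive")
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)

# The default method, "exclusive", treats the data as a sample and
# can give different cut points for the same input.
sample_q1, sample_q2, sample_q3 = statistics.quantiles(time_diff, n=4)
print("Q1 (exclusive):", sample_q1, "Q3 (exclusive):", sample_q3)
```

    The two methods generally disagree on small data sets, so it is worth stating which one was used when reporting quartiles.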

  • 2020-12-04 19:59

    Conveniently, the describe method already reports this information; the 25% and 75% rows are the first and third quartiles:

    df.time_diff.describe()
    
    count    5.000000
    mean     0.496667
    std      0.032059
    min      0.450000
    25%      0.483333
    50%      0.500000
    75%      0.516667
    max      0.533333
    Name: time_diff, dtype: float64
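
    If only the quartiles are needed, they can be read from the describe output by label or computed directly with Series.quantile. The following is a minimal sketch that rebuilds the question's DataFrame by hand; the variable names are illustrative only.

```python
import pandas as pd

# DataFrame rebuilt from the values shown in the question.
df = pd.DataFrame({
    "time_diff": [0.450000, 0.483333, 0.500000, 0.516667, 0.533333],
    "avg_trips": [1.0, 1.0, 1.0, 1.0, 2.0],
})

# describe() returns a Series indexed by labels such as "25%" and "75%".
summary = df["time_diff"].describe()
q1_from_describe = summary["25%"]
q3_from_describe = summary["75%"]

# quantile() computes the same values without the rest of the summary.
quartiles = df["time_diff"].quantile([0.25, 0.75])
q1 = quartiles.loc[0.25]
q3 = quartiles.loc[0.75]
iqr = q3 - q1
print(q1, q3, iqr)
```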
    
  • 2020-12-04 20:00

    Building upon or rather correcting a bit on what Cyrus said....

    np.percentile DOES VERY MUCH calculate the values of Q1, median, and Q3. Consider the sorted list below:

    s1=[18,45,66,70,76,83,88,90,90,95,95,98]
    

    Running np.percentile(s1, [25, 50, 75]) returns linearly interpolated values, some of which do not appear in the list:

    [69.  85.5  91.25]
    

    However, if each quartile is taken as the median of the lower or upper half of the list, the quartiles are Q1 = 68.0, median = 85.5, Q3 = 92.5.

    What is missing here is the interpolation parameter of np.percentile and related functions (renamed to method in NumPy 1.22, where interpolation is deprecated). Its default value is linear. This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i < j:
    linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
    lower: i.
    higher: j.
    nearest: i or j, whichever is nearest.
    midpoint: (i + j) / 2.

    Thus, running np.percentile(s1, [25, 50, 75], interpolation='midpoint') (or method='midpoint' on NumPy 1.22 and later) returns the expected quartiles for the list:

    [68. 85.5 92.5]
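
    To make the five rules concrete, here is a small pure-Python sketch of them. It is not NumPy's implementation: the function name percentile and its rule argument are hypothetical, the input is assumed to be sorted already, and ties under "nearest" are resolved toward j here, which may differ from NumPy.

```python
import math

def percentile(sorted_data, p, rule="linear"):
    """Return the p-th percentile (0 to 100) of an already sorted list."""
    pos = (len(sorted_data) - 1) * p / 100   # fractional index into the data
    lo, hi = math.floor(pos), math.ceil(pos)
    i, j = sorted_data[lo], sorted_data[hi]  # the two neighbouring data points
    fraction = pos - lo
    if rule == "linear":
        return i + (j - i) * fraction
    if rule == "lower":
        return i
    if rule == "higher":
        return j
    if rule == "nearest":
        # Assumption: an exact tie (fraction == 0.5) goes to j.
        return i if fraction < 0.5 else j
    if rule == "midpoint":
        return (i + j) / 2
    raise ValueError("unknown rule: " + rule)

s1 = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]
linear = [percentile(s1, p) for p in (25, 50, 75)]
midpoint = [percentile(s1, p, rule="midpoint") for p in (25, 50, 75)]
print(linear)    # [69.0, 85.5, 91.25]
print(midpoint)  # [68.0, 85.5, 92.5]
```

    For the 25th percentile the fractional index is 2.75, so the neighbours are 66 and 70: the linear rule gives 66 + 4 * 0.75 = 69, while the midpoint rule gives (66 + 70) / 2 = 68.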
    
  • 2020-12-04 20:03

    Using np.percentile.

    import numpy as np
    q75, q25 = np.percentile(df.time_diff, [75, 25])  # pass the column, not the whole DataFrame
    iqr = q75 - q25
    

    Answer from "How do you find the IQR in Numpy?"
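
    One detail worth illustrating: without an axis argument, np.percentile flattens its input, so passing a whole two-column table mixes both columns into one result. The sketch below rebuilds the question's data as a plain array (an assumption made to keep the example self-contained) and uses axis=0 to get one IQR per column.

```python
import numpy as np

# Columns: time_diff, avg_trips (values copied from the question).
values = np.array([
    [0.450000, 1.0],
    [0.483333, 1.0],
    [0.500000, 1.0],
    [0.516667, 1.0],
    [0.533333, 2.0],
])

# axis=0 computes the percentiles down each column separately;
# the result has one row per requested percentile.
q25, q75 = np.percentile(values, [25, 75], axis=0)
iqr = q75 - q25  # one entry per column
print("IQR of time_diff:", iqr[0])
print("IQR of avg_trips:", iqr[1])

# Without axis, all ten numbers are pooled into a single data set.
pooled_q25, pooled_q75 = np.percentile(values, [25, 75])
print("Pooled IQR:", pooled_q75 - pooled_q25)
```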
