How to calculate 1st and 3rd quartiles?

I have a DataFrame:

    time_diff   avg_trips
0   0.450000    1.0
1   0.483333    1.0
2   0.500000    1.0
3   0.516667    1.0
4   0.533333    2.0

10 Answers

旧时难觅i

2020-12-04 19:57
If you want to use plain Python rather than NumPy or pandas, you can use the standard-library statistics module to find the median of the upper and lower halves of the sorted data:
```
    >>> import statistics as stat
    >>> def quartile(data):
    ...     data = sorted(data)  # sorted copy; works for a list or a pandas Series
    ...     half_list = len(data) // 2
    ...     upper_quartile = stat.median(data[-half_list:])
    ...     lower_quartile = stat.median(data[:half_list])
    ...     print("Lower Quartile: " + str(lower_quartile))
    ...     print("Upper Quartile: " + str(upper_quartile))
    ...     print("Interquartile Range: " + str(upper_quartile - lower_quartile))
    ...
    >>> quartile(df.time_diff)
```
Line 1: import the statistics module under the alias "stat"

Line 2: define the quartile function

Line 3: sort the data into ascending order

Line 4: get the length of half of the list

Line 5: get the median of the upper half of the list

Line 6: get the median of the lower half of the list

Line 7: print the lower quartile

Line 8: print the upper quartile

Line 9: print the interquartile range

Line 10: run the quartile function for the time_diff column of the DataFrame
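
The same median-of-halves idea can also return the values instead of printing them. The following is a minimal sketch rather than part of the original answer: the name `quartiles_by_halves` is made up for illustration, and it assumes the same convention as the code above, where the middle element of an odd-length list is left out of both halves.

```python
import statistics


def quartiles_by_halves(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Hypothetical helper for illustration. For an odd number of values
    the middle element is excluded from both halves.
    """
    ordered = sorted(data)  # sorted copy; the input is not modified
    half = len(ordered) // 2
    if half == 0:
        raise ValueError("need at least two values")
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return q1, q3


# The time_diff values from the question
time_diff = [0.450000, 0.483333, 0.500000, 0.516667, 0.533333]
q1, q3 = quartiles_by_halves(time_diff)
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```

For the five values in the question this convention gives roughly 0.4667 and 0.525, which differs from what interpolation-based methods report. On small samples the quartile conventions simply disagree.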

生来不讨喜

2020-12-04 19:59

Conveniently, the describe method already reports this information:

```
df.time_diff.describe()

count    5.000000
mean     0.496667
std      0.032059
min      0.450000
25%      0.483333
50%      0.500000
75%      0.516667
max      0.533333
Name: time_diff, dtype: float64
```
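
If only the quartiles are needed, `Series.quantile` returns them directly. A minimal sketch, assuming the question's DataFrame is rebuilt by hand as `df` (the values are copied from the post):

```python
import pandas as pd

# Rebuild the question's DataFrame (assumed to match the original data)
df = pd.DataFrame({
    "time_diff": [0.450000, 0.483333, 0.500000, 0.516667, 0.533333],
    "avg_trips": [1.0, 1.0, 1.0, 1.0, 2.0],
})

# quantile takes fractions in [0, 1]; linear interpolation is the default
q1, q3 = df["time_diff"].quantile([0.25, 0.75])
iqr = q3 - q1
print(q1, q3, iqr)
```

The two values should agree with the 25% and 75% rows that describe reports, since both use linear interpolation by default.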


礼貌的吻别

2020-12-04 20:00
Building on, or rather slightly correcting, what Cyrus said:

np.percentile DOES VERY MUCH calculate the values of Q1, median, and Q3. Consider the sorted list below:
```
s1=[18,45,66,70,76,83,88,90,90,95,95,98]
```
Running np.percentile(s1, [25, 50, 75]) with the default linear interpolation returns values that are not all elements of the list:
```
[69.  85.5  91.25]
```
However, under the common median-of-halves definition the quartiles are Q1 = 68.0, median = 85.5, and Q3 = 92.5.

What we are missing here is the interpolation parameter of np.percentile and related functions (renamed to method in NumPy 1.22, where interpolation was deprecated). Its default value is linear. This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i < j:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j, whichever is nearest.
midpoint: (i + j) / 2.

Thus running np.percentile(s1, [25, 50, 75], interpolation='midpoint') returns the expected quartiles for the list:
```
[68. 85.5 92.5]
```
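
To compare the two settings side by side, here is a small sketch. The helper `percentile_midpoint` is hypothetical and exists only to cope with the keyword rename: it assumes NumPy 1.22 or later accepts `method=` and falls back to `interpolation=` on older releases.

```python
import numpy as np

s1 = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]


def percentile_midpoint(data, q):
    """Midpoint percentiles across NumPy versions (illustrative helper)."""
    try:
        # NumPy >= 1.22 spells the keyword `method`
        return np.percentile(data, q, method="midpoint")
    except TypeError:
        # Older NumPy only knows `interpolation`
        return np.percentile(data, q, interpolation="midpoint")


linear = np.percentile(s1, [25, 50, 75])  # default: linear interpolation
midpoint = percentile_midpoint(s1, [25, 50, 75])
print(linear)
print(midpoint)
```

The first line printed corresponds to the linear result quoted above and the second to the midpoint result.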
南旧

2020-12-04 20:03
Use np.percentile on the column of interest:
```
import numpy as np
# df is the question's DataFrame; pass one column, not the whole frame
q75, q25 = np.percentile(df["time_diff"], [75, 25])
iqr = q75 - q25
```
Taken from the answer to "How do you find the IQR in Numpy?"