I have a DataFrame:
   time_diff  avg_trips
0   0.450000        1.0
1   0.483333        1.0
2   0.500000        1.0
3   0.516667        1.0
4   0.533333        2.0
If you want to use plain Python rather than NumPy or pandas, you can use the standard library's statistics module to find the medians of the lower and upper halves of the list:
>>> import statistics as stat
>>> def quartile(data):
        data = sorted(data)
        half_list = len(data) // 2
        lower_quartile = stat.median(data[:half_list])
        upper_quartile = stat.median(data[-half_list:])
        print("Lower Quartile: " + str(lower_quartile))
        print("Upper Quartile: " + str(upper_quartile))
        print("Interquartile Range: " + str(upper_quartile - lower_quartile))
>>> quartile(df.time_diff)
Line 1: import the statistics module under the alias "stat"
Line 2: define the quartile function
Line 3: sort the data into ascending order (as a new list, so the input is left unchanged)
Line 4: compute half the length of the list, rounded down
Line 5: get the median of the lower half of the list
Line 6: get the median of the upper half of the list
Line 7: print the lower quartile
Line 8: print the upper quartile
Line 9: print the interquartile range
Line 10: run the quartile function on the time_diff column of the DataFrame
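If you need the numbers rather than printed text, the same median-of-halves idea can return them. This is only a sketch: the function name quartiles and the plain list standing in for df.time_diff are my own choices for illustration, and it assumes at least two data points.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3, IQR) using the median-of-halves rule.

    For an odd number of values the middle value is left out of
    both halves. Assumes len(data) >= 2.
    """
    ordered = sorted(data)                    # ascending copy, input untouched
    half = len(ordered) // 2                  # size of each half
    q1 = statistics.median(ordered[:half])    # median of the lower half
    q3 = statistics.median(ordered[-half:])   # median of the upper half
    return q1, q3, q3 - q1

# The time_diff values from the question, written out as a plain list
time_diff = [0.450000, 0.483333, 0.500000, 0.516667, 0.533333]
q1, q3, iqr = quartiles(time_diff)
print(q1, q3, iqr)
```

Note that this rule does not interpolate the way pandas and NumPy do by default, so the results can differ from theirs on small samples.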
Conveniently, this information is also reported by the describe method:
df.time_diff.describe()
count    5.000000
mean     0.496667
std      0.032059
min      0.450000
25%      0.483333
50%      0.500000
75%      0.516667
max      0.533333
Name: time_diff, dtype: float64
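The 25% and 75% rows above are all you need for the IQR. As a cross-check that does not need pandas, here is a small sketch with the standard library's statistics.quantiles (Python 3.8+). I am assuming its method="inclusive" option follows the same linear interpolation rule that describe uses by default; the plain list is a stand-in for df.time_diff.

```python
import statistics

# The time_diff values from the question, written out as a plain list
time_diff = [0.450000, 0.483333, 0.500000, 0.516667, 0.533333]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(time_diff, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)       # compare with the 25%, 50%, 75% rows of describe()
print(round(iqr, 6))    # interquartile range
```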
Building upon, or rather correcting a bit, what Cyrus said:
np.percentile DOES VERY MUCH calculate the values of Q1, median, and Q3. Consider the sorted list below:
s1=[18,45,66,70,76,83,88,90,90,95,95,98]
Running np.percentile(s1, [25, 50, 75])
returns linearly interpolated values, which are not necessarily values from the list:
[69. 85.5 91.25]
However, the textbook quartiles (the medians of the lower and upper halves of the list) are Q1=68.0, Median=85.5, Q3=92.5.
What we are missing here is the interpolation parameter of np.percentile and related functions (renamed to method in NumPy 1.22, where interpolation is deprecated). Its default value is linear. This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i < j:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j, whichever is nearest.
midpoint: (i + j) / 2.
Thus, running np.percentile(s1, [25, 50, 75], interpolation='midpoint')
returns the expected quartiles for the list:
[68. 85.5 92.5]
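To see where these numbers come from, here is a rough pure-Python sketch of the five options. It is not NumPy's implementation: the function name percentile and its signature are made up for illustration, it expects an already sorted list, and its tie-breaking for nearest is a simplification.

```python
import math

def percentile(sorted_data, q, method="linear"):
    """Return the q-th percentile (0-100) of an already sorted list."""
    pos = (len(sorted_data) - 1) * q / 100    # fractional index into the data
    lo, hi = math.floor(pos), math.ceil(pos)  # indices of the two neighbours
    i, j = sorted_data[lo], sorted_data[hi]   # the data points i <= j
    fraction = pos - lo                       # how far pos lies past lo
    if method == "linear":
        return i + (j - i) * fraction
    if method == "lower":
        return i
    if method == "higher":
        return j
    if method == "nearest":
        # Simplification: an exact tie (fraction == 0.5) goes to j here
        return i if fraction < 0.5 else j
    if method == "midpoint":
        return (i + j) / 2
    raise ValueError("unknown method: " + method)

s1 = [18, 45, 66, 70, 76, 83, 88, 90, 90, 95, 95, 98]
print([percentile(s1, q) for q in (25, 50, 75)])
# [69.0, 85.5, 91.25]
print([percentile(s1, q, "midpoint") for q in (25, 50, 75)])
# [68.0, 85.5, 92.5]
```

For Q1 the fractional index is 11 * 0.25 = 2.75, which falls between 66 and 70. Linear gives 66 + 4 * 0.75 = 69, while midpoint gives (66 + 70) / 2 = 68.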
Using np.percentile:
import numpy as np
q75, q25 = np.percentile(df.time_diff, [75, 25])
iqr = q75 - q25
Answer from How do you find the IQR in Numpy?