percentile

Percentile calculation

爱⌒轻易说出口 submitted on 2019-11-29 11:00:34
Question: I want to mimic the Excel PERCENTILE function in C# (or in some pseudocode). How can I do that? The function should take two arguments, where the first is a list of values and the second is the percentile the function should calculate. Thanks! Edit: I'm sorry if my question came across as though I had not tried it myself. I just couldn't understand how the Excel function worked (yes, I tried Wikipedia and Wolfram first) and I thought I would understand it better if someone
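As a rough illustration of what the question is after, Excel's PERCENTILE interpolates linearly between the two closest ranks of the sorted data, with the rank computed as k * (n - 1). The sketch below is in Python rather than C#, and the name excel_percentile is hypothetical; it assumes k is given as a fraction between 0 and 1, as in Excel.

```python
import math


def excel_percentile(values, k):
    """Linear-interpolation percentile; k is a fraction in [0, 1]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    data = sorted(values)
    # Fractional 0-based rank of the requested percentile in the sorted data.
    rank = k * (len(data) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    # Interpolate between the two neighbouring order statistics.
    fraction = rank - lower
    return data[lower] + fraction * (data[upper] - data[lower])
```

For example, excel_percentile([1, 2, 3, 4], 0.5) falls halfway between the second and third values.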

nth percentile calculations in postgresql

隐身守侯 submitted on 2019-11-29 05:30:55
I've been surprisingly unable to find an nth percentile function for PostgreSQL. I am using this via the Mondrian OLAP tool, so I just need an aggregate function which returns a 95th percentile. I did find this link: http://www.postgresql.org/message-id/162867790907102334r71db0227jfa0e4bd96f48b8e4@mail.gmail.com But for some reason the code in that percentile function is returning nulls in some cases with certain queries. I've checked the data and there's nothing odd in it that would seem to cause that! alfonx: With PostgreSQL 9.4 there is native support for percentiles now, implemented in

Calculating percentile of dataset column

拟墨画扇 submitted on 2019-11-28 19:10:38
A quick one for you, dearest R gurus: I'm doing an assignment and, in this exercise, I've been asked to get basic statistics out of the infert dataset (it's built in), and specifically one of its columns, infert$age. For anyone not familiar with the dataset: > table_ages # Which is just subset(infert, select=c("age")); age 1 26 2 42 3 39 4 34 5 35 6 36 7 23 8 32 9 21 10 28 11 29 ... 246 35 247 29 248 23 I've had to find the median, variance, skewness, and standard deviation of the column, which were all okay, until I was asked to find the column "percentiles". I haven't been able to find

Fast Algorithm for computing percentiles to remove outliers

懵懂的女人 submitted on 2019-11-28 18:49:43
I have a program that needs to repeatedly compute the approximate percentile (order statistic) of a dataset in order to remove outliers before further processing. I'm currently doing so by sorting the array of values and picking the appropriate element; this is doable, but it's a noticeable blip on the profiles despite being a fairly minor part of the program. More info: The data set contains on the order of up to 100000 floating point numbers, and is assumed to be "reasonably" distributed: there are unlikely to be duplicates or huge spikes in density near particular values; and if for some odd
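One general alternative to a full sort is a selection algorithm, which finds a single order statistic in expected linear time. The sketch below is a plain-Python quickselect, offered as an illustration under assumptions rather than as the answer given in the original thread; the names quickselect and approx_percentile are hypothetical, and the percentile is taken as the element at index int(n * pct / 100) of the sorted data.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [v for v in items if v < pivot]
        pivots = [v for v in items if v == pivot]
        if k < len(lows):
            # The target lies among the elements below the pivot.
            items = lows
        elif k < len(lows) + len(pivots):
            # The target is equal to the pivot.
            return pivot
        else:
            # Skip everything <= pivot and continue among the larger elements.
            k -= len(lows) + len(pivots)
            items = [v for v in items if v > pivot]


def approx_percentile(values, pct):
    """Order-statistic percentile, pct in [0, 100], without a full sort."""
    k = min(len(values) - 1, int(len(values) * pct / 100))
    return quickselect(values, k)
```

Values above approx_percentile(data, 95), for instance, could then be treated as outliers. Whether this beats a sort in practice depends on the data size and the runtime, so it is worth profiling.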

Fast algorithm for repeated calculation of percentile?

与世无争的帅哥 submitted on 2019-11-28 16:32:44
In an algorithm I have to calculate the 75th percentile of a data set whenever I add a value. Right now I am doing this: (1) Get value x. (2) Insert x at the back of an already sorted array. (3) Swap x down until the array is sorted. (4) Read the element at position array[array.size * 3/4]. Point 3 is O(n), and the rest is O(1), but this is still quite slow, especially as the array gets larger. Is there any way to optimize this? UPDATE: Thanks Nikita! Since I am using C++ this is the solution easiest to implement. Here is the code: template<class T> class IterativePercentile { public: /// Percentile has to be in
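A common way to bring the per-insert cost down to O(log n) is to keep two heaps split at the percentile position. The excerpt does not show which technique the asker adopted, so the sketch below is an assumption, written in Python rather than C++; the class name RunningPercentile is hypothetical. It keeps the smallest int(n * fraction) values in a max-heap, so the top of the min-heap is the element the question reads at array[array.size * 3/4].

```python
import heapq


class RunningPercentile:
    """Track the element at sorted index int(n * fraction) as values arrive."""

    def __init__(self, fraction):
        if not 0 <= fraction < 1:
            raise ValueError("fraction must be in [0, 1)")
        self.fraction = fraction
        self._low = []   # max-heap (values stored negated): the smallest values
        self._high = []  # min-heap: the percentile element and everything above

    def add(self, x):
        # Place x on the side that keeps every value in _low <= every value in _high.
        if self._high and x >= self._high[0]:
            heapq.heappush(self._high, x)
        else:
            heapq.heappush(self._low, -x)
        n = len(self._low) + len(self._high)
        target = int(n * self.fraction)
        # Rebalance so that exactly `target` values sit below the percentile.
        while len(self._low) > target:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        while len(self._low) < target:
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def value(self):
        if not self._high:
            raise ValueError("no values added yet")
        return self._high[0]
```

Usage would look like rp = RunningPercentile(0.75), then rp.add(x) for each new value and rp.value() to read the current percentile. The negation trick assumes numeric values.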

Calculate percentile for every value in a column of dataframe

孤街醉人 submitted on 2019-11-28 05:29:25
Question: I am trying to calculate the percentile for every value in column a of a DataFrame x. Is there a better way to write the following piece of code? x["pcta"] = [stats.percentileofscore(x["a"].values, i) for i in x["a"].values] I would like to see better performance. Answer 1: It seems like you want Series.rank(): x["pcta"] = x["a"].rank(pct=True) # will be in decimal form Performance: import scipy.stats as scs %timeit [scs.percentileofscore(x["a"].values, i) for i in x["a"].values] 1000 loops, best
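The list comprehension in the question rescans the whole column for every value, which is quadratic. To show why a rank-based approach is faster, here is a library-free sketch that sorts once and then looks each value up by binary search; the function name percentile_ranks is hypothetical, and the tie handling (average rank divided by n) is an assumption chosen to resemble average-rank behaviour, not a drop-in replacement for either library call.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(values):
    """Return each value's average rank divided by n, as a fraction in (0, 1]."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        left = bisect_left(ordered, v)    # number of elements strictly below v
        right = bisect_right(ordered, v)  # number of elements <= v
        # Tied values occupy 1-based ranks left+1 .. right; take their mean.
        ranks.append((left + right + 1) / (2 * n))
    return ranks
```

Sorting costs O(n log n) and each lookup O(log n), so the total stays at O(n log n) instead of O(n^2).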

matplotlib: disregard outliers when plotting

徘徊边缘 submitted on 2019-11-28 04:22:52
I'm plotting some data from various tests. Sometimes in a test I happen to have one outlier (say 0.1), while all other values are three orders of magnitude smaller. With matplotlib, I plot against the range [0, max_data_value]. How can I just zoom into my data and not display outliers, which would mess up the x-axis in my plot? Should I simply take the 95th percentile and have the range [0, 95_percentile] on the x-axis? Answer: There's no single "best" test for an outlier. Ideally, you should incorporate a priori information (e.g. "This parameter shouldn't be over x because of blah..."). Most tests for

nth percentile calculations in postgresql

末鹿安然 submitted on 2019-11-27 23:04:11
Question: I've been surprisingly unable to find an nth percentile function for PostgreSQL. I am using this via the Mondrian OLAP tool, so I just need an aggregate function which returns a 95th percentile. I did find this link: http://www.postgresql.org/message-id/162867790907102334r71db0227jfa0e4bd96f48b8e4@mail.gmail.com But for some reason the code in that percentile function is returning nulls in some cases with certain queries. I've checked the data and there's nothing odd in it that would seem to

Weighted percentile using numpy

梦想的初衷 submitted on 2019-11-27 19:06:56
Is there a way to use the numpy.percentile function to compute a weighted percentile? Or is anyone aware of an alternative Python function to compute a weighted percentile? Thanks! Answer: Unfortunately, numpy doesn't have built-in weighted functions for everything, but you can always put something together. def weight_array(ar, weights): zipped = zip(ar, weights) weighted = [] for i in zipped: for j in range(i[1]): weighted.append(i[0]) return weighted np.percentile(weight_array(ar, weights), 25) Alleo: Completely vectorized numpy solution. Here is the code I'm using. It's not an optimal one (which I'm
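The weight_array approach above only works for non-negative integer weights and materializes every repeated element. One vectorized alternative is to sort once, accumulate the weights, and interpolate on the cumulative weight fraction. The sketch below is an illustration under assumptions and is not the truncated code from the excerpt; the function name weighted_percentile is hypothetical. Several definitions of a weighted percentile exist, and this one does not reproduce numpy.percentile exactly even when all weights are equal.

```python
import numpy as np


def weighted_percentile(values, weights, percentiles):
    """Interpolated weighted percentile(s); percentiles are given in [0, 100]."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape != weights.shape:
        raise ValueError("values and weights must have the same shape")
    order = np.argsort(values)
    values = values[order]
    weights = weights[order]
    # Centre of each sample's weight mass, as a fraction of the total weight.
    positions = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights)
    # np.interp clamps requests outside the positions to the end values.
    return np.interp(np.asarray(percentiles, dtype=float) / 100.0, positions, values)
```

A call such as weighted_percentile(ar, weights, 25) then plays the role of the np.percentile call above, with float weights allowed.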

Is it possible to draw a matplotlib boxplot given the percentile values instead of the original inputs?

有些话、适合烂在心里 submitted on 2019-11-27 17:53:11
Question: From what I can see, the boxplot() method expects a sequence of raw values (numbers) as input, from which it then computes percentiles to draw the boxplot(s). I would like to have a method by which I could pass in the percentiles and get the corresponding boxplot. For example: Assume that I have run several benchmarks and for each benchmark I've measured latencies (floating point values). Now additionally, I have precomputed the percentiles for these values. Hence for each benchmark, I have