percentile

Select nth percentile from MySQL

帅比萌擦擦* submitted on 2019-12-01 18:38:06
I have a simple table of data, and I'd like to select the row that's at about the 40th percentile from the query. I can do this right now by first querying for the number of rows and then running another query that sorts and selects the nth row: select count(*) as `total` from mydata; which may return something like 93, and 93*0.4 ≈ 37, so: select * from mydata order by `field` asc limit 37,1; Can I combine these two queries into a single query? This will give you approximately the 40th percentile; it returns the row where 40% of rows are less than it. It sorts rows by how far they are from the
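The arithmetic behind the two-query approach in the question can be sketched in Python. This is only an illustration of the offset calculation (row count times 0.4, rounded down), not the single-query MySQL answer that the excerpt cuts off; the function name `percentile_row` and the 93-row sample table are hypothetical.

```python
import math

def percentile_row(rows, key, fraction):
    """Return the row at offset floor(len(rows) * fraction) after sorting by key.

    Mirrors `ORDER BY field ASC LIMIT offset, 1` from the question.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    ordered = sorted(rows, key=key)
    # Clamp so that fraction == 1.0 still returns the last row.
    offset = min(math.floor(len(ordered) * fraction), len(ordered) - 1)
    return ordered[offset]

# Hypothetical table with 93 rows whose `field` values are 1..93.
rows = [{"field": v} for v in range(93, 0, -1)]
row = percentile_row(rows, key=lambda r: r["field"], fraction=0.4)
```

With 93 rows the offset is floor(37.2) = 37, the same value the question computes by hand.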

Definitive way to match Stata weighted xtile command using Python?

冷暖自知 submitted on 2019-12-01 18:14:10
For a project, I need to replicate some results that currently exist in Stata output files (.dta) and were computed from an older Stata script. The new version of the project needs to be written in Python. The specific part I am having difficulty with is matching quantile breakpoint calculations based on the weighted version of Stata's xtile command. Note that ties between data points won't matter with the weights, and the weights I am using come from a continuous quantity, so ties are extremely unlikely (and there are no ties in my test data set). So miscategorizing due to ties is not it. I
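As a starting point for comparing conventions, here is a minimal sketch of weighted quantile breakpoints built from cumulative weights. It assumes one common rule (the cutpoint is the smallest value whose cumulative weight reaches k/nquantiles of the total); it is not confirmed to reproduce Stata's xtile output, and the names `weighted_cutpoints` and `assign_category` are hypothetical.

```python
def weighted_cutpoints(values, weights, nquantiles):
    """Return nquantiles - 1 breakpoints using cumulative weights.

    Assumed convention: each breakpoint is the smallest value whose
    cumulative weight is >= k/nquantiles of the total weight.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cutpoints = []
    for k in range(1, nquantiles):
        target = total * k / nquantiles
        running = 0.0
        for value, weight in pairs:
            running += weight
            if running >= target:
                cutpoints.append(value)
                break
    return cutpoints

def assign_category(value, cutpoints):
    """1-based quantile category: values <= the first cutpoint fall in category 1."""
    return 1 + sum(1 for c in cutpoints if value > c)

# Hypothetical data: the last observation carries most of the weight.
cuts = weighted_cutpoints([10, 20, 30, 40], [1, 1, 1, 5], nquantiles=4)
```

Discrepancies against a reference implementation usually come from how the boundary case (cumulative weight exactly equal to the target) is handled, so that comparison is the first thing worth checking.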

How to compute 99% coverage in MATLAB?

浪子不回头ぞ submitted on 2019-12-01 15:46:45
I have a matrix in MATLAB and I need to find the 99% value for each column. In other words, the value such that 99% of the population has a larger value than it. Is there a function in MATLAB for this? Use the QUANTILE function: Y = quantile(X,P); where X is a matrix and P is a scalar or vector of probabilities. For example, if P=0.01, Y will be a vector with one value per column, such that 99% of that column's values are larger. gnovice The simplest solution is to use the function QUANTILE, as yuk suggested: Y = quantile(X,0.01); However, you will need the Statistics Toolbox to use the function QUANTILE
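For readers without the Statistics Toolbox, the same column-wise idea can be sketched in Python with the standard library. This is an illustration, not a port of MATLAB's quantile: `statistics.quantiles` with `method="inclusive"` uses a different interpolation convention, so values can differ slightly from MATLAB's. The helper name `column_percentile` and the sample matrix are hypothetical.

```python
from statistics import quantiles

def column_percentile(matrix, percent):
    """Return the given percentile (an integer from 1 to 99) of each column.

    `matrix` is a list of equal-length rows; each column needs at least
    two values.
    """
    columns = zip(*matrix)  # transpose rows into columns
    return [
        quantiles(col, n=100, method="inclusive")[percent - 1]
        for col in columns
    ]

# Hypothetical 5x2 matrix.
X = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
Y = column_percentile(X, 1)  # one value per column; about 99% of entries are larger
```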

How to compute 99% coverage in MATLAB?

流过昼夜 submitted on 2019-12-01 14:38:05
Question: I have a matrix in MATLAB and I need to find the 99% value for each column. In other words, the value such that 99% of the population has a larger value than it. Is there a function in MATLAB for this? Answer 1: Use the QUANTILE function: Y = quantile(X,P); where X is a matrix and P is a scalar or vector of probabilities. For example, if P=0.01, Y will be a vector with one value per column, such that 99% of that column's values are larger. Answer 2: The simplest solution is to use the function QUANTILE as yuk

Quartiles in SQL query

主宰稳场 submitted on 2019-12-01 05:53:34
Question: I have a very simple table like this: CREATE TABLE IF NOT EXISTS LuxLog ( Sensor TINYINT, Lux INT, PRIMARY KEY(Sensor) ) It contains thousands of logs from different sensors. I would like to have Q1 and Q3 for all sensors. I can do one query for every data, but it would be better for me to have one query for all sensors (getting Q1 and Q3 back from one query). I thought it would be a fairly simple operation, as quartiles are broadly used and one of the main statistical variables in frequency
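To make the target result concrete, here is a small Python sketch that computes Q1 and Q3 per sensor in one pass over (Sensor, Lux) pairs. It is not the SQL answer; it only shows the shape of the output the question asks for. The interpolation convention (`method="inclusive"`) is an assumption, since quartile definitions vary, and the function name and sample readings are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

def quartiles_by_sensor(readings):
    """Map each sensor to (Q1, Q3), given an iterable of (sensor, lux) pairs.

    Each sensor needs at least two readings.
    """
    groups = defaultdict(list)
    for sensor, lux in readings:
        groups[sensor].append(lux)
    result = {}
    for sensor, values in groups.items():
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        result[sensor] = (q1, q3)
    return result

# Hypothetical log rows: (Sensor, Lux).
readings = [(1, v) for v in range(1, 8)] + [(2, v) for v in (10, 20, 30, 40, 50)]
summary = quartiles_by_sensor(readings)
```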

Percentile aggregate for SQL Server 2008 R2

扶醉桌前 submitted on 2019-12-01 01:43:52
I'm using SQL Server 2008 R2. I need to compute a percentile value per group, something like: SELECT id, PCTL(0.9, x) -- for the 90th percentile FROM my_table GROUP BY id ORDER BY id For example, given this DDL ( fiddle ) --- CREATE TABLE my_table (id INT, x REAL); INSERT INTO my_table VALUES (7, 0.164595), (5, 0.671311), (7, 0.0118385), (6, 0.704592), (3, 0.633521), (3, 0.337268), (0, 0.54739), (6, 0.312282), (0, 0.220618), (7, 0.214973), (6, 0.410768), (7, 0.151572), (7, 0.0639506), (5, 0.339075), (1, 0.284094), (2, 0.126722), (2, 0.870079), (3, 0.369366), (1, 0.6687), (5, 0.199456), (5, 0

Is it possible to draw a matplotlib boxplot given the percentile values instead of the original inputs?

天大地大妈咪最大 submitted on 2019-11-30 12:46:23
From what I can see, the boxplot() method expects a sequence of raw values (numbers) as input, from which it then computes percentiles to draw the boxplot(s). I would like a way to pass in the percentiles and get the corresponding boxplot. For example, assume that I have run several benchmarks and for each benchmark I've measured latencies (floating-point values). Additionally, I have precomputed the percentiles for these values, so for each benchmark I have the 25th, 50th, and 75th percentiles along with the min and max. Given these data, I would like to draw
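One possible approach, sketched under assumptions: matplotlib's `Axes.bxp` draws boxes from precomputed statistics supplied as a list of dicts with keys such as `med`, `q1`, `q3`, `whislo`, and `whishi`. The helper names, benchmark labels, and latency numbers below are hypothetical, and mapping min and max to the whisker ends is a choice made for this example.

```python
def boxplot_stats(label, minimum, q1, median, q3, maximum):
    """Build one stats dict in the layout that Axes.bxp expects."""
    return {
        "label": label,
        "whislo": minimum,  # lower whisker end (here: the minimum)
        "q1": q1,
        "med": median,
        "q3": q3,
        "whishi": maximum,  # upper whisker end (here: the maximum)
        "fliers": [],       # no outlier points are available
    }

def draw_boxplots(stats, path):
    """Render the precomputed stats to an image file (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bxp(stats, showfliers=False)
    ax.set_ylabel("latency")
    fig.savefig(path)

# Hypothetical precomputed percentiles for two benchmarks.
stats = [
    boxplot_stats("bench-a", 1.0, 2.0, 3.0, 4.0, 9.0),
    boxplot_stats("bench-b", 0.5, 1.5, 2.5, 5.0, 7.0),
]
```

Calling `draw_boxplots(stats, "latency.png")` would write the figure; the raw measurements are never needed.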

Percentile calculation

孤人 submitted on 2019-11-30 08:04:00
I want to mimic the Excel PERCENTILE function in C# (or in some pseudocode). How can I do that? The function should take two arguments, where the first is a list of values and the second is the percentile the function should calculate. Thanks! Edit: I'm sorry if my question came across as though I had not tried it myself. I just couldn't understand how the Excel function worked (yes, I tried Wikipedia and Wolfram first) and I thought I would understand it better if someone presented it in code. @CodeInChaos gave an answer that seems to be what I'm after. I think Wikipedia page
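In the spirit of the pseudocode the question asks for, here is a minimal Python sketch of the linear-interpolation rule that Excel's PERCENTILE is generally described as using: rank = k * (n - 1) on the sorted values, then interpolate between the two neighbouring values. It is not @CodeInChaos's answer, and the function name is a placeholder.

```python
import math

def percentile(values, k):
    """Interpolated percentile of `values` for k in the range 0..1 inclusive."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    ordered = sorted(values)
    rank = k * (len(ordered) - 1)  # fractional zero-based position
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    # Interpolate linearly between the two surrounding values.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

result = percentile([1, 2, 3, 4], 0.3)
```

For the list 1, 2, 3, 4 and k = 0.3, the rank is 0.9, so the result lies nine tenths of the way from 1 to 2.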

Python: Matplotlib - probability plot for several data set

不打扰是莪最后的温柔 submitted on 2019-11-30 07:37:49
I have several data sets (distributions) as follows: set1 = [1,2,3,4,5] set2 = [3,4,5,6,7] set3 = [1,3,4,5,8] How do I plot a scatter plot of the data sets above, with the y-axis being the probability (i.e. the percentile of the distribution in each set: 0%-100%) and the x-axis being the data set names? In JMP, it is called a 'Quantile Plot'. Something like the image attached: Please advise. Thanks. [EDIT] My data is in CSV as such: Using the JMP analysis tool, I'm able to plot the probability distribution plot (QQ-plot/Normal Quantile Plot, as in the figure far below): I believe Joe Kington almost has my problem
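The percentile assigned to each point has to be computed before anything can be plotted. A minimal sketch, assuming the plotting-position formula 100 * (i + 0.5) / n for the i-th sorted value (one of several conventions; JMP may use a different one). The function name `plotting_positions` is hypothetical, and the data sets are the three from the question.

```python
def plotting_positions(data):
    """Return (value, percentile) pairs for one data set.

    Assumed convention: the i-th sorted value (zero-based) is placed at
    100 * (i + 0.5) / n percent.
    """
    ordered = sorted(data)
    n = len(ordered)
    return [(value, 100.0 * (i + 0.5) / n) for i, value in enumerate(ordered)]

datasets = {
    "set1": [1, 2, 3, 4, 5],
    "set2": [3, 4, 5, 6, 7],
    "set3": [1, 3, 4, 5, 8],
}
points = {name: plotting_positions(values) for name, values in datasets.items()}
```

Each entry in `points` can then be passed to a scatter plot, one series per data set.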

Calculate percentile for every value in a column of dataframe

天大地大妈咪最大 submitted on 2019-11-29 11:40:15
I am trying to calculate the percentile for every value in column a of a DataFrame x. Is there a better way to write the following piece of code? x["pcta"] = [stats.percentileofscore(x["a"].values, i) for i in x["a"].values] I would like to see better performance. It seems like you want Series.rank(): x["pcta"] = x["a"].rank(pct=True) # will be in decimal form Performance: import scipy.stats as scs %timeit [scs.percentileofscore(x["a"].values, i) for i in x["a"].values] 1000 loops, best of 3: 877 µs per loop %timeit x.rank(pct=True) 10000 loops, best of 3: 107 µs per loop Source: https:/
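To show why ranking is faster than calling percentileofscore once per value, here is a pure-Python sketch of what a percentile rank with averaged ties computes: sort once, record the first and last position of each value, then divide the average position by the count. This is my understanding of the default behaviour of rank(pct=True), not pandas source code, and the function name is a placeholder.

```python
def percentile_ranks(values):
    """Fractional rank (0..1] of each value; tied values share their average rank."""
    n = len(values)
    ordered = sorted(values)  # one O(n log n) sort instead of n full scans
    first = {}
    last = {}
    for position, value in enumerate(ordered, start=1):
        first.setdefault(value, position)  # first 1-based position of the value
        last[value] = position             # last 1-based position of the value
    return [(first[v] + last[v]) / 2 / n for v in values]

ranks = percentile_ranks([10, 20, 20, 30])
```

The two tied values share rank 2.5 out of 4, so both come out as 0.625.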