percentile

Rank Pandas dataframe by quantile

Deadly submitted on 2020-01-02 12:05:01
Question: I have a Pandas dataframe in which each column represents a separate property, and each row holds the properties' values on a specific date: import pandas as pd dfstr = \ ''' AC BO C CCM CL CRD CT DA GC GF 2010-01-19 0.844135 -0.194530 -0.231046 0.245615 -0.581238 -0.593562 0.057288 0.655903 0.823997 0.221920 2010-01-20 -0.204845 -0.225876 0.835611 -0.594950 -0.607364 0.042603 0.639168 0.816524 0.210653 0.237833 2010-01-21 0.824852 -0.216449 -0.220136 0.234343 -0.611756 -0.624060 0.028295 0
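The question text is cut off, so this is only a sketch under an assumption: that "rank by quantile" means assigning each date's property values to quantile buckets (quintiles here) across the columns of that row. It relies on DataFrame.rank with axis=1; the helper name quantile_buckets and the choice q=5 are hypothetical, and only the two complete rows from the excerpt are used.

```python
import pandas as pd

# The two complete rows shown in the question excerpt
columns = ["AC", "BO", "C", "CCM", "CL", "CRD", "CT", "DA", "GC", "GF"]
df = pd.DataFrame(
    [
        [0.844135, -0.194530, -0.231046, 0.245615, -0.581238,
         -0.593562, 0.057288, 0.655903, 0.823997, 0.221920],
        [-0.204845, -0.225876, 0.835611, -0.594950, -0.607364,
         0.042603, 0.639168, 0.816524, 0.210653, 0.237833],
    ],
    index=["2010-01-19", "2010-01-20"],
    columns=columns,
)


def quantile_buckets(frame, q=5):
    """Hypothetical helper: bucket each value 1..q within its own row (date)."""
    # axis=1 ranks across the columns of each row; method="first" breaks ties
    # by column order, so every row gets the ranks 1..n_columns with no gaps
    ranks = frame.rank(axis=1, method="first")
    n = frame.shape[1]
    # Map rank r (1..n) to bucket floor((r - 1) * q / n) + 1, i.e. 1..q
    return ((ranks - 1) * q // n + 1).astype(int)


buckets = quantile_buckets(df, q=5)
print(buckets)
```

If the percentile itself is wanted rather than a bucket number, frame.rank(axis=1, pct=True) returns each value's rank divided by the number of values in its row.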

How do I get the percentile for a row in a pandas dataframe?

空扰寡人 submitted on 2020-01-01 09:14:32
Question: Example DataFrame values: 0 78 1 38 2 42 3 48 4 31 5 89 6 94 7 102 8 122 9 122 stats.percentileofscore(temp['INCOME'].values, 38, kind='mean') 15.0 stats.percentileofscore(temp['INCOME'].values, 38, kind='strict') 10.0 stats.percentileofscore(temp['INCOME'].values, 38, kind='weak') 20.0 stats.percentileofscore(temp['INCOME'].values, 38, kind='rank') 20.0 temp['INCOME'].rank(pct=True) 1 0.20 (only the row for the value 38 is shown) temp['INCOME'].quantile(0.11) 37.93 temp['INCOME'].quantile(0.12) 38
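The four kind results differ only in how values equal to the score are counted. The sketch below is a pure-Python re-implementation written for illustration (not SciPy's or pandas' own code; the function names are hypothetical) that reproduces the figures quoted for the INCOME column. linear_quantile assumes pandas' default linear interpolation, which places quantile q at position q * (n - 1) in the sorted data.

```python
# INCOME values from the question (index 0..9)
income = [78, 38, 42, 48, 31, 89, 94, 102, 122, 122]


def percentile_of_score(values, score, kind="rank"):
    """Illustrative re-implementation of the four percentileofscore kinds."""
    n = len(values)
    below = sum(v < score for v in values)         # strictly less than score
    at_or_below = sum(v <= score for v in values)  # less than or equal
    if kind == "strict":
        return below * 100 / n
    if kind == "weak":
        return at_or_below * 100 / n
    if kind == "mean":
        # average of the strict and weak results
        return (below + at_or_below) * 50 / n
    if kind == "rank":
        # average of the percentage ranks of every copy of score in values
        tie_bonus = 1 if at_or_below > below else 0
        return (below + at_or_below + tie_bonus) * 50 / n
    raise ValueError(f"unknown kind: {kind!r}")


def linear_quantile(values, q):
    """Quantile with linear interpolation at position q * (n - 1)."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)                   # floor, since pos >= 0
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


for kind in ("mean", "strict", "weak", "rank"):
    print(kind, percentile_of_score(income, 38, kind))
print(round(linear_quantile(income, 0.11), 2))
```

So 38 sits at the 10th percentile when the tie with itself is excluded and at the 20th when it is included; rank(pct=True) reports 0.20 because 38 is the 2nd of 10 sorted values, and quantile(0.11) lands just below 38 because position 0.99 falls between the 1st and 2nd sorted values (31 and 38).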

Calculate Percentile Value using MySQL

荒凉一梦 submitted on 2019-12-24 00:12:52
Question: I have a table which contains thousands of rows and I would like to calculate the 90th percentile for one of the fields, called 'round'. For example, select the value of round which is at the 90th percentile. I don't see a straightforward way to do this in MySQL. Can somebody provide some suggestions as to how I may start this sort of calculation? Thank you! Answer 1: First, let's assume that you have a table with a value column. You want to get the row with the 95th percentile value. In other words,
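The answer is cut off before its SQL, so the following is not that answer. As a language-neutral illustration in Python, one common definition, the nearest-rank method, sorts the column and returns the value at 1-based position ceil(p / 100 * n); in SQL this amounts to ordering by round and picking the row at that position. The function name and the sample values are hypothetical.

```python
import math


def nearest_rank_percentile(values, p):
    """Value at the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not values:
        raise ValueError("no values")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the row to return, e.g. p=90, n=10 -> rank 9
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


rounds = [7, 3, 9, 1, 5, 10, 2, 8, 6, 4]     # hypothetical 'round' column values
print(nearest_rank_percentile(rounds, 90))  # 9
print(nearest_rank_percentile(rounds, 95))  # 10
```

Other definitions interpolate between neighbouring rows instead, so a given "90th percentile" can differ slightly depending on which rule the query implements.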

Fast percentile in C++

房东的猫 submitted on 2019-12-22 18:59:09
Question: My program calculates a Monte Carlo simulation for the value-at-risk metric. To simplify as much as possible, I have: 1/ simulated daily cashflows 2/ to get a sample of a possible 1-year cashflow, I need to draw 365 random daily cashflows and sum them. Hence, the daily cashflows form an empirically given distribution function to be sampled 365 times. For this, I 1/ sort the daily cashflows into an array called *this->distro* 2/ calculate 365 percentiles corresponding to random probabilities I
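Drawing a uniform probability u and reading the empirical percentile at index floor(u * n) of the sorted array selects each observed cashflow with equal probability. A minimal Python sketch of one simulated 1-year cashflow, assuming that no-interpolation percentile definition (the rest of the question is cut off); both function names are hypothetical.

```python
import random


def simulate_annual_cashflow(daily_cashflows, days=365, rng=None):
    """One simulated 1-year cashflow: the sum of `days` empirical draws."""
    rng = rng or random.Random()
    distro = sorted(daily_cashflows)   # sorted array, as in the question
    n = len(distro)
    total = 0.0
    for _ in range(days):
        u = rng.random()                         # random probability in [0, 1)
        total += distro[min(int(u * n), n - 1)]  # empirical percentile at u
    return total


def simulate_annual_samples(daily_cashflows, trials, seed=0):
    """Hypothetical driver: `trials` independent 1-year samples for a VaR estimate."""
    rng = random.Random(seed)
    return [simulate_annual_cashflow(daily_cashflows, rng=rng) for _ in range(trials)]
```

Because every index 0..n-1 is equally likely, calling rng.choice(daily_cashflows) once per day would produce the same distribution without sorting at all; the sorted array only matters if the real code interpolates between neighbouring percentiles.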

Python-Matplotlib boxplot. How to show percentiles 0,10,25,50,75,90 and 100?

不羁的心 submitted on 2019-12-21 10:46:24
Question: I would like to plot an EPSgram (see below) using Python and Matplotlib. The boxplot function only plots quartiles (0, 25, 50, 75, 100). So, how can I add two more boxes? Answer 1: I put together a sample, if you're still curious. It uses scipy.stats.scoreatpercentile, but you may be getting those numbers from elsewhere: from random import random import numpy as np import matplotlib.pyplot as plt from scipy.stats import scoreatpercentile x = np.array([random() for x in xrange(100)]) # percentiles
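However the boxes are drawn, the EPSgram needs seven numbers per data set: the minimum, the 10th, 25th, 50th, 75th and 90th percentiles, and the maximum. A stdlib-only sketch that computes them with statistics.quantiles (Python 3.8+); method="inclusive" interpolates linearly over the sorted data, the same rule NumPy uses by default. The helper name is hypothetical and plotting is left out.

```python
import statistics


def epsgram_percentiles(data):
    """Return [min, p10, p25, p50, p75, p90, max] for one data set."""
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two values")
    # n=10 gives the 9 decile cut points p10..p90; n=4 gives p25, p50, p75
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    return [values[0], deciles[0], quartiles[0], quartiles[1],
            quartiles[2], deciles[-1], values[-1]]


print(epsgram_percentiles(range(101)))  # [0, 10.0, 25.0, 50.0, 75.0, 90.0, 100]
```

Each returned list could then be drawn, for example, as an outer 10 to 90 box, an inner 25 to 75 box, a median line, and whiskers to the extremes.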

How should the interquartile range be calculated in Python?

时间秒杀一切 submitted on 2019-12-21 05:04:11
Question: I have a list of numbers [1, 2, 3, 4, 5, 6, 7] and I want to have a function to return the interquartile range of this list of numbers. The interquartile range is the difference between the upper and lower quartiles. I have attempted to calculate the interquartile range using NumPy functions and using Wolfram Alpha. All of the answers I get, from my manual calculation to NumPy's to Wolfram Alpha's, are different, and I do not know why. My attempt in Python is as follows: >>> a = numpy
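Part of the disagreement is that several quartile definitions are in common use. A sketch with Python's statistics.quantiles shows two of them giving different IQRs for the same list: "inclusive" interpolates over positions 0..n-1 (the same linear rule as NumPy's default), while "exclusive" (the stdlib default) spaces the cut points over n + 1 positions. No claim is made here about which rule Wolfram Alpha uses; the helper name iqr is hypothetical.

```python
import statistics

a = [1, 2, 3, 4, 5, 6, 7]


def iqr(data, method):
    """Interquartile range Q3 - Q1 under the given statistics.quantiles method."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    return q3 - q1


print(statistics.quantiles(a, n=4, method="inclusive"))  # [2.5, 4.0, 5.5]
print(statistics.quantiles(a, n=4, method="exclusive"))  # [2.0, 4.0, 6.0]
print(iqr(a, "inclusive"), iqr(a, "exclusive"))          # 3.0 4.0
```

Splitting the list at the median and taking the medians of the two halves ([1, 2, 3] and [5, 6, 7]), a common hand method, also gives 4.0 here, which is one way a manual answer and NumPy's can disagree.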

Percentile aggregate for SQL Server 2008 R2

不打扰是莪最后的温柔 submitted on 2019-12-19 04:55:21
Question: I'm using SQL Server 2008 R2. I need to compute a percentile value per group, something like: SELECT id, PCTL(0.9, x) -- for the 90th percentile FROM my_table GROUP BY id ORDER BY id For example, given this DDL (fiddle) --- CREATE TABLE my_table (id INT, x REAL); INSERT INTO my_table VALUES (7, 0.164595), (5, 0.671311), (7, 0.0118385), (6, 0.704592), (3, 0.633521), (3, 0.337268), (0, 0.54739), (6, 0.312282), (0, 0.220618), (7, 0.214973), (6, 0.410768), (7, 0.151572), (7, 0.0639506), (5, 0
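Before writing the T-SQL, it can help to pin down what PCTL(0.9, x) ... GROUP BY id should return with a small reference implementation to test against. The sketch below is a Python stand-in, not T-SQL, and assumes a continuous percentile with linear interpolation at position p * (n - 1) within each group; pctl_by_group is a hypothetical name, and only the complete rows visible in the excerpt are used.

```python
from collections import defaultdict

# The complete (id, x) rows visible in the question excerpt
rows = [(7, 0.164595), (5, 0.671311), (7, 0.0118385), (6, 0.704592),
        (3, 0.633521), (3, 0.337268), (0, 0.54739), (6, 0.312282),
        (0, 0.220618), (7, 0.214973), (6, 0.410768), (7, 0.151572),
        (7, 0.0639506)]


def pctl_by_group(pairs, p):
    """Hypothetical PCTL(p, x) ... GROUP BY id, using linear interpolation."""
    groups = defaultdict(list)
    for key, x in pairs:
        groups[key].append(x)
    result = {}
    for key in sorted(groups):              # ORDER BY id
        xs = sorted(groups[key])
        pos = p * (len(xs) - 1)             # fractional position in the group
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        result[key] = xs[lo] + (pos - lo) * (xs[hi] - xs[lo])
    return result


for key, value in pctl_by_group(rows, 0.9).items():
    print(key, round(value, 6))
```

A group with a single row returns that row's value, which is a useful edge case to check in any server-side version.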

Python: Matplotlib - probability plot for several data set

十年热恋 submitted on 2019-12-18 12:39:14
Question: I have several data sets (distributions) as follows: set1 = [1,2,3,4,5] set2 = [3,4,5,6,7] set3 = [1,3,4,5,8] How do I plot a scatter plot with the data sets above, with the y-axis being the probability (i.e. the percentile of the distribution in the set: 0%-100%) and the x-axis being the data set names? In JMP, it is called 'Quantile Plot'. Something like the attached image: Please educate. Thanks. [EDIT] My data is in CSV as such: Using JMP analysis tool, I'm able to plot the probability
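A quantile plot pairs each sorted value with an estimate of its cumulative probability. The sketch below computes those (value, percentile) pairs for the three sets, assuming the Hazen plotting position 100 * (i - 0.5) / n for the i-th smallest of n values; other conventions exist, and the question does not say which one JMP uses. The resulting pairs could then be passed to a Matplotlib scatter call, one series per set.

```python
def plotting_positions(values):
    """(value, percentile) pairs using the Hazen position 100 * (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    # 100 * (2i - 1) / (2n) is the same formula, kept in integers until the division
    return [(v, 100 * (2 * i - 1) / (2 * n)) for i, v in enumerate(ordered, start=1)]


sets = {
    "set1": [1, 2, 3, 4, 5],
    "set2": [3, 4, 5, 6, 7],
    "set3": [1, 3, 4, 5, 8],
}
for name, data in sets.items():
    print(name, plotting_positions(data))
```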

Fast Algorithm for computing percentiles to remove outliers

天大地大妈咪最大 submitted on 2019-12-17 22:36:03
Question: I have a program that needs to repeatedly compute the approximate percentile (order statistic) of a dataset in order to remove outliers before further processing. I'm currently doing so by sorting the array of values and picking the appropriate element; this is doable, but it's a noticeable blip on the profiles despite being a fairly minor part of the program. More info: The data set contains on the order of up to 100000 floating point numbers, and is assumed to be "reasonably" distributed -
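Since only one or two order statistics are needed, a selection algorithm can replace the full sort: quickselect finds the k-th smallest element in expected O(n) time instead of O(n log n). A pure-Python sketch of that idea, with hypothetical names and cut-offs taken at index floor(p * (n - 1)) as an illustrative choice; with NumPy, numpy.partition performs the same selection step in compiled code.

```python
import random


def quickselect(values, k, rng=None):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    rng = rng or random.Random(0)
    items = list(values)
    while True:
        pivot = items[rng.randrange(len(items))]
        lows = [v for v in items if v < pivot]
        highs = [v for v in items if v > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                    # answer is below the pivot
        elif k < len(lows) + n_equal:
            return pivot                    # answer equals the pivot
        else:
            k -= len(lows) + n_equal        # answer is above the pivot
            items = highs


def trim_outliers(values, lower=0.05, upper=0.95):
    """Keep values between the lower and upper percentile cut-offs (inclusive)."""
    n = len(values)
    lo_cut = quickselect(values, int(lower * (n - 1)))
    hi_cut = quickselect(values, int(upper * (n - 1)))
    return [v for v in values if lo_cut <= v <= hi_cut]
```

Whether this beats sorting in practice depends on constant factors (pure Python list comprehensions are slow), so it is worth profiling both on the real data.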

Fast algorithm for repeated calculation of percentile?

杀马特。学长 韩版系。学妹 submitted on 2019-12-17 21:45:13
Question: In an algorithm I have to calculate the 75th percentile of a data set whenever I add a value. Right now I am doing this: (1) get value x; (2) insert x in an already sorted array at the back; (3) swap x down until the array is sorted; (4) read the element at position array[array.size * 3/4]. Point 3 is O(n), and the rest is O(1), but this is still quite slow, especially if the array gets larger. Is there any way to optimize this? UPDATE Thanks Nikita! Since I am using C++ this is the solution easiest to implement
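One way to avoid the O(n) insertion step is to split the data across two heaps: a max-heap holding the smallest n*3//4 + 1 values, so its top is exactly the element read at array[array.size * 3/4], and a min-heap holding the rest. Each insertion then costs O(log n) and the percentile is read in O(1). This is a Python sketch of that technique, not necessarily the solution the update refers to; the class name is hypothetical, and C++ code could use two std::priority_queue objects the same way.

```python
import heapq


class Running75thPercentile:
    """Tracks sorted(values)[len(values) * 3 // 4] as values are added."""

    def __init__(self):
        self._lower = []  # max-heap (values negated): the n*3//4 + 1 smallest
        self._upper = []  # min-heap: all remaining, larger values

    def add(self, x):
        # Route x to the side it belongs on, keeping every lower <= every upper
        if self._lower and x <= -self._lower[0]:
            heapq.heappush(self._lower, -x)
        else:
            heapq.heappush(self._upper, x)
        # Rebalance so the lower heap has exactly n*3//4 + 1 elements
        n = len(self._lower) + len(self._upper)
        target = n * 3 // 4 + 1
        while len(self._lower) > target:
            heapq.heappush(self._upper, -heapq.heappop(self._lower))
        while len(self._lower) < target:
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    def value(self):
        """Current 75th-percentile element; O(1)."""
        return -self._lower[0]
```

At most one element moves between the heaps per insertion, so each add stays O(log n) regardless of how large the data set grows.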