percentile

Pass percentiles to pandas agg function

Anonymous (unverified), submitted on 2019-12-03 01:10:02
Question:

I want to pass the numpy percentile() function through pandas' agg() function as I do below with various other numpy statistics functions. Right now I have a dataframe that looks like this:

    AGGREGATE  MY_COLUMN
    A          10
    A          12
    B          5
    B          9
    A          84
    B          22

And my code looks like this:

    grouped = dataframe.groupby('AGGREGATE')
    column = grouped['MY_COLUMN']
    column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max])

The above code works, but I want to do something like

    column.agg([np.sum, np.mean, np.percentile(50), np.percentile(95)])

i.e. specify
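One way to get parameterized percentiles into agg() is a small factory that returns a separately named function for each percentile. The sketch below rebuilds the sample frame from the question. The helper name percentile and the resulting column names percentile_50 and percentile_95 are illustrative assumptions, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def percentile(q):
    """Return a function computing the q-th percentile (hypothetical helper)."""
    def _percentile(values):
        return np.percentile(values, q)
    # agg() labels each output column with the function's __name__,
    # so every percentile needs a distinct name.
    _percentile.__name__ = f"percentile_{q}"
    return _percentile


dataframe = pd.DataFrame({
    "AGGREGATE": ["A", "A", "B", "B", "A", "B"],
    "MY_COLUMN": [10, 12, 5, 9, 84, 22],
})

column = dataframe.groupby("AGGREGATE")["MY_COLUMN"]
result = column.agg(["sum", "mean", percentile(50), percentile(95)])
print(result)
```

Giving each function its own name matters because agg() refuses a list in which two functions would produce the same column label.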

Weighted version of scipy percentileofscore

Anonymous (unverified), submitted on 2019-12-03 00:45:01
Question:

I'd like to pass weights to scipy.stats.percentileofscore. For example:

    from scipy import stats
    a = [1, 2, 3, 4]
    val = 3
    stats.percentileofscore(a, val)

Returns 75, as 75% of the values in a lie at or below the val 3. I'd like to add weights, for example:

    weights = [2, 2, 3, 3]
    weightedpercentileofscore(a, val, weights)

Should return 70, since (2 + 2 + 3) / (2 + 2 + 3 + 3) = 7 / 10 of the weights fall at or below 3. This should also work for decimal weights and large weights, so just expanding the arrays isn't ideal.

Weighted percentile
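The arithmetic in the question (weight at or below the value, divided by total weight) can be sketched directly with NumPy. This is a minimal sketch rather than a scipy feature: the function name weighted_percentile_of_score is hypothetical, and it assumes the "at or below" definition used in the question.

```python
import numpy as np


def weighted_percentile_of_score(a, score, weights):
    """Percentage of total weight carried by values <= score (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if a.shape != weights.shape:
        raise ValueError("a and weights must have the same shape")
    # Sum the weights of all values at or below the score, then normalize.
    return 100.0 * weights[a <= score].sum() / weights.sum()


a = [1, 2, 3, 4]
print(weighted_percentile_of_score(a, 3, [2, 2, 3, 3]))  # weighted case
print(weighted_percentile_of_score(a, 3, [1, 1, 1, 1]))  # equal weights
```

Because the weights are only summed, decimal and very large weights work without expanding the arrays.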

Pandas describe vs scipy.stats percentileofscore with NaN?

瘦欲@, submitted on 2019-12-02 22:45:17
Question:

I'm having a weird situation, where pd.describe is giving me percentile markers that disagree with scipy.stats percentileofscore, because of NaNs, I think. My df is:

       f_recommend
    0     3.857143
    1     4.500000
    2     4.458333
    3          NaN
    4     3.600000
    5          NaN
    6     4.285714
    7     3.587065
    8     4.200000
    9          NaN

When I run df.describe(percentiles=[.25, .5, .75]) I get:

           f_recommend
    count     7.000000
    mean      4.069751
    std       0.386990
    min       3.587065
    25%       3.728571
    50%       4.200000
    75%       4.372024
    max       4.500000

I get the same values when I run with NaN

PostgreSQL equivalent of Oracle's PERCENTILE_CONT function

喜夏-厌秋, submitted on 2019-12-02 22:32:13
Has anyone found a PostgreSQL equivalent of Oracle's PERCENTILE_CONT function? I searched, and could not find one, so I wrote my own. Here is the solution that I hope helps you out.

The company I work for wanted to migrate a Java EE web application from using an Oracle database over to using PostgreSQL. Several stored procedures relied heavily upon using Oracle's unique PERCENTILE_CONT() function. This function does not exist in PostgreSQL. I tried searching to see if anyone had "ported over" that function into PG to no avail.

thatdevguy

After more searching I found a page that listed the
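For readers unfamiliar with the function, PERCENTILE_CONT is generally documented as linear interpolation between the two ordered values that bracket the requested fraction. The following is a minimal Python sketch of that interpolation, not the SQL port the author goes on to describe. The function name and the choice to return None for empty input are assumptions made here.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile by linear interpolation (illustrative sketch)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        return None  # assumption: empty input yields no value
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    # Weight each neighbour by its distance from the fractional position.
    return ordered[lower] * (upper - position) + ordered[upper] * (position - lower)


print(percentile_cont([10, 20, 30, 40], 0.5))
```

Note that PostgreSQL 9.4 and later include a built-in percentile_cont ordered-set aggregate, so a hand-written port is mainly relevant to older servers.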

Pandas describe vs scipy.stats percentileofscore with NaN?

久未见, submitted on 2019-12-02 14:12:14
I'm having a weird situation, where pd.describe is giving me percentile markers that disagree with scipy.stats percentileofscore, because of NaNs, I think. My df is:

       f_recommend
    0     3.857143
    1     4.500000
    2     4.458333
    3          NaN
    4     3.600000
    5          NaN
    6     4.285714
    7     3.587065
    8     4.200000
    9          NaN

When I run df.describe(percentiles=[.25, .5, .75]) I get:

           f_recommend
    count     7.000000
    mean      4.069751
    std       0.386990
    min       3.587065
    25%       3.728571
    50%       4.200000
    75%       4.372024
    max       4.500000

I get the same values when I run with NaN removed. When I want to look up a specific value, however, when I run scipy.stats.percentileofscore(df['f
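The describe() output above reports count 7, so it is working on the non-NaN values only. A hedged way to put both tools on the same footing is to drop the NaNs explicitly before calling percentileofscore. The sketch below rebuilds the sample column from the question; how percentileofscore treats NaN when it is left in may vary between scipy versions and is not relied on here.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "f_recommend": [3.857143, 4.5, 4.458333, np.nan, 3.6,
                    np.nan, 4.285714, 3.587065, 4.2, np.nan],
})

# Remove NaNs once so both calls see the same seven observations.
clean = df["f_recommend"].dropna()

summary = clean.describe(percentiles=[0.25, 0.5, 0.75])
score_pct = stats.percentileofscore(clean.to_numpy(), 4.2)

print(summary)
print(score_pct)
```

Even on identical data the two numbers need not agree exactly: describe() reports the value at a given percentile, while percentileofscore reports the percentile of a given value under its default ranking rule.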

SQL rank percentile

為{幸葍}努か, submitted on 2019-12-02 13:59:00
I've made an SQL query which ranks pages by how many times they have been viewed. For instance,

    ╔══════╦═══════╗
    ║ PAGE ║ VIEWS ║
    ╠══════╬═══════╣
    ║ J    ║ 100   ║
    ║ Q    ║ 77    ║
    ║ 3    ║ 55    ║
    ║ A    ║ 23    ║
    ║ 2    ║ 6     ║
    ╚══════╩═══════╝

Now what I would like to do is find the percentile rank of each page using an SQL query. The math I would like to use for this is simple enough, I just want to take the row number of the already generated table divided by the total number of rows. Or 1 minus this value, depending on my interests. Can I do a COUNT(pages) on an already generated table like this? I realize that's
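The "row number divided by total rows" idea can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table name page_views and the column alias pct_rank are hypothetical, and SQLite stands in for whichever database the asker uses. The row number is emulated with a correlated COUNT(*) so that no window functions are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("J", 100), ("Q", 77), ("3", 55), ("A", 23), ("2", 6)],
)

# Rank = number of rows with at least as many views; divide by total rows.
query = """
SELECT p.page,
       p.views,
       (SELECT COUNT(*) FROM page_views AS q WHERE q.views >= p.views) * 1.0
       / (SELECT COUNT(*) FROM page_views) AS pct_rank
FROM page_views AS p
ORDER BY p.views DESC
"""
rows = conn.execute(query).fetchall()
for page, views, pct_rank in rows:
    print(page, views, pct_rank)
```

With this formulation, pages that tie on views share the same value; use 1 - pct_rank for the inverted scale mentioned in the question.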

Color code points based on percentile in ggplot

假装没事ソ, submitted on 2019-12-02 07:27:01
Question:

I have some very large files that contain a genomic position (position) and a corresponding population genetic statistic (value). I have successfully plotted these values and would like to color code the top 5% (blue) and 1% (red) of values. I am wondering if there is an easy way to do this in R. I have explored writing a function that defines the quantiles, however, many of them end up being not unique and thus cause the function to fail. I've also looked into stat_quantile but only had
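The labelling logic can be sketched independently of the plotting library. The example below is in Python with NumPy rather than R, and uses made-up values from 1 to 100; the variable names and the "grey" default are assumptions. It compares each value against two cutoffs instead of binning on break points, so repeated quantile values do not cause an error. The same comparison-based idea should carry over to R.

```python
import numpy as np

# Stand-in data: the real input would be the `value` column of the file.
values = np.arange(1, 101, dtype=float)

# Cutoffs for the top 5% and top 1% of values.
top5_cutoff = np.quantile(values, 0.95)
top1_cutoff = np.quantile(values, 0.99)

# Check the stricter cutoff first so the top 1% is red, not blue.
colors = np.where(
    values >= top1_cutoff,
    "red",
    np.where(values >= top5_cutoff, "blue", "grey"),
)

print(dict(zip(*np.unique(colors, return_counts=True))))
```

The resulting label array can then be mapped to a manual colour scale in whatever plotting layer is used.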

NumPy | 17 Statistical functions

Deadly, submitted on 2019-12-02 06:32:47
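NumPy provides many statistical functions for finding the minimum element, maximum element, percentiles, standard deviation, variance, and so on in an array. With an axis argument, a function computes the minimum or maximum of each row or column; without one, it computes the minimum or maximum of the whole array.

numpy.amin() and numpy.amax()

numpy.amin() computes the minimum of the array elements along the specified axis. numpy.amax() computes the maximum of the array elements along the specified axis.

    import numpy as np

    a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])
    print('Our array is:')
    print(a)
    print('\n')
    print('Calling amin():')
    print(np.amin(a, 1))       # axis 1: minimum of each row
    print('\n')
    print('Calling amin() again:')
    print(np.amin(a, 0))       # axis 0: minimum of each column
    print('\n')
    print('Calling amax():')
    print(np.amax(a, 1))       # axis 1: maximum of each row
    print('\n')
    print('Calling amax() again:')
    print(np.amax(a, axis=0))  # axis 0: maximum of each column
    print('\n')
    print('The minimum and maximum of the whole array are:')
    print(np.amin(a), np.amax(a))

The output is:

    Our array is:
    [[3 7 5]
     [8 4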
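Since this listing is about percentiles, the same axis convention can be illustrated with numpy.percentile() on the array from the tutorial. This is a small sketch added for illustration; the variable names are assumptions.

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# No axis: 50th percentile (median) of all nine elements.
overall = np.percentile(a, 50)

# axis=0: one percentile per column; axis=1: one percentile per row.
per_column = np.percentile(a, 50, axis=0)
per_row = np.percentile(a, 50, axis=1)

print(overall)
print(per_column)
print(per_row)
```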

Select nth percentile from MySQL

左心房为你撑大大i, submitted on 2019-12-01 21:02:52
Question:

I have a simple table of data, and I'd like to select the row that's at about the 40th percentile from the query. I can do this right now by first querying to find the number of rows and then running another query that sorts and selects the nth row:

    select count(*) as `total` from mydata;

which may return something like 93, 93*0.4 = 37

    select * from mydata order by `field` asc limit 37,1;

Can I combine these two queries into a single query?

Answer 1:

This will give you approximately the 40th
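The two-step approach from the question can at least be wrapped in one helper on the client side. The sketch below is not the single-query answer that the excerpt cuts off; it still issues two statements. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper name row_at_percentile and the sample data are assumptions.

```python
import sqlite3


def row_at_percentile(conn, fraction):
    """Fetch the row at roughly the given fraction of the sorted table."""
    (total,) = conn.execute("SELECT COUNT(*) FROM mydata").fetchone()
    if total == 0:
        return None
    # Same arithmetic as the question: 93 rows * 0.4 -> offset 37.
    offset = min(int(total * fraction), total - 1)
    return conn.execute(
        "SELECT * FROM mydata ORDER BY field ASC LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (field INTEGER)")
conn.executemany("INSERT INTO mydata VALUES (?)", [(n,) for n in range(1, 94)])

row = row_at_percentile(conn, 0.4)
print(row)
```

Binding the offset as a parameter keeps the computed row position out of the SQL string itself.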

Definitive way to match Stata weighted xtile command using Python?

↘锁芯ラ, submitted on 2019-12-01 18:55:07
Question:

For a project, I need to replicate some results that currently exist in Stata output files (.dta) and were computed from an older Stata script. The new version of the project needs to be written in Python. The specific part I am having difficulty with is matching quantile breakpoint calculations based on the weighted version of Stata's xtile command. Note that ties between data points won't matter with the weights, and the weights I am using come from a continuous quantity, so ties are
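As a starting point for comparison, weighted quantile breakpoints can be computed from the cumulative share of weight in sorted order. The sketch below uses one common convention (the smallest value whose cumulative weight share reaches each target). It has not been checked against Stata's xtile, so whether it reproduces Stata's breakpoints is an open assumption, and the function name is hypothetical.

```python
import numpy as np


def weighted_quantile_breakpoints(values, weights, n_quantiles):
    """Breakpoints splitting total weight into n_quantiles groups (sketch)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    sorted_values = values[order]
    # Share of total weight accumulated up to and including each value.
    cum_share = np.cumsum(weights[order]) / weights.sum()
    # Targets at 1/n, 2/n, ..., (n-1)/n of the total weight.
    targets = np.arange(1, n_quantiles) / n_quantiles
    idx = np.searchsorted(cum_share, targets, side="left")
    idx = np.minimum(idx, len(sorted_values) - 1)
    return sorted_values[idx]


print(weighted_quantile_breakpoints([1, 2, 3, 4], [1, 1, 1, 5], 2))
```

Other conventions differ in how they treat a cumulative share that lands exactly on a target, which is the first place to look if results diverge from the Stata output.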