percentile

Plot error bars (percentile)

Submitted by 老子叫甜甜 on 2020-08-20 07:00:48
Question: I'm quite new to Python and I need some help. I would like to plot error bars equivalent to 1-sigma standard deviations on my plot, using the 16th and 84th percentile values of the distributions. I tried (using matplotlib) err = np.std(x), but that just gives me the standard deviation. Thanks.

Answer 1: If you want vertical error bars:

    ax = plt.gca()
    ax.errorbar(x, y, yerr=np.vstack([error_low, error_high]))
    plt.draw()

where error_low and error_high are 1D sequences of the same length as x and y.
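A minimal end-to-end sketch of the percentile-based error bars being asked about, assuming each x position has its own distribution of samples (the data and variable names here are illustrative, not from the question):

    import numpy as np
    import matplotlib.pyplot as plt

    # Illustrative data: one distribution of samples per x position.
    rng = np.random.default_rng(0)
    x = np.arange(5)
    samples = [rng.normal(loc=i, scale=1.0, size=1000) for i in x]

    # Central value plus 16th and 84th percentiles of each distribution.
    y = np.array([np.median(s) for s in samples])
    p16 = np.array([np.percentile(s, 16) for s in samples])
    p84 = np.array([np.percentile(s, 84) for s in samples])

    # errorbar expects distances from y, not the absolute percentile values.
    yerr = np.vstack([y - p16, p84 - y])

    plt.errorbar(x, y, yerr=yerr, fmt='o')
    plt.show()

Note that yerr is given as distances from y, which is why the 16th and 84th percentiles are converted to offsets before plotting.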

Finding Percentile in Spark-Scala per a group

Submitted by 吃可爱长大的小学妹 on 2020-06-20 15:34:33
Question: I am trying to compute a percentile over a column using a Window function, as below. I have referred here to using the ApproxQuantile definition over a group.

    val df1 = Seq(
      (1, 10.0), (1, 20.0), (1, 40.6), (1, 15.6), (1, 17.6), (1, 25.6), (1, 39.6),
      (2, 20.5), (2, 70.3), (2, 69.4), (2, 74.4), (2, 45.4),
      (3, 60.6), (3, 80.6),
      (4, 30.6), (4, 90.6)
    ).toDF("ID", "Count")

    val idBucketMapping = Seq((1, 4), (2, 3), (3, 2), (4, 2))
      .toDF("ID", "Bucket")

    //jpp
    import org.apache.spark.sql.Column
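The question's own code is Scala; purely as a sketch of the same grouped-percentile idea, Spark SQL's built-in percentile_approx aggregate can be applied per group. The PySpark version below is illustrative only, and the session setup is assumed rather than taken from the question:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    df1 = spark.createDataFrame(
        [(1, 10.0), (1, 20.0), (1, 40.6), (1, 15.6), (1, 17.6), (1, 25.6), (1, 39.6),
         (2, 20.5), (2, 70.3), (2, 69.4), (2, 74.4), (2, 45.4),
         (3, 60.6), (3, 80.6), (4, 30.6), (4, 90.6)],
        ["ID", "Count"])

    # percentile_approx is a Spark SQL aggregate, so it can be applied per
    # group through expr; 0.5 asks for the approximate median of Count.
    quantiles = df1.groupBy("ID").agg(
        F.expr("percentile_approx(Count, 0.5)").alias("approx_median"))
    quantiles.show()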

elasticsearch - filter by percentile

Submitted by 与世无争的帅哥 on 2020-05-28 03:25:48
Question: Say I want to filter documents by some field falling within the 10th to 20th percentile. I'm wondering whether that is possible with some simple query, something like {"fieldName": {"percentile": [0.1, 0.2]}}. Say I have these documents:

    [{"a": 1, "b": 101}, {"a": 2, "b": 102}, {"a": 3, "b": 103}, ..., {"a": 100, "b": 200}]

I need to filter the top 10% of them by a (in ascending order), which would be a from 1 to 10. Then I need to sort those results by b in descending order and take a paginated result.
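I am not aware of a percentile clause in the query DSL itself; a common two-step workaround is to first ask Elasticsearch for the percentile values with a percentiles aggregation and then filter on those bounds with an ordinary range query. A sketch with the official Python client follows; the index name "docs" and the body=-style calls are assumptions, not from the question:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Step 1: get the 10th and 20th percentile values of field "a".
    agg = es.search(index="docs", body={
        "size": 0,
        "aggs": {"a_pct": {"percentiles": {"field": "a", "percents": [10, 20]}}},
    })
    low = agg["aggregations"]["a_pct"]["values"]["10.0"]
    high = agg["aggregations"]["a_pct"]["values"]["20.0"]

    # Step 2: range-filter on those bounds, sort by "b" descending, paginate.
    hits = es.search(index="docs", body={
        "query": {"range": {"a": {"gte": low, "lte": high}}},
        "sort": [{"b": {"order": "desc"}}],
        "from": 0,
        "size": 20,
    })

The percentile bounds are computed from the current data, so the two requests can drift slightly if the index is being written to between them.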


Grouped table of percentiles [duplicate]

Submitted by 本小妞迷上赌 on 2020-05-17 06:44:47
Question: This question already has answers here: ddply multiple quantiles by group (4 answers). Closed 4 months ago.

I need to calculate which value represents the 5%, 34%, 50%, 67% and 95% percentiles within each group (in separate columns). An expected output, with integer values for each group, would be:

         5%  34%  50%  67%  95%
    A     4    6    8   12   30
    B     1    2    3    4   10

The code below shows what I have so far (but using generated data):

    library(dplyr)
    library(tidyr)
    data.frame(group = sample(LETTERS[1:5], 100, TRUE), values
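The question itself is about R/dplyr; purely as an illustration of the same grouped-percentile table, here is a pandas sketch in which the column names and generated data are made up rather than taken from the question:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "group": rng.choice(list("ABCDE"), 100),
        "values": rng.integers(1, 100, 100),
    })

    quantiles = [0.05, 0.34, 0.50, 0.67, 0.95]
    table = (df.groupby("group")["values"]
               .quantile(quantiles)          # one row per (group, quantile)
               .unstack()                    # quantiles become columns
               .rename(columns=lambda q: f"{round(q * 100)}%"))
    print(table)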

How to calculate the percentile?

Submitted by 陌路散爱 on 2020-03-18 04:11:25
Question: I have access logs such as the ones below stored in a MongoDB instance:

    Time                          Service                       Latency
    [27/08/2013:11:19:22 +0000]   "POST Service A HTTP/1.1"     403
    [27/08/2013:11:19:24 +0000]   "POST Service B HTTP/1.1"     1022
    [27/08/2013:11:22:10 +0000]   "POST Service A HTTP/1.1"     455

Is there an analytics function like PERCENTILE_DISC in Oracle to calculate the percentile? I would like to calculate latency percentiles over a period of time.

Answer 1: There still appears to be no native way to calculate percentiles.
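One common workaround is to pull the latency values for the time window of interest and compute the percentile client-side. The sketch below uses pymongo and numpy; the database, collection, and field names are assumptions rather than anything stated in the question:

    from datetime import datetime

    import numpy as np
    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["access_logs"]

    # All documents in the one-hour window of interest.
    window = {"time": {"$gte": datetime(2013, 8, 27, 11, 0),
                       "$lt": datetime(2013, 8, 27, 12, 0)}}
    latencies = [doc["latency"] for doc in coll.find(window, {"latency": 1})]

    # 95th-percentile latency for that window, computed client-side.
    p95 = np.percentile(latencies, 95)
    print(p95)

On recent MongoDB releases it may also be worth checking whether the server-side $percentile accumulator covers this directly, rather than shipping the values to the client.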