median

Sliding window function in R

倾然丶 夕夏残阳落幕 submitted on 2019-11-29 12:06:59
Does anybody know whether there is a sliding window method in R for 2D matrices, and not just vectors? I need to apply a median function to an image stored in a matrix.

The function focal() in the excellent raster package is good for this. It takes several arguments beyond those shown in the example below, and can be used to specify a non-rectangular sliding window if that's needed.

    library(raster)

    ## Create some example data
    m <- matrix(1, ncol=10, nrow=10)
    diag(m) <- 2
    r <- as(m, "RasterLayer") # Coerce matrix to RasterLayer object

    ## Apply a function that returns a single value when passed

how to calculate running median efficiently

泪湿孤枕 submitted on 2019-11-29 12:00:16
I borrowed some code to implement a function that calculates the running median for a large amount of data. The current version is too slow for me. (The tricky part is that I need to exclude all zeros from the running window.) Below is the code:

    from itertools import islice
    from collections import deque
    from bisect import bisect_left,insort

    def median(s):
        sp = [nz for nz in s if nz!=0]
        print sp
        Mnow = len(sp)
        if Mnow == 0:
            return 0
        else:
            return np.median(sp)

    def RunningMedian(seq, M):
        seq = iter(seq)
        s = []
        # Set up list s (to be sorted) and load deque with first window of seq
        s = [item for item in
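One way to avoid re-filtering and re-sorting every window is to keep only the nonzero values of the current window in a sorted list and update it as the window slides. The following is a minimal sketch of that idea, not the code from the question; the name `running_median_nonzero` is hypothetical, and it assumes (as the question's `median()` does) that a window holding only zeros should yield 0.

```python
from bisect import bisect_left, insort
from collections import deque


def running_median_nonzero(seq, window):
    """Yield the median of the nonzero values in each full window of `seq`.

    A window that contains only zeros yields 0 (assumption taken from the
    question's median() helper).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()      # values currently in the window, in arrival order
    nonzero_sorted = []   # nonzero values of the window, kept sorted
    for value in seq:
        recent.append(value)
        if value != 0:
            insort(nonzero_sorted, value)
        if len(recent) > window:
            old = recent.popleft()
            if old != 0:
                # Remove one copy of the value that just left the window.
                del nonzero_sorted[bisect_left(nonzero_sorted, old)]
        if len(recent) == window:
            n = len(nonzero_sorted)
            if n == 0:
                yield 0
            elif n % 2:
                yield nonzero_sorted[n // 2]
            else:
                yield (nonzero_sorted[n // 2 - 1] + nonzero_sorted[n // 2]) / 2
```

Each step does a binary search plus one list insertion and at most one deletion, so the window is never sorted from scratch. The list shifts still cost time proportional to the window size in the worst case.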

Number of comparisons made in median of 3 function?

天涯浪子 submitted on 2019-11-29 10:01:19
As of right now, my function finds the median of 3 numbers and sorts them, but it always makes three comparisons. I'm thinking I can use a nested if statement somewhere so that sometimes my function will only make two comparisons.

    int median_of_3(int list[], int p, int r)
    {
        int median = (p + r) / 2;
        if(list[p] > list[r]) exchange(list, p, r);
        if(list[p] > list[median]) exchange(list, p, median);
        if(list[r] > list[median]) exchange(list, r, median);
        comparisons+=3; // 3 comparisons for each call to median_of_3
        return list[r];
    }

I'm not sure I see where I can make that nested if statement. If
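The nesting the question asks about can be sketched as a small decision tree. The version below is written in Python rather than the question's C, only returns the median (it does not reorder the array the way the question's function does), and the returned comparison count is added purely for illustration.

```python
def median_of_3(a, b, c):
    """Return (median, comparisons_used) for three values."""
    if a <= b:
        if b <= c:
            return b, 2          # a <= b <= c: done after two comparisons
        # Here c < b, so b is the largest; the median is the larger of a and c.
        return (c, 3) if a <= c else (a, 3)
    # Here b < a.
    if a <= c:
        return a, 2              # b < a <= c: done after two comparisons
    # Here c < a, so a is the largest; the median is the larger of b and c.
    return (c, 3) if b <= c else (b, 3)
```

For three distinct values, two of the six possible orderings finish after two comparisons and the other four need a third, so the count drops from 3 to 8/3 on average if all orderings are equally likely.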

How to find the median in Apache Spark with Python Dataframe API?

为君一笑 submitted on 2019-11-29 08:51:40
The PySpark API provides many aggregate functions, but not the median. Spark 2 comes with approxQuantile, which gives approximate quantiles, but the exact median is very expensive to calculate. Is there a more PySpark way of calculating the median for a column of values in a Spark DataFrame?

gench

Here is an example implementation with the DataFrame API in Python (Spark 1.6+).

    import pyspark.sql.functions as F
    import numpy as np
    from pyspark.sql.types import FloatType

Let's assume we have monthly salaries for customers in a "salaries" Spark DataFrame such as:

    month | customer_id | salary

and we would like to find

how to calculate the median on grouped dataset?

六眼飞鱼酱① submitted on 2019-11-29 06:44:41
My dataset is as follows:

    salary     number
    1500-1600  110
    1600-1700  180
    1700-1800  320
    1800-1900  460
    1900-2000  850
    2000-2100  250
    2100-2200  130
    2200-2300  70
    2300-2400  20
    2400-2500  10

How can I calculate the median of this dataset? Here's what I have tried:

    x <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
    colnames <- "numbers"
    rownames <- c("[1500-1600]", "(1600-1700]", "(1700-1800]", "(1800-1900]",
                  "(1900-2000]", "(2000,2100]", "(2100-2200]", "(2200-2300]",
                  "(2300-2400]", "(2400-2500]")
    y <- matrix(x, nrow=length(x), dimnames=list(rownames, colnames))
    data.frame(y, "cumsum"=cumsum(y))

    numbers
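A common textbook estimate for the median of binned data interpolates linearly inside the class that contains the middle observation: median ≈ L + (N/2 - CF) / f × w, where L is the lower bound of that class, CF the cumulative count before it, f its count, and w its width. The sketch below applies that formula to the table above; it is written in Python rather than R, the function name `grouped_median` is hypothetical, and it assumes values are spread evenly within each class.

```python
def grouped_median(lower_bounds, width, counts):
    """Estimate the median of binned data by linear interpolation.

    Assumes equal-width classes and values spread evenly within each class.
    """
    half = sum(counts) / 2
    cumulative = 0
    for lower, count in zip(lower_bounds, counts):
        if count > 0 and cumulative + count >= half:
            # The middle observation falls in this class.
            return lower + (half - cumulative) / count * width
        cumulative += count
    raise ValueError("counts must contain at least one positive value")


counts = [110, 180, 320, 460, 850, 250, 130, 70, 20, 10]
estimate = grouped_median(range(1500, 2500, 100), 100, counts)
```

With N = 2400 the middle observation falls in the 1900-2000 class (cumulative count 1070 before it), so the estimate is 1900 + (1200 - 1070) / 850 × 100, roughly 1915.29.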

Add hline with population median for each facet

浪子不回头ぞ submitted on 2019-11-29 06:34:24
I'd like to plot a horizontal facet-wide line with the population median of that facet. I tried the approach without creating a dummy summary table with the following code:

    require(ggplot2)
    dt = data.frame(gr = rep(1:2, each = 500),
                    id = rep(1:5, 2, each = 100),
                    y = c(rnorm(500, mean = 0, sd = 1), rnorm(500, mean = 1, sd = 2)))
    ggplot(dt, aes(x = as.factor(id), y = y)) +
      geom_boxplot() +
      facet_wrap(~ gr) +
      geom_hline(aes(yintercept = median(y), group = gr), colour = 'red')

However, the line is drawn for the median of the entire dataset instead of the median separately for each facet.

In the

median of two sorted arrays

时光怂恿深爱的人放手 submitted on 2019-11-29 04:04:04
My question is with reference to Method 2 of this link. Here two equal-length sorted arrays are given and we have to find the median of the two arrays merged.

Algorithm:

1) Calculate the medians m1 and m2 of the input arrays ar1[] and ar2[] respectively.
2) If m1 and m2 are equal, then we are done. Return m1 (or m2).
3) If m1 is greater than m2, then the median is present in one of the two subarrays below.
   a) From the first element of ar1 to m1 (ar1[0...⌊n/2⌋])
   b) From m2 to the last element of ar2 (ar2[⌊n/2⌋...n-1])
4) If m2 is greater than m1, then median is present in one of the below two
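When experimenting with the divide-and-conquer steps listed above, it helps to have a simple reference result to compare against. The sketch below is not Method 2; it is a plain linear-time merge count that walks the two arrays in merged order up to the middle. The function name `median_of_two_sorted` is hypothetical.

```python
def median_of_two_sorted(ar1, ar2):
    """Median of two sorted arrays of equal length n, by counting through the merge."""
    n = len(ar1)
    if n == 0 or n != len(ar2):
        raise ValueError("arrays must be non-empty and of equal length")
    i = j = 0
    prev = curr = None
    # Walk the merged order up to index n (0-based). The median of the 2n
    # merged values is the mean of the elements at indices n - 1 and n.
    for _ in range(n + 1):
        prev = curr
        if j >= n or (i < n and ar1[i] <= ar2[j]):
            curr = ar1[i]
            i += 1
        else:
            curr = ar2[j]
            j += 1
    return (prev + curr) / 2
```

This takes n + 1 steps and no extra storage, so it is a convenient oracle for testing a faster implementation on random inputs.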

Optimal median of medians selection - 3 element blocks vs 5 element blocks?

为君一笑 submitted on 2019-11-28 23:42:08
I'm working on a quicksort-variant implementation based on the Select algorithm for choosing a good pivot element. Conventional wisdom seems to be to divide the array into 5-element blocks, take the median of each, and then recursively apply the same blocking approach to the resulting medians to get a "median of medians". What's confusing me is the choice of 5-element blocks rather than 3-element blocks. With 5-element blocks, it seems to me that you perform n/4 = n/5 + n/25 + n/125 + n/625 + ... median-of-5 operations, whereas with 3-element blocks, you perform n/2 = n/3 + n/9 + n/27 + n/81 +
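The two totals quoted in the question follow from the geometric series, ignoring rounding of block counts. As a quick check of the arithmetic only (it says nothing about the cost of each block median or about the sizes of the recursive calls):

```latex
\sum_{k=1}^{\infty} \frac{n}{b^{k}}
  = n \cdot \frac{1/b}{1 - 1/b}
  = \frac{n}{b-1},
\qquad\text{so}\qquad
\sum_{k=1}^{\infty} \frac{n}{5^{k}} = \frac{n}{4},
\quad
\sum_{k=1}^{\infty} \frac{n}{3^{k}} = \frac{n}{2}.
```

Here b is the block size, so b = 5 gives the n/4 median-of-5 operations and b = 3 gives the n/2 median-of-3 operations mentioned above.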

Computing median in map reduce

纵然是瞬间 submitted on 2019-11-28 21:28:05
Can someone explain the computation of median/quantiles in MapReduce? My understanding of DataFu's median is that the n mappers sort the data and send it to a single reducer, which is responsible for sorting all the data from the n mappers and finding the median (middle value). Is my understanding correct? If so, does this approach scale for massive amounts of data? I can clearly see the single reducer struggling to do the final task. Thanks.

Chris White

Trying to find the median (middle number) in a series is going to require that 1 reducer is passed the entire range of numbers to

How to find Median [duplicate]

≯℡__Kan透↙ submitted on 2019-11-28 21:23:10
This question already has an answer here: Finding median of list in Python (19 answers)

I have data like this:

    Ram,500
    Sam,400
    Test,100
    Ram,800
    Sam,700
    Test,300
    Ram,900
    Sam,800
    Test,400

What is the shortest way to find the "median" from the above data? My result should be something like... Median = 1/2(n+1), where n is the number of data values in the sample.

    Test 500
    Sam 700
    Ram 800

Python 3.4 includes the statistics module in its standard library, so you can use the function statistics.median:

    >>> from statistics import median
    >>> median([1, 3, 5])
    3

Use numpy's median function. It's a little unclear how your data is
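To get one median per name rather than a single median, the values can be grouped first and then passed to statistics.median. The following is a minimal sketch that assumes the data arrives as "name,value" strings exactly as shown in the question; the helper name `median_by_name` is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_name(lines):
    """Group "name,value" rows by name and return the median value per name."""
    groups = defaultdict(list)
    for line in lines:
        name, value = line.split(",")
        groups[name.strip()].append(int(value))
    return {name: median(values) for name, values in groups.items()}


rows = ["Ram,500", "Sam,400", "Test,100",
        "Ram,800", "Sam,700", "Test,300",
        "Ram,900", "Sam,800", "Test,400"]
medians = median_by_name(rows)
```

For the rows as shown, this gives 800 for Ram and 700 for Sam, matching the expected result in the question. For Test it gives 300, the middle of 100, 300 and 400, rather than the 500 listed there.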