apply

Why does pandas apply calculate twice

不想你离开。 submitted on 2019-11-27 04:23:29
I'm using the apply method on a pandas DataFrame object. When my DataFrame has a single column, it appears that the applied function is being called twice. My questions are: why? And can I stop that behavior?

Code:

import pandas as pd

def mul2(x):
    print('hello')
    return 2*x

df = pd.DataFrame({'a': [1,2,0.67,1.34]})
print(df.apply(mul2))

Output:

hello
hello
0 2.00
1 4.00
2 1.34
3 2.68

I'm printing 'hello' from within the function being applied. I know it's being applied twice because 'hello' printed twice. What's more, if I have two columns, 'hello' prints 3 times. Even more still is when
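Here is a minimal Python 3 sketch for checking this on your own installation. Instead of printing, the function records each call, so you can inspect the count directly. The `calls` list and the explicit per-column loop are illustrative assumptions, not a pandas feature. The number of calls `DataFrame.apply` makes can differ between pandas versions, so the sketch only prints the count rather than assuming one.

If a side effect must happen exactly once per column, loop over the columns yourself, or use a vectorized expression such as `df * 2`. Either way avoids depending on how `apply` evaluates the function.

```python
import pandas as pd

calls = []  # one entry per invocation: the name of the column passed in

def mul2(x):
    calls.append(x.name)  # record the call instead of printing 'hello'
    return 2 * x

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})
result = df.apply(mul2)
print(len(calls), "call(s) for", df.shape[1], "column(s)")
print(result)

# Explicit loop: mul2 runs exactly once per column, whatever apply does internally.
calls.clear()
once = pd.DataFrame({col: mul2(df[col]) for col in df.columns})
print(calls)
```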

Sum every nth points

Deadly submitted on 2019-11-27 03:59:38
I have a vector and I need to sum every n numbers and return the results. This is the way I plan on doing it currently. Is there a better way to do this?

v = 1:100
n = 10
sidx = seq.int(from=1, to=length(v), by=n)
eidx = c((sidx-1)[2:length(sidx)], length(v))
thesum = sapply(1:length(sidx), function(i) sum(v[sidx[i]:eidx[i]]))

This gives:

thesum
[1] 55 155 255 355 455 555 655 755 855 955

unname(tapply(v, (seq_along(v)-1) %/% n, sum))
# [1] 55 155 255 355 455 555 655 755 855 955

UPDATE: If you want to sum every n consecutive numbers, use colSums. If you want to sum every nth number, use rowSums, as per
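For comparison, here is a minimal Python/NumPy sketch of the same two operations, assuming the data fit in a 1-D array. `np.add.reduceat` plays the role of the `sidx`/`eidx` pair above. It sums each slice between consecutive start indices, and the last slice runs to the end of the array, so a shorter final block is kept rather than dropped. Every-nth sums use strided slicing. The variable names are illustrative.

```python
import numpy as np

v = np.arange(1, 101)  # same data as R's 1:100
n = 10

# Blocks of n consecutive values: reduceat sums v[starts[i]:starts[i+1]],
# and the final block runs to the end, so len(v) need not be a multiple of n.
starts = np.arange(0, len(v), n)
block_sums = np.add.reduceat(v, starts)

# Every nth value: v[0] + v[n] + v[2n] + ..., then v[1] + v[n+1] + ..., and so on.
strided_sums = np.array([v[j::n].sum() for j in range(n)])

print(block_sums)
print(strided_sums)
```

With `v` holding 1 to 100 and `n = 10`, `block_sums` reproduces the R result above.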

How to use Pandas groupby apply() without adding an extra index

我们两清 submitted on 2019-11-27 03:24:59
I very often want to create a new DataFrame by combining multiple columns of a grouped DataFrame. The apply() function allows me to do that, but it requires that I create an unneeded index:

In [359]: df = pandas.DataFrame({'x': 3 * ['a'] + 2 * ['b'],
                               'y': np.random.normal(size=5),
                               'z': np.random.normal(size=5)})

In [360]: df
Out[360]:
   x         y         z
0  a  0.201980 -0.470388
1  a  0.190846 -2.089032
2  a -1.131010  0.227859
3  b -0.263865 -1.906575
4  b -1.335956 -0.722087

In [361]: df.groupby('x').apply
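Here is a minimal sketch of one common way around the extra index, assuming the goal is one output row per group. When the applied function returns a Series rather than a DataFrame, `groupby().apply()` uses the Series' index labels as the result's columns. The result is then indexed only by the group key. The helper `combine` and its output names `sum_yz` and `max_y` are hypothetical. The random data are seeded only to make the sketch reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df = pd.DataFrame({'x': 3 * ['a'] + 2 * ['b'],
                   'y': rng.normal(size=5),
                   'z': rng.normal(size=5)})

def combine(g):
    # Hypothetical per-group summary that combines several columns.
    # Returning a Series collapses each group to one row whose
    # columns are the Series' index labels.
    return pd.Series({'sum_yz': (g['y'] + g['z']).sum(),
                      'max_y': g['y'].max()})

# Selecting the value columns keeps the grouping column out of `g`.
res = df.groupby('x')[['y', 'z']].apply(combine)
print(res)

# Optional: move the group key back into an ordinary column.
flat = res.reset_index()
```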

Remove columns from dataframe where ALL values are NA

扶醉桌前 submitted on 2019-11-26 23:35:18
I'm having trouble with a data frame and couldn't really resolve the issue myself. The data frame has arbitrary properties as columns, and each row represents one data set. The question is: how do I get rid of columns where the value is NA for ALL rows?

Try this:

df <- df[,colSums(is.na(df))<nrow(df)]

The two approaches offered thus far fail with large data sets because (amongst other memory issues) they create is.na(df), which will be an object the same size as df. Here are two approaches that are more memory- and time-efficient.

An approach using Filter:

Filter(function(x)!all(is.na(x)), df)

and an
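Here is the same operation in pandas, as a minimal Python sketch on a small made-up frame. `DataFrame.dropna(axis=1, how='all')` drops columns whose values are all missing. The list comprehension is similar in spirit to the `Filter` approach: it checks one column's missing-value mask at a time.

```python
import numpy as np
import pandas as pd

# Made-up frame: column 'b' is entirely missing.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, np.nan],
                   'c': ['x', None, 'z']})

# Built-in: drop every column whose values are all missing.
kept = df.dropna(axis=1, how='all')

# Column-at-a-time check, similar in spirit to R's
# Filter(function(x) !all(is.na(x)), df): each mask covers one column only.
kept_loop = df[[col for col in df.columns if df[col].notna().any()]]

print(list(kept.columns))  # ['a', 'c']
```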

JS: understanding and using apply and call

走远了吗. submitted on 2019-11-26 23:34:02
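I had thought about call and apply for a long time. Many times I believed I had memorized them, but they never quite stuck. Today I am deliberately writing these notes, hoping to master the topic completely and to help anyone else who wants to learn it.

1. Definition and understanding of apply: starting with the apply function

On MDN, apply is defined as:

"The apply() method calls a function with a given this value, and arguments provided as an array (or an array-like object)."

My understanding is this. Call the object in front of apply, which contains a this, A. The argument of apply() is also an object containing a this; call it B. Then A.apply(B) means that A's code invokes B. B's code runs as usual, and the result becomes apply's argument. apply then replaces A's own this with the this that this result refers to, after which A's code runs.

For example:

var aa = {
  _name: 111,
  _age: 222,
  _f: function () {
    console.log(this)
    console.log(this._name)
  }
}
var cc = {
  _name: 0,
  _age: 0,
  _f: function () {
    console.log(this)
    console.log(this._name)
  }
}
cc._f.apply(aa)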

Last Observation Carried Forward In a data frame? [duplicate]

随声附和 submitted on 2019-11-26 22:46:46
This question already has an answer here: Replacing NAs with latest non-NA value (15 answers)

I wish to implement a "Last Observation Carried Forward" for a data set I am working on, which has missing values at the end of it. Here is simple code to do it (question after it):

LOCF <- function(x) {
  # Last Observation Carried Forward (for a left to right series)
  LOCF <- max(which(!is.na(x)))  # the location of the Last Observation to Carry Forward
  x[LOCF:length(x)] <- x[LOCF]
  return(x)
}

# example:
LOCF(c(1,2,3,4,NA,NA))
LOCF(c(1,NA,3,4,NA,NA))

Now this works great for simple vectors. But if I
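Here is a minimal Python/pandas sketch of the same idea, applied to every column of a data frame. `locf_trailing` is a hypothetical helper that mirrors the R function above. It fills only the positions after the last non-missing value, so an interior gap (as in the second example) stays missing.

pandas' built-in `ffill()` is shown alongside for comparison. It carries the last observation forward into every gap, so its result differs whenever a gap occurs before the end. The example frame is made up.

```python
import numpy as np
import pandas as pd

def locf_trailing(s):
    """Fill only the positions after the last non-missing value (like the R LOCF)."""
    out = s.copy()
    observed = np.flatnonzero(s.notna().to_numpy())  # positions of non-missing values
    if observed.size:                                 # an all-missing column is left as-is
        last = observed[-1]
        out.iloc[last:] = s.iloc[last]
    return out

frame = pd.DataFrame({'p': [1, 2, 3, 4, np.nan, np.nan],
                      'q': [1, np.nan, 3, 4, np.nan, np.nan]})

trailing = frame.apply(locf_trailing)  # column by column, trailing gaps only
every_gap = frame.ffill()              # built-in: fills interior gaps as well
print(trailing)
print(every_gap)
```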

apply a function over groups of columns

跟風遠走 submitted on 2019-11-26 22:44:51
How can I use apply or a related function to create a new data frame that contains the results of the row averages of each pair of columns in a very large data frame? I have an instrument that outputs n replicate measurements on a large number of samples, where each single measurement is a vector (all measurements are vectors of the same length). I'd like to calculate the average (and other stats) on all replicate measurements of each sample. This means I need to group n consecutive columns together and do row-wise calculations. For a simple example, with three replicate measurements on two
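Here is a minimal NumPy sketch of the grouping step. It assumes the replicates of each sample sit in `n` consecutive columns and that the total number of columns is a multiple of `n`. Reshaping to (rows, samples, replicates) lets any reduction, such as the mean or standard deviation, run over the replicate axis in one call. The small array is made up for illustration.

```python
import numpy as np

n = 3  # replicate measurements per sample (assumed: consecutive columns)

# Made-up data: 4 rows, 2 samples x 3 replicate columns each.
data = np.arange(24, dtype=float).reshape(4, 6)

rows, cols = data.shape
assert cols % n == 0, "column count must be a multiple of n"

# Element [r, s, k] is data[r, s*n + k]: row r, sample s, replicate k.
grouped = data.reshape(rows, cols // n, n)

means = grouped.mean(axis=2)   # row averages per sample, shape (rows, cols // n)
spreads = grouped.std(axis=2)  # other per-sample statistics work the same way
print(means)
```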

Results transposed with R apply [duplicate]

二次信任 submitted on 2019-11-26 22:08:34
This question already has an answer here: Why apply() returns a transposed xts matrix? (1 answer) Closed 6 years ago.

Apologies, I just realised that this has already been answered here.

This should be pretty basic, but I do not really understand why it is happening. Can someone help? This is the simple code with the example 'data':

applyDirichletPrior <- function(row_vector) {
  row_vector_added <- row_vector + min(row_vector)
  row_vector_result <- row_vector_added / sum(row_vector_added)
}
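For contrast, here is a minimal NumPy sketch. `np.apply_along_axis` with `axis=1` puts each row's result back along that axis, so the output keeps the input's orientation. In R, `apply(m, 1, f)` returns each row's result as a column, which is why wrapping the call in `t()` is the usual fix there.

`apply_dirichlet_prior` is a Python transcription of the R function above; like the original, it adds the row minimum. The input matrix is invented.

```python
import numpy as np

def apply_dirichlet_prior(row_vector):
    # Same arithmetic as the R function: add the row minimum, then normalise to sum 1.
    row_vector_added = row_vector + row_vector.min()
    return row_vector_added / row_vector_added.sum()

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

out = np.apply_along_axis(apply_dirichlet_prior, 1, m)
print(out.shape)  # (2, 3): same orientation as m
```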

Why apply() returns a transposed xts matrix?

时光毁灭记忆、已成空白 submitted on 2019-11-26 21:02:48
I want to run a function on all periods of an xts matrix. apply() is very fast, but the returned matrix has transposed dimensions compared to the original object:

> dim(myxts)
[1] 7429 48
> myxts.2 = apply(myxts, 1 , function(x) { return(x) })
> dim(myxts.2)
[1] 48 7429
> str(myxts)
An 'xts' object from 2012-01-03 09:30:00 to 2012-01-30 16:00:00 containing:
  Data: num [1:7429, 1:48] 4092500 4098500 4091500 4090300 4095200 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:48] "Open" "High" "Low" "Close" ...
  Indexed by objects of class: [POSIXlt,POSIXt] TZ:
  xts Attributes:
 NULL
> str

Apply a function to every row of a matrix or a data frame

亡梦爱人 submitted on 2019-11-26 19:31:48
Suppose I have an n-by-2 matrix and a function that takes a 2-vector as one of its arguments. I would like to apply the function to each row of the matrix and get an n-vector. How do I do this in R? For example, I would like to compute the density of a 2D standard normal distribution at three points:

bivariate.density <- function(x = c(0, 0), mu = c(0, 0), sigma = c(1, 1), rho = 0) {
  x <- x - mu  # centre x on the mean mu
  exp(-1/(2*(1-rho^2))*(x[1]^2/sigma[1]^2+x[2]^2/sigma[2]^2-2*rho*x[1]*x[2]/(sigma[1]*sigma[2]))) *
    1/(2*pi*sigma[1]*sigma[2]*sqrt(1-rho^2))
}

out <- rbind(c(1, 2), c(3, 4), c(5, 6))

How do I apply the function to each row of out?
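Here is a minimal Python/NumPy sketch of the same computation, assuming the formula above, centred on `mu`. `np.apply_along_axis` calls the function once per row, much as `apply(out, 1, bivariate.density)` would in R. The function only indexes `x[0]` and `x[1]`, so passing the transposed matrix also evaluates every row at once without a per-row loop. `bivariate_density` is an illustrative transcription, not a library function.

```python
import numpy as np

def bivariate_density(x, mu=(0.0, 0.0), sigma=(1.0, 1.0), rho=0.0):
    # Bivariate normal density, same formula as the R function, centred on mu.
    z1 = (x[0] - mu[0]) / sigma[0]
    z2 = (x[1] - mu[1]) / sigma[1]
    q = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * sigma[0] * sigma[1] * np.sqrt(1 - rho**2))

out = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

dens = np.apply_along_axis(bivariate_density, 1, out)  # one call per row -> shape (3,)
dens_vec = bivariate_density(out.T)                     # all rows at once, same values
print(dens)
```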