apply

Calculate row-wise proportions

自闭症网瘾萝莉.ら submitted on 2019-11-26 19:03:53
I have a data frame:

    x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)
    #   id val0 val1 val2
    # 1  a    1    4    7
    # 2  b    2    5    8
    # 3  c    3    6    9

Within each row, I want to calculate the corresponding proportion (ratio) for each value. E.g. for the value in column "val0", I want to calculate row-wise val0 / (val0 + val1 + val2).

Desired output:

      id  val0 val1  val2
    1  a 0.083 0.33 0.583
    2  b 0.133 0.33 0.533
    3  c 0.167 0.33   0.5

Can anyone tell me what's the best way to do this? Here it's just three columns, but there can be a lot of columns.

And another alternative (though this is mostly a pretty
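The row-wise proportion described in the question can be sketched in base R as below. This is a minimal illustration, not necessarily the approach taken in the truncated answers; the names `vals`, `props` and `props_alt` are made up for the example.

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# Drop the id column so that only the numeric columns take part
vals <- x[-1]

# Dividing a data frame by a vector of length nrow() divides every
# column element-wise, i.e. each value by the sum of its own row
props <- cbind(x["id"], vals / rowSums(vals))

# Same computation on a matrix; margin = 1 means "proportions within rows"
props_alt <- prop.table(as.matrix(vals), margin = 1)
```

Because nothing refers to the value columns by name, the same lines should work however many value columns there are.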

Why is `vapply` safer than `sapply`?

风流意气都作罢 submitted on 2019-11-26 18:27:37
The documentation says

    vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer [...] to use.

Could you please elaborate as to why it is generally safer, maybe providing examples?

P.S.: I know the answer and I already tend to avoid sapply. I just wish there was a nice answer here on SO so I can point my coworkers to it. Please, no "read the manual" answer.

Ari B. Friedman

As has already been noted, vapply does two things:

- Slight speed improvement
- Improves consistency by providing limited return type checks.

The second point is the greater advantage, as it
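The return type check mentioned above can be illustrated with a small sketch. The inputs are invented for the example; the point is only that sapply picks its return shape from the data, while vapply checks every result against the declared FUN.VALUE template.

```r
# With empty input, sapply falls back to an empty list,
# while vapply still returns the declared type
empty <- list()
res_s <- sapply(empty, length)                          # list()
res_v <- vapply(empty, length, FUN.VALUE = integer(1))  # integer(0)

# When one result has a different length, sapply silently returns a list
uneven <- function(i) if (i < 3) i else c(i, i)
mixed <- sapply(1:3, uneven)

# vapply stops with an error instead of changing its return shape
failed <- tryCatch(
  vapply(1:3, uneven, FUN.VALUE = integer(1)),
  error = function(e) "error"
)
```

Code that later does arithmetic on `res_s` or `mixed` would break far from the call that produced them; with vapply the failure shows up at the call itself.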

AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis', using pandas eval

时光怂恿深爱的人放手 submitted on 2019-11-26 18:00:46
I have a series of the form:

    s

    0    [133, 115, 3, 1]
    1    [114, 115, 2, 3]
    2      [51, 59, 1, 1]
    dtype: object

Note that its elements are strings:

    s[0]
    '[133, 115, 3, 1]'

I'm trying to use pd.eval to parse this string into a column of lists. This works for this sample data.

    pd.eval(s)
    array([[133, 115, 3, 1],
           [114, 115, 2, 3],
           [51, 59, 1, 1]], dtype=object)

However, on much larger data (order of 10K), this fails miserably!

    len(s)
    300000

    pd.eval(s)
    AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

What am I missing here? Is there something wrong with the function or my data?

TL

Sorting rows alphabetically

ε祈祈猫儿з submitted on 2019-11-26 17:57:18
My data looks like,

    A B C D
    B C A D
    X Y M Z
    O M L P

How can I sort the rows to get something like

    A B C D
    A B C D
    M X Y Z
    L M O P

Thanks,

    t(apply(DF, 1, sort))

The t() function is necessary because row operations with the apply family of functions return the results in column-major order.

What did you try? This is really straightforward and easy to solve with a simple loop.

    > s <- x
    > for(i in 1:NROW(x)) {
    +   s[i,] <- sort(s[i,])
    + }
    > s
      V1 V2 V3 V4
    1  A  B  C  D
    2  A  B  C  D
    3  M  X  Y  Z
    4  L  M  O  P

No plyr answer yet?!

    foo <- matrix(sample(LETTERS,10^2,T),10,10)
    library("plyr")
    aaply(foo,1,sort)
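To make the role of t() concrete, here is a small sketch built on the sample data from the question. Storing the data as a character matrix named `DF` is an assumption, since the question does not show how the data is held.

```r
DF <- matrix(c("A", "B", "C", "D",
               "B", "C", "A", "D",
               "X", "Y", "M", "Z",
               "O", "M", "L", "P"),
             nrow = 4, byrow = TRUE)

# apply() over rows returns each sorted row as a COLUMN of the result
by_row <- apply(DF, 1, sort)

# Transposing puts the sorted rows back where rows belong
sorted <- t(by_row)
```

Here `by_row[, 2]` holds the sorted second row, which is why the transpose is needed before the result lines up with the input.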

Why does as.factor return a character when used inside apply?

三世轮回 submitted on 2019-11-26 17:40:56
Question

I want to convert variables into factors using apply():

    a <- data.frame(x1 = rnorm(100),
                    x2 = sample(c("a","b"), 100, replace = T),
                    x3 = factor(c(rep("a",50) , rep("b",50))))
    a2 <- apply(a, 2,as.factor)
    apply(a2, 2,class)

results in:

             x1          x2          x3
    "character" "character" "character"

I don't understand why this results in character vectors instead of factor vectors.

Answer 1

apply converts your data.frame to a character matrix. Use lapply:

    lapply(a, class)
    # $x1
    # [1] "numeric"
    # $x2
    # [1] "factor"
    #
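A minimal sketch of the lapply route suggested in the answer, on a made-up three-row data frame. Converting the numeric column `x1` to a factor as well is done purely for illustration; in practice one would probably select only the columns that should become factors.

```r
a <- data.frame(x1 = c(0.5, 1.2, -0.3),
                x2 = c("a", "b", "a"),
                stringsAsFactors = FALSE)

# Roughly what apply() does first: with mixed column types the
# data frame is coerced to a character matrix, which cannot hold factors
m <- as.matrix(a)

# lapply works column by column; assigning into a[] keeps the data frame shape
a[] <- lapply(a, as.factor)
classes <- vapply(a, class, character(1))
```

The `a[] <-` form replaces the columns in place, so `a` stays a data frame rather than becoming a plain list.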

Is there a R function that applies a function to each pair of columns?

雨燕双飞 submitted on 2019-11-26 16:22:39
I often need to apply a function to each pair of columns in a dataframe/matrix and return the results in a matrix. Now I always write a loop to do this. For instance, to make a matrix containing the p-values of correlations I write:

    df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
    n <- ncol(df)
    foo <- matrix(0,n,n)
    for ( i in 1:n) {
        for (j in i:n) {
            foo[i,j] <- cor.test(df[,i],df[,j])$p.value
        }
    }
    foo[lower.tri(foo)] <- t(foo)[lower.tri(foo)]
    foo
              [,1]      [,2]      [,3]
    [1,] 0.0000000 0.7215071 0.5651266
    [2,] 0.7215071 0.0000000 0.9019746
    [3,] 0.5651266 0.9019746 0.0000000

which works, but is
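One loop-free way to sketch the same pairwise computation is outer() combined with Vectorize(). This is an illustration under assumptions, not necessarily what the answers to the question propose; `pair_p` is a hypothetical helper name and the seed is arbitrary.

```r
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

# p-value of the correlation test between columns i and j
pair_p <- function(i, j) cor.test(df[[i]], df[[j]])$p.value

# outer() passes whole index vectors, so the scalar helper is wrapped
# in Vectorize() to be called once per (i, j) pair
idx <- seq_along(df)
p <- outer(idx, idx, Vectorize(pair_p))
dimnames(p) <- list(names(df), names(df))
```

Unlike the loop in the question, this evaluates both (i, j) and (j, i), so it does about twice the work in exchange for shorter code.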

how to access global/outer scope variable from R apply function?

假装没事ソ submitted on 2019-11-26 16:13:38
Question

I can't seem to make the apply function access/modify a variable that is declared outside... what gives?

    x = data.frame(age=c(11,12,13), weight=c(100,105,110))
    x
    testme <- function(df) {
        i <- 0
        apply(df, 1, function(x) {
            age <- x[1]
            weight <- x[2]
            cat(sprintf("age=%d, weight=%d\n", age, weight))
            i <- i+1 #this could not access the i variable in outer scope
            z <- z+1 #this could not access the global variable
        })
        cat(sprintf("i=%d\n", i))
        i
    }
    z <- 0
    y <- testme(x)
    cat(sprintf("y=%d, z=%d\n", y, z))
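The behaviour in the question follows from how `<-` works: inside the anonymous function it creates new local copies of `i` and `z` on every call. A sketch of one common workaround is the superassignment operator `<<-`, which assigns to an existing variable in an enclosing environment. The function name `count_rows` is made up for this example, and whether `<<-` is good style here is a separate matter.

```r
count_rows <- function(df) {
  i <- 0
  apply(df, 1, function(row) {
    i <<- i + 1  # updates the i that lives in count_rows()
    z <<- z + 1  # no z in count_rows(), so the lookup continues outward
    NULL
  })
  i
}

z <- 0
x <- data.frame(age = c(11, 12, 13), weight = c(100, 105, 110))
y <- count_rows(x)
```

After the call, both `y` and `z` equal the number of rows, because each row triggered one increment of each counter.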

How to paste a string on each element of a vector of strings using apply in R?

六眼飞鱼酱① submitted on 2019-11-26 15:57:24
Question

I have a vector of strings.

    d <- c("Mon","Tues","Wednes","Thurs","Fri","Satur","Sun")

for which I want to paste the string "day" on each element of the vector in a way similar to this.

    week <- apply(d, "day", paste, sep='')

Answer 1

No need for apply(), just use paste():

    R> d <- c("Mon","Tues","Wednes","Thurs","Fri","Satur","Sun")
    R> week <- paste(d, "day", sep="")
    R> week
    [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"
    [4] "Friday"    "Saturday"  "Sunday"

Answer 2

Others have already indicated that
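For comparison, a short sketch showing the vectorized call next to an explicit element-wise version. The element-wise form is included only to show what an apply-style call would look like; it is not taken from the truncated second answer.

```r
d <- c("Mon", "Tues", "Wednes", "Thurs", "Fri", "Satur", "Sun")

# paste0() is paste() with sep = "" and is already vectorized over d
week <- paste0(d, "day")

# Element-wise equivalent; USE.NAMES = FALSE stops vapply from
# naming the result after the input strings
week_loop <- vapply(d, function(s) paste0(s, "day"), character(1),
                    USE.NAMES = FALSE)
```

Both produce the same character vector, so the explicit loop adds nothing here beyond extra typing.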

Faster way to read fixed-width files

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-26 14:28:52
I work with a lot of fixed width files (i.e., no separating character) that I need to read into R. So, there is usually a definition of the column widths to parse the string into variables. I can use read.fwf to read in the data without a problem. However, for large files, this can take a long time. For a recent dataset with ~500,000 rows and 143 variables, this took 800 seconds.

    seer9 <- read.fwf("~/data/rawdata.txt", widths = cols, header = FALSE,
                      buffersize = 250000, colClasses = "character",
                      stringsAsFactors = FALSE)

fread in the data.table package in R is awesome for
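One general idea for this kind of file is to read each line as a single string and cut the fields out with substring(). The sketch below uses base R only and a made-up two-line sample file; it is not benchmarked here and is not necessarily the approach the truncated text goes on to describe. The names `widths`, `starts`, `ends` and `parsed` are illustrative.

```r
# Hypothetical column widths and a tiny sample file for the demonstration
widths <- c(2, 3, 1)
path <- tempfile(fileext = ".txt")
writeLines(c("ab123x", "cd456y"), path)

# Read whole lines, then derive start/end positions from the widths
lines <- readLines(path)
ends <- cumsum(widths)
starts <- ends - widths + 1

# substring() is vectorized over lines, so each call extracts one full column
fields <- lapply(seq_along(widths),
                 function(k) substring(lines, starts[k], ends[k]))
names(fields) <- paste0("V", seq_along(widths))
parsed <- as.data.frame(fields, stringsAsFactors = FALSE)
```

All columns come back as character, matching the colClasses = "character" setting in the read.fwf call above.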

use multiple columns as variables with sapply

廉价感情. submitted on 2019-11-26 13:02:36
Question

I have a dataframe and I would like to apply a function that takes the values of three columns and computes the minimum difference between the three values.

    #dataset
    df <- data.frame(a= sample(1:100, 10),b = sample(1:100, 10),c= sample(1:100, 10))
    #function
    minimum_distance <- function(a,b,c) {
        dist1 <- abs(a-b)
        dist2 <- abs(a-c)
        dist3 <- abs(b-c)
        return(min(dist1,dist2,dist3))
    }

I am looking for something like:

    df$distance <- sapply(df, function(x) minimum_distance(x$a,x$b,x$c) ) ##
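sapply(df, ...) iterates over columns, not rows, which is why the attempt in the question cannot see a, b and c together. A sketch of two ways to walk the three columns in parallel follows; the seed and the name `distance_vec` are assumptions added for the example.

```r
set.seed(42)
df <- data.frame(a = sample(1:100, 10),
                 b = sample(1:100, 10),
                 c = sample(1:100, 10))

minimum_distance <- function(a, b, c) {
  min(abs(a - b), abs(a - c), abs(b - c))
}

# mapply() calls the function once per row, taking one element
# from each of the three columns
df$distance <- mapply(minimum_distance, df$a, df$b, df$c)

# Fully vectorized alternative: pmin() takes the element-wise minimum
distance_vec <- pmin(abs(df$a - df$b), abs(df$a - df$c), abs(df$b - df$c))
```

The pmin() version avoids calling an R function per row, which may matter on large data frames.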