apply | 易学教程

Calculate row-wise proportions

I have a data frame:

    x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)
    #   id val0 val1 val2
    # 1  a    1    4    7
    # 2  b    2    5    8
    # 3  c    3    6    9

Within each row, I want to calculate the corresponding proportion (ratio) for each value. E.g. for the value in column "val0", I want to calculate row-wise val0 / (val0 + val1 + val2).

Desired output:

      id  val0 val1  val2
    1  a 0.083 0.33 0.583
    2  b 0.133 0.33 0.533
    3  c 0.167 0.33 0.5

Can anyone tell me what's the best way to do this? Here it's just three columns, but there can be a lot of columns.

And another alternative (though this is mostly a pretty
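One possible way to get the row-wise proportions asked for above is to divide the value columns by their row sums. This is a minimal base R sketch, assuming every column other than `id` is numeric; the names `val_cols` and `props` are introduced here only for illustration.

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# Columns holding values (everything except the id column).
val_cols <- setdiff(names(x), "id")

props <- x
# rowSums() gives one total per row. Dividing a data frame by a vector of
# length nrow(x) applies that vector down each column, so every value is
# divided by the total of its own row.
props[val_cols] <- x[val_cols] / rowSums(x[val_cols])

print(round(props[val_cols], 3))
```

Because `val_cols` is computed from the column names, the same lines work unchanged when there are many more value columns.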

Why is `vapply` safer than `sapply`?

The documentation says vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer [...] to use. Could you please elaborate as to why it is generally safer, maybe providing examples?

P.S.: I know the answer and I already tend to avoid sapply. I just wish there was a nice answer here on SO so I can point my coworkers to it. Please, no "read the manual" answer.

Ari B. Friedman

As has already been noted, vapply does two things:

- Slight speed improvement
- Improves consistency by providing limited return type checks.

The second point is the greater advantage, as it
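The return type checks mentioned above can be illustrated with a small sketch. The helper `first_value` and the inputs are hypothetical and chosen only to show two cases: empty input and a result of an unexpected type.

```r
# A helper that is expected to return one number per element.
first_value <- function(v) v[1]

empty_input <- list()

# sapply() infers the return type from the results; with no results at all
# it falls back to an empty list.
res_sapply <- sapply(empty_input, first_value)
print(class(res_sapply))

# vapply() is told the expected shape (one numeric value per element), so the
# result is a numeric vector even when the input is empty.
res_vapply <- vapply(empty_input, first_value, numeric(1))
print(class(res_vapply))

# If any result has the wrong type, vapply() stops with an error, whereas
# sapply() would quietly return a character vector here.
mixed <- list(1, "a")
err <- tryCatch(
  vapply(mixed, first_value, numeric(1)),
  error = function(e) conditionMessage(e)
)
print(err)
```

Code that later indexes or sums the result behaves the same way for every input with `vapply`, which is the consistency benefit the answer refers to.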

AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis', using pandas eval

I have a series of the form:

    s
    0    [133, 115, 3, 1]
    1    [114, 115, 2, 3]
    2      [51, 59, 1, 1]
    dtype: object

Note that its elements are strings:

    s[0]
    '[133, 115, 3, 1]'

I'm trying to use pd.eval to parse this string into a column of lists. This works for this sample data.

    pd.eval(s)
    array([[133, 115, 3, 1], [114, 115, 2, 3], [51, 59, 1, 1]], dtype=object)

However, on much larger data (order of 10K), this fails miserably!

    len(s)
    300000

    pd.eval(s)
    AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

What am I missing here? Is there something wrong with the function or my data?

TL

Sorting rows alphabetically

My data looks like,

    A B C D
    B C A D
    X Y M Z
    O M L P

How can I sort the rows to get something like

    A B C D
    A B C D
    M X Y Z
    L M O P

Thanks,

    t(apply(DF, 1, sort))

The t() function is necessary because row operations with the apply family of functions return the results in column-major order.

What did you try? This is really straightforward and easy to solve with a simple loop.

    > s <- x
    > for(i in 1:NROW(x)) {
    +   s[i,] <- sort(s[i,])
    + }
    > s
      V1 V2 V3 V4
    1  A  B  C  D
    2  A  B  C  D
    3  M  X  Y  Z
    4  L  M  O  P

No plyr answer yet?!

    foo <- matrix(sample(LETTERS,10^2,T),10,10)
    library("plyr")
    aaply(foo,1,sort)

Why does as.factor return a character when used inside apply?

Question:

I want to convert variables into factors using apply():

    a <- data.frame(x1 = rnorm(100),
                    x2 = sample(c("a","b"), 100, replace = T),
                    x3 = factor(c(rep("a",50) , rep("b",50))))
    a2 <- apply(a, 2,as.factor)
    apply(a2, 2,class)

results in:

             x1          x2          x3
    "character" "character" "character"

I don't understand why this results in character vectors instead of factor vectors.

Answer 1:

apply converts your data.frame to a character matrix. Use lapply:

    lapply(a, class)
    # $x1
    # [1] "numeric"
    # $x2
    # [1] "factor"
    #
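The point made in the answer can be checked directly. The sketch below uses a smaller, hypothetical version of the data frame (10 rows, arbitrary seed) and shows both the character matrix that `apply()` works on and a column-wise conversion with `lapply()`.

```r
set.seed(1)  # arbitrary seed, only to make the example reproducible
a <- data.frame(x1 = rnorm(10),
                x2 = sample(c("a", "b"), 10, replace = TRUE),
                x3 = factor(rep(c("a", "b"), each = 5)))

# apply() first coerces its input with as.matrix(). A matrix can hold only one
# type, so mixed numeric and text columns become a character matrix, and a
# matrix cannot store factors in its cells.
m <- as.matrix(a)
print(typeof(m))

# lapply() visits the columns one by one without building a matrix.
# Assigning into a[] keeps the data frame structure and replaces each column.
a[] <- lapply(a, as.factor)
all_factors <- vapply(a, is.factor, logical(1))
print(all_factors)
```

Whether the numeric column `x1` should really become a factor is a separate modelling decision; the sketch only mirrors what the question asks for.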

Is there a R function that applies a function to each pair of columns?

I often need to apply a function to each pair of columns in a dataframe/matrix and return the results in a matrix. Now I always write a loop to do this. For instance, to make a matrix containing the p-values of correlations I write:

    df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
    n <- ncol(df)
    foo <- matrix(0,n,n)
    for ( i in 1:n) {
      for (j in i:n) {
        foo[i,j] <- cor.test(df[,i],df[,j])$p.value
      }
    }
    foo[lower.tri(foo)] <- t(foo)[lower.tri(foo)]
    foo
              [,1]      [,2]      [,3]
    [1,] 0.0000000 0.7215071 0.5651266
    [2,] 0.7215071 0.0000000 0.9019746
    [3,] 0.5651266 0.9019746 0.0000000

which works, but is
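One way to avoid writing the double loop each time is to wrap it in a small helper built on `outer()` and `Vectorize()`. The helper name `pairwise_apply` is hypothetical, not an existing base R function, and the sketch assumes the supplied function returns a single number for each pair of columns.

```r
# Hypothetical helper: apply f to every pair of columns and return a matrix.
pairwise_apply <- function(data, f) {
  idx <- seq_len(ncol(data))
  # outer() passes whole vectors of indices, so the pair function is wrapped
  # in Vectorize() to be called once per (i, j) combination.
  pair_fun <- Vectorize(function(i, j) f(data[[i]], data[[j]]))
  res <- outer(idx, idx, pair_fun)
  dimnames(res) <- list(names(data), names(data))
  res
}

set.seed(42)  # arbitrary seed
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

# p-values of the pairwise correlation tests, as in the loop above.
p_values <- pairwise_apply(df, function(a, b) cor.test(a, b)$p.value)
print(p_values)
```

Unlike the loop, this version evaluates every pair twice (once per triangle), which is simpler but does roughly double the work for symmetric functions.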

how to access global/outer scope variable from R apply function?

Question:

I can't seem to make the apply function access/modify a variable that is declared outside... what gives?

    x = data.frame(age=c(11,12,13), weight=c(100,105,110))
    x

    testme <- function(df) {
      i <- 0
      apply(df, 1, function(x) {
        age <- x[1]
        weight <- x[2]
        cat(sprintf("age=%d, weight=%d\n", age, weight))
        i <- i+1   #this could not access the i variable in outer scope
        z <- z+1   #this could not access the global variable
      })
      cat(sprintf("i=%d\n", i))
      i
    }

    z <- 0
    y <- testme(x)
    cat(sprintf("y=%d, z=%d\n", y, z))
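In the code above, a plain `<-` inside the anonymous function creates a new local variable on every call, so the outer `i` and the global `z` are read but never changed. A minimal sketch of one fix, using the superassignment operator `<<-`, which assigns to the nearest enclosing environment that already has a variable of that name:

```r
x <- data.frame(age = c(11, 12, 13), weight = c(100, 105, 110))
z <- 0

testme <- function(df) {
  i <- 0
  apply(df, 1, function(row) {
    # <<- looks for an existing i in the enclosing function (testme) and
    # updates it there instead of creating a local copy.
    i <<- i + 1
    # z is not defined in testme, so the search continues outward and the
    # variable defined at the top level is updated.
    z <<- z + 1
    NULL
  })
  i
}

y <- testme(x)
cat(sprintf("y=%d, z=%d\n", y, z))
```

Modifying outer state from inside an apply call makes code harder to follow, so returning values from the function and combining them afterwards is often preferred where possible.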

How to paste a string on each element of a vector of strings using apply in R?

Question:

I have a vector of strings.

    d <- c("Mon","Tues","Wednes","Thurs","Fri","Satur","Sun")

for which I want to paste the string "day" on each element of the vector in a way similar to this.

    week <- apply(d, "day", paste, sep='')

Answer 1:

No need for apply(), just use paste():

    R> d <- c("Mon","Tues","Wednes","Thurs","Fri","Satur","Sun")
    R> week <- paste(d, "day", sep="")
    R> week
    [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"
    [4] "Friday"    "Saturday"  "Sunday"
    R>

Answer 2:

Others have already indicated that

Faster way to read fixed-width files

I work with a lot of fixed-width files (i.e., no separating character) that I need to read into R. So, there is usually a definition of the column widths to parse the string into variables. I can use read.fwf to read in the data without a problem. However, for large files, this can take a long time. For a recent dataset, this took 800 seconds to read in a dataset with ~500,000 rows and 143 variables.

    seer9 <- read.fwf("~/data/rawdata.txt",
                      widths = cols,
                      header = FALSE,
                      buffersize = 250000,
                      colClasses = "character",
                      stringsAsFactors = FALSE)

fread in the data.table package in R is awesome for
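As an illustration of the general idea (read each line as one string, then cut it by position), here is a base R sketch using `readLines()` and `substring()`. This is a swapped-in technique for illustration, not the approach the excerpt goes on to describe, and it has not been benchmarked here. The widths, column names, and sample file are all hypothetical.

```r
# Hypothetical layout: three fields of width 2, 3 and 1.
widths <- c(2, 3, 1)
col_names <- c("state", "code", "flag")

# Small sample file written to a temporary location for the example.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ab123x", "cd456y"), tmp)

# Start and end position of every field within a line.
ends <- cumsum(widths)
starts <- ends - widths + 1

# Read whole lines, then cut every field out of all lines at once;
# substring() is vectorised over the lines.
lines <- readLines(tmp)
fields <- lapply(seq_along(widths), function(k) {
  substring(lines, starts[k], ends[k])
})
names(fields) <- col_names

parsed <- as.data.frame(fields, stringsAsFactors = FALSE)
print(parsed)
```

All columns come back as character, matching the `colClasses = "character"` setting in the `read.fwf` call above; any type conversion would be a separate step.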

use multiple columns as variables with sapply

Question:

I have a dataframe and I would like to apply a function that takes the values of three columns and computes the minimum difference between the three values.

    #dataset
    df <- data.frame(a= sample(1:100, 10),b = sample(1:100, 10),c= sample(1:100, 10))

    #function
    minimum_distance <- function(a,b,c) {
      dist1 <- abs(a-b)
      dist2 <- abs(a-c)
      dist3 <- abs(b-c)
      return(min(dist1,dist2,dist3))
    }

I am looking for something like:

    df$distance <- sapply(df, function(x) minimum_distance(x$a,x$b,x$c) )
    ##
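The `sapply` call above iterates over the columns of `df`, so each `x` is a plain vector and `x$a` fails. One way to pass three columns in parallel is `mapply()`, which calls the function once per row position. This is a sketch with an arbitrary seed; `distance_pmin` is a name introduced here to show a vectorised alternative.

```r
set.seed(123)  # arbitrary seed, only to make the example reproducible
df <- data.frame(a = sample(1:100, 10),
                 b = sample(1:100, 10),
                 c = sample(1:100, 10))

minimum_distance <- function(a, b, c) {
  min(abs(a - b), abs(a - c), abs(b - c))
}

# mapply() walks the three columns in parallel: the k-th call receives
# df$a[k], df$b[k] and df$c[k].
df$distance <- mapply(minimum_distance, df$a, df$b, df$c)

# Vectorised alternative: pmin() takes the element-wise minimum, so no
# per-row function calls are needed.
distance_pmin <- pmin(abs(df$a - df$b), abs(df$a - df$c), abs(df$b - df$c))

print(df)
```

Both versions give the same values; the `pmin()` form is usually the faster choice on large data frames because it avoids one function call per row.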