sapply | 易学教程

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

I have two data frames: df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second-best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute Hamming distance. The Hamming distance between two vectors x and y can be computed as, for example:
x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471, 157122, 866381, 582868, 878)
y <- c(356739, 324042, 904133, 959893, 433677, 110269, 576942, 2230, 267130,
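A base-R sketch of the all-pairs computation follows. It does not use e1071, it takes "Hamming distance" to mean the number of positions whose values differ, and the helper names pairwise_hamming and best_two are hypothetical. Transposing the reference matrix lets each new row be compared against every reference row in one vectorized step, instead of calling a distance function once per pair.

```r
# Pairwise Hamming distance between every row of A (reference) and every
# row of B (new data). Hamming distance here = number of positions whose
# values differ. A and B must have the same number of columns.
pairwise_hamming <- function(A, B) {
  A <- as.matrix(A)
  B <- as.matrix(B)
  stopifnot(ncol(A) == ncol(B))
  tA <- t(A)  # columns of tA are the rows of A
  # For row j of B, compare it with every column of tA at once;
  # B[j, ] is recycled down each column because its length equals nrow(tA).
  D <- vapply(seq_len(nrow(B)),
              function(j) colSums(tA != B[j, ]),
              numeric(nrow(A)))
  # Force an nrow(A) x nrow(B) matrix (vapply returns a vector if nrow(A) == 1)
  matrix(D, nrow = nrow(A), ncol = nrow(B))
}

# For each row of B, indices of the closest and second-closest rows of A.
# order() is stable, so ties go to the earlier row of A.
best_two <- function(D) {
  t(apply(D, 2, function(d) order(d)[1:2]))
}

A <- matrix(c(1, 2, 3,
              1, 2, 4,
              9, 9, 9), nrow = 3, byrow = TRUE)
B <- matrix(c(1, 2, 4,
              9, 9, 0), nrow = 2, byrow = TRUE)

D <- pairwise_hamming(A, B)  # 3 x 2: column 1 is 1, 0, 3; column 2 is 3, 3, 1
best <- best_two(D)          # row 1: 2, 1; row 2: 3, 1
```

Data frames such as df1 and df2 can be passed directly, since they are converted with as.matrix(); for very large inputs the nrow(df1) x nrow(df2) distance matrix itself may become the memory bottleneck.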

R: loop over columns in data.table

I want to determine the column classes of a large data.table. colClasses <- sapply(DT, FUN=function(x)class(x)[1]) works, but apparently local copies are stored in memory:
> memory.size()
[1] 687.59
> colClasses <- sapply(DT, class)
> memory.size()
[1] 1346.21
A loop does not seem possible, because indexing a data.table with with=FALSE always returns a data.table. A quick and very dirty method is:
DT1 <- DT[1, ]
colClasses <- sapply(DT1, FUN=function(x)class(x)[1])
What is the most elegant and efficient way to do this?
I have briefly investigated, and it looks like a data.table bug.
> DT = data.table(a=1
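One type-stable way to collect the first class of every column is vapply() over the columns; a data.table is a list of columns, so the same call applies to it. This is only a sketch: col_classes is a hypothetical helper, a small data.frame stands in for the large data.table, and the example does not by itself show whether any column copies are made.

```r
# First class of each column (e.g. "POSIXct" rather than c("POSIXct", "POSIXt")).
# vapply() checks that every result is a single string, so the output is
# always a named character vector.
col_classes <- function(d) vapply(d, function(x) class(x)[1L], character(1))

# A plain data.frame stands in for the data.table here; both are lists of
# columns, so the same call works on either.
df <- data.frame(a = 1:3,
                 b = c(1.5, 2, 3),
                 c = c("x", "y", "z"),
                 t = as.POSIXct("2020-01-01", tz = "UTC") + 0:2,
                 stringsAsFactors = FALSE)

cc <- col_classes(df)
cc  # a = "integer", b = "numeric", c = "character", t = "POSIXct"
```

Compared with sapply(), vapply() fails loudly if some column's class cannot be reduced to one string, instead of silently returning a list.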

Extracting nth element from a nested list following strsplit - R

I've been trying to understand how to deal with the output of strsplit a bit better. I often have data such as this that I wish to split:
mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")
#[1] "144/4/5" "154/2" "146/3/5" "142" "143/4" "DNB" "90"
After splitting, the results are as follows:
strsplit(mydata, "/")
#[[1]]
#[1] "144" "4" "5"
#[[2]]
#[1] "154" "2"
#[[3]]
#[1] "146" "3" "5"
#[[4]]
#[1] "142"
#[[5]]
#[1] "143" "4"
#[[6]]
#[1] "DNB"
#[[7]]
#[1] "90"
I know from the strsplit help page that final empty strings are not produced. Therefore, there will be 1, 2 or
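A sketch of pulling the nth piece out of every element, using the mydata vector from the question. It relies on the fact that indexing past the end of a character vector with "[" returns NA, so entries with fewer pieces become NA rather than causing an error. The helper name nth_piece is hypothetical.

```r
mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")
parts <- strsplit(mydata, "/")

# nth piece of each split string; `[` is called as p[n] on each element,
# and out-of-range indices give NA_character_.
nth_piece <- function(parts, n) vapply(parts, `[`, character(1), n)

first  <- nth_piece(parts, 1)  # "144" "154" "146" "142" "143" "DNB" "90"
second <- nth_piece(parts, 2)  # "4" "2" "3" NA "4" NA NA

# Rectangular alternative: pad every piece vector to the longest length,
# then stack them into a matrix with one row per input string.
max_len <- max(lengths(parts))
m <- t(vapply(parts, function(p) p[seq_len(max_len)], character(max_len)))
# m[, 2] is the same as second
```

The matrix form is convenient when several positions are needed at once, since each column then holds one position across all strings.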

Speeding up function that uses which within a sapply call in R

I have two vectors, e and g. For each element in e, I want to know the percentage of elements in g that are smaller than or equal to it. One way to implement this in R is:
set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)
mf <- function(p,v) {100*length(which(v<=p))/length(v)}
mf.out <- sapply(X=e, FUN=mf, v=g)
With large e or g, this takes a long time to run. How can I change or adapt this code to make it run faster? Note: the mf function above is based on code from the mess function in the dismo package.
The reason this is so slow is that you're calling your function length(e) times. It doesn't make a large
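One possible vectorized replacement (not necessarily the approach of the truncated answer above): sort g once, then let findInterval() binary-search every element of e at the same time. For sorted g, findInterval(p, g) is the number of g values less than or equal to p, which is exactly what mf counts. The helper name pct_le is hypothetical; ecdf() is shown as an equivalent built-in check.

```r
set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)

# Original approach from the question: one full pass over g per element of e.
mf <- function(p, v) 100 * length(which(v <= p)) / length(v)

# Vectorized: sort g once; findInterval() returns, for each element of e,
# how many sorted g values are <= it.
pct_le <- function(e, g) 100 * findInterval(e, sort(g)) / length(g)

slow <- sapply(e[1:200], mf, v = g)  # only a subset, to keep the check quick
fast <- pct_le(e, g)

all.equal(slow, fast[1:200])
# ecdf(g) is the empirical CDF, i.e. the proportion of g values <= t.
all.equal(fast, 100 * ecdf(g)(e))
```

The cost drops from roughly length(e) * length(g) comparisons to one sort plus a binary search per element of e.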

Viewing all column names with any NA in R

Question: I need to get the names of the columns that have at least one NA.
df<-data.frame(a=1:3,b=c(NA,8,6), c=c('t',NA,7))
I need to get "b, c". I found this code:
sapply(df, function(x) any(is.na(x)))
but I need only the variables that have any NA. I tried this:
sapply(df, function(x) colnames(df[,any(is.na(x))]))
but I get all the column names.
Answer 1: Another acrobatic solution (just for fun):
colnames(df)[!complete.cases(t(df))]
[1] "b" "c"
The idea is: getting the columns of A that have at least 1
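A more direct sketch than transposing the data: count the NAs per column with colSums(is.na(df)), or test each column with anyNA(), and keep the names where the result is positive or TRUE. It uses the df from the question.

```r
df <- data.frame(a = 1:3, b = c(NA, 8, 6), c = c('t', NA, 7))

# is.na() on a data frame gives a logical matrix; colSums() counts NAs per column.
na_cols <- names(df)[colSums(is.na(df)) > 0]           # "b" "c"

# Equivalent: anyNA() on each column, keeping the names that are TRUE.
na_cols2 <- names(df)[vapply(df, anyNA, logical(1))]   # "b" "c"

# As a single string, in the form asked for.
na_string <- paste(na_cols, collapse = ", ")           # "b, c"
```

The anyNA() version can stop scanning a column at its first NA, while colSums(is.na(df)) builds a full logical matrix but also yields the NA count per column if that is useful.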

How to subset from a list in R

I have a rather simple task but haven't found a good solution.
> mylist
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
[[3]]
[1] 25 26 27 28 29 30 31 32
y <- c(3,5,9)
I would like to extract from mylist the sub-elements 3, 5, and 9 of each component in the list. I have tried sapply[mylist,"[[",y] with no luck, and others such as vapply, lapply, etc. Thanks in advance for your help. Mauricio Ortiz
You could use sapply(mylist, "[", y):
mylist <- list(1:5, 6:10, 11:15)
sapply(mylist, "[", c(2,3))
Try
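Applied to the question's own list, the "[" approach behaves differently under lapply() and sapply(). The sketch below rebuilds mylist from the printed output (assumed to be 1:10, letters and 25:32). Because one component is character, sapply() simplifies the result to a matrix and coerces every entry to character, while lapply() keeps each component's type. Index 9 is past the end of the third component, so it yields NA.

```r
# Reconstructed from the printed output in the question (an assumption).
mylist <- list(1:10, letters, 25:32)
y <- c(3, 5, 9)

# lapply() keeps each component's own type: a list of three vectors.
picked <- lapply(mylist, `[`, y)
# picked[[1]] is 3 5 9
# picked[[2]] is "c" "e" "i"
# picked[[3]] is 27 29 NA  (the third component has only 8 elements)

# sapply() simplifies the same result to a 3 x 3 matrix (one column per
# component); since one component is character, all entries become character.
picked_mat <- sapply(mylist, `[`, y)
```

When all components share one type, as in list(1:5, 6:10, 11:15), the sapply() matrix is usually the more convenient form.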

Using “…” and “replicate”

In the documentation of sapply and replicate there is a warning regarding the use of "...". I can accept it as such, but I would like to understand what is behind it. So I've created this little contrived example:
innerfunction<-function(x, extrapar1=0, extrapar2=extrapar1) {
  cat("x:", x, ", xp1:", extrapar1, ", xp2:", extrapar2, "\n")
}
middlefunction<-function(x,...) {
  innerfunction(x,...)
}
outerfunction<-function(x, ...) {
  cat("Run middle function:\n")
  replicate(2, middlefunction(x,...))
  cat("Run inner function:\n")
  replicate(2, innerfunction(x,...))
}
outerfunction(1,2,3)
outerfunction(1
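The behaviour comes from how replicate() wraps its expression. In recent R versions it is defined essentially as sapply(integer(n), eval.parent(substitute(function(...) expr)), simplify = simplify), so any "..." written inside expr belongs to that anonymous function and receives the loop value 0, not the caller's extra arguments. The sketch below uses a variant of the question's innerfunction that returns its arguments instead of printing them; the wrapper names naive and safe are hypothetical.

```r
# Variant of the question's innerfunction that returns its arguments
# instead of printing them, so results can be inspected.
innerfunction <- function(x, extrapar1 = 0, extrapar2 = extrapar1) {
  c(x = x, xp1 = extrapar1, xp2 = extrapar2)
}

# Here `...` inside the replicate() expression is rebound: replicate turns
# the expression into function(...) innerfunction(x, ...), and sapply calls
# that function with 0L, so extrapar1 = extrapar2 = 0.
naive <- function(x, ...) replicate(2, innerfunction(x, ...))

# Workaround: build a zero-argument closure first. one_run has no `...`
# formal of its own, so `...` is looked up in safe()'s frame.
safe <- function(x, ...) {
  one_run <- function() innerfunction(x, ...)
  replicate(2, one_run())
}

naive(1, 2, 3)  # every column: x = 1, xp1 = 0, xp2 = 0
safe(1, 2, 3)   # every column: x = 1, xp1 = 2, xp2 = 3
```

The same reasoning explains the original example: in outerfunction(1,2,3), both replicate() calls pass 0 instead of 2 and 3, because the "..." they see is the one created by replicate().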