sapply

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

两盒软妹~` submitted on 2019-12-01 00:42:35
I have two data frames: df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second-best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute the Hamming distance. For example, the Hamming distance between two vectors x and y can be computed as follows: x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471, 157122, 866381, 582868, 878) y <- c(356739, 324042, 904133, 959893, 433677, 110269, 576942, 2230, 267130,
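A minimal base-R sketch of the all-pairs computation, written without e1071 (so it swaps in plain matrix comparison rather than the package's own function). It assumes Hamming distance means the number of positions at which two rows differ. `hamming_matrix` and `best_two` are hypothetical helper names introduced only for this example. Each new row is compared against every reference row at once by transposing the reference matrix, and the two smallest distances per new row are then picked, with ties going to the lower reference row index.

```r
# Hypothetical helper: m x n matrix of Hamming distances, where entry [i, j]
# is the number of positions at which row i of `new` differs from row j of `ref`.
hamming_matrix <- function(ref, new) {
  ref <- as.matrix(ref)
  new <- as.matrix(new)
  stopifnot(ncol(ref) == ncol(new))
  ref_t <- t(ref)  # p x n: one column per reference row
  # Comparing a p x n matrix with a length-p vector recycles the vector down
  # every column, so colSums() counts the mismatches against each reference row.
  d <- vapply(seq_len(nrow(new)),
              function(i) colSums(ref_t != new[i, ]),
              numeric(nrow(ref)))
  # vapply() gives n x m (or a plain vector when n == 1); return m x n
  t(matrix(d, nrow = nrow(ref)))
}

# Hypothetical helper: indices of the best and second-best reference rows
best_two <- function(d) {
  idx <- t(apply(d, 1, function(r) order(r)[1:2]))
  colnames(idx) <- c("best", "second")
  idx
}

ref <- matrix(c(1, 2, 3,
                1, 2, 4,
                9, 9, 9), nrow = 3, byrow = TRUE)
new <- matrix(c(1, 2, 3,
                9, 9, 8), nrow = 2, byrow = TRUE)
d <- hamming_matrix(ref, new)
d
best_two(d)
```

Here the first new row matches reference rows 1 and then 2, and the second new row matches reference rows 3 and then 1. The loop runs once per new row rather than once per pair of rows.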

R: loop over columns in data.table

我与影子孤独终老i submitted on 2019-11-30 19:35:12
I want to determine the column classes of a large data.table. colClasses <- sapply(DT, FUN=function(x)class(x)[1]) works, but it apparently stores local copies in memory: > memory.size() [1] 687.59 > colClasses <- sapply(DT, class) > memory.size() [1] 1346.21 A loop does not seem possible, because subsetting a data.table with with=FALSE always returns a data.table. A quick and very dirty method is: DT1 <- DT[1, ] colClasses <- sapply(DT1, FUN=function(x)class(x)[1]) What is the most elegant and efficient way to do this? I have briefly investigated, and it looks like a data.table bug. > DT = data.table(a=1
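A minimal sketch of a loop that does work: a data.table, like a data.frame, is a list of columns, so `DT[[j]]` returns the column vector itself rather than a one-column table, and no with=FALSE is needed. `col_classes` is a hypothetical helper name, and the example uses a plain data.frame so it runs without data.table installed. Whether either version avoids the memory growth shown in the question may depend on the data.table version, since the answer attributes that growth to a bug.

```r
# Hypothetical helper: first class of every column, one column at a time.
col_classes <- function(DT) {
  out <- character(length(DT))
  names(out) <- names(DT)
  for (j in seq_along(DT)) {
    # `[[` extracts the column vector itself, not a one-column table
    out[j] <- class(DT[[j]])[1]
  }
  out
}

df <- data.frame(a = 1:3, b = factor(c("x", "y", "z")), c = c(1.5, 2, 3))
col_classes(df)
# Same result with vapply(), which also checks that each result is one string
vapply(df, function(x) class(x)[1], character(1))
```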

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

拈花ヽ惹草 submitted on 2019-11-30 19:31:40
Question: I have two data frames: df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second-best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute the Hamming distance. For example, the Hamming distance between two vectors x and y can be computed as follows: x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471,

Extracting nth element from a nested list following strsplit - R

故事扮演 submitted on 2019-11-30 14:01:27
I've been trying to understand how to deal with the output of strsplit a bit better. I often have data such as this that I wish to split: mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90") #[1] "144/4/5" "154/2" "146/3/5" "142" "143/4" "DNB" "90" After splitting, the results are as follows: strsplit(mydata, "/") #[[1]] #[1] "144" "4" "5" #[[2]] #[1] "154" "2" #[[3]] #[1] "146" "3" "5" #[[4]] #[1] "142" #[[5]] #[1] "143" "4" #[[6]] #[1] "DNB" #[[7]] #[1] "90" I know from the strsplit help page that final empty strings are not produced. Therefore, there will be 1, 2 or
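A short sketch of pulling the nth piece out of the strsplit() result, using the question's mydata. Indexing a character vector past its end returns NA, so `vapply(parts, "[", character(1), n)` yields one value per input string, with NA wherever that string had fewer than n pieces. Padding every split to the longest one gives a matrix with one column per position.

```r
mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")
parts <- strsplit(mydata, "/")

# nth piece of each string; p[n] is NA when length(p) < n
first  <- vapply(parts, `[`, character(1), 1)
second <- vapply(parts, `[`, character(1), 2)

# All pieces at once: pad each split to the longest one, then stack as rows
n_max <- max(lengths(parts))
pieces <- do.call(rbind, lapply(parts, `[`, seq_len(n_max)))
first
second
pieces
```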

Speeding up function that uses which within a sapply call in R

拟墨画扇 submitted on 2019-11-30 09:34:24
I have two vectors, e and g. For each element in e, I want to know the percentage of elements in g that are smaller than or equal to it. One way to implement this in R is: set.seed(21) e <- rnorm(1e4) g <- rnorm(1e4) mf <- function(p,v) {100*length(which(v<=p))/length(v)} mf.out <- sapply(X=e, FUN=mf, v=g) With large e or g, this takes a long time to run. How can I change or adapt this code to make it run faster? Note: The mf function above is based on code from the mess function in the dismo package. This is so slow because you're calling your function length(e) times. It doesn't make a large
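One possible speed-up, sketched here as an alternative and not necessarily what the truncated answer goes on to suggest: sort g once, then let findInterval() count, for every element of e in a single vectorised call, how many sorted values of g are less than or equal to it. That is the same `v <= p` count that mf() computes. The empirical CDF, `ecdf(g)(e)`, gives the same proportion. The sketch keeps the question's setup so the results can be compared directly.

```r
set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)

# Original: one full pass over v for every element of e
mf <- function(p, v) {100 * length(which(v <= p)) / length(v)}
mf.out <- sapply(X = e, FUN = mf, v = g)

# Sort once; findInterval(x, sorted) returns how many sorted values are <= x
fast.out <- 100 * findInterval(e, sort(g)) / length(g)

# The empirical CDF of g gives the same proportion
ecdf.out <- 100 * ecdf(g)(e)

all.equal(mf.out, fast.out)
all.equal(mf.out, ecdf.out)
```

Sorting is done once, and findInterval() then locates each element of e in the sorted vector with a search rather than a full scan, so the work no longer grows with length(e) times length(g).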

Viewing all column names with any NA in R

你离开我真会死。 submitted on 2019-11-30 08:31:10
Question: I need to get the names of the columns that have at least one NA. df<-data.frame(a=1:3,b=c(NA,8,6), c=c('t',NA,7)) I need to get "b, c". I found this code: sapply(df, function(x) any(is.na(x))) But I need only the variables that have any NA. I tried this: sapply(df, function(x) colnames(df[,any(is.na(x))])) But I get all the column names. Answer 1: Another acrobatic solution (just for fun): colnames(df)[!complete.cases(t(df))] [1] "b" "c" The idea is: getting the columns of A that have at least 1
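Two more direct base-R options, sketched with the question's df: test each column with anyNA() and use the resulting logical vector to index the names, or count the NAs per column with colSums(is.na(df)) and keep the columns with a positive count. Both return the names the question asks for.

```r
df <- data.frame(a = 1:3, b = c(NA, 8, 6), c = c('t', NA, 7))

# One TRUE/FALSE per column: does it contain at least one NA?
has_na <- vapply(df, anyNA, logical(1))
names(df)[has_na]

# Alternative: count NAs per column, keep columns with a positive count
names(which(colSums(is.na(df)) > 0))
```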

How to subset from a list in R

让人想犯罪 __ submitted on 2019-11-29 23:56:24
I have a rather simple task but haven't found a good solution. > mylist [[1]] [1] 1 2 3 4 5 6 7 8 9 10 [[2]] [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" [[3]] [1] 25 26 27 28 29 30 31 32 y <- c(3,5,9) I would like to extract from mylist the sub-elements 3, 5, and 9 of each component in the list. I have tried sapply[mylist,"[[",y] but no luck, and others like vapply, lapply, etc. Thanks in advance for your help, Mauricio Ortiz You could use sapply(mylist, "[", y): mylist <- list(1:5, 6:10, 11:15) sapply(mylist, "[", c(2,3)) Try
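A sketch of what the answer's `sapply(mylist, "[", y)` returns for the question's own list, reconstructing mylist from the printed output as `list(1:10, letters, 25:32)` (an assumption). The second component is character, so sapply() simplifies all three results into one character matrix, and index 9 is past the end of the third component (25 to 32), so it becomes NA. lapply() with the same "[" function keeps each component's own type. The original attempt `sapply[mylist,"[[",y]` fails because square brackets try to subset the sapply function instead of calling it, and "[[" would also reject more than one index on an atomic vector.

```r
mylist <- list(1:10, letters, 25:32)
y <- c(3, 5, 9)

# sapply() simplifies the three length-3 results into a 3 x 3 matrix; because
# one component is character, every entry is coerced to character
m <- sapply(mylist, `[`, y)

# lapply() returns a list, so each component keeps its own type
res <- lapply(mylist, `[`, y)
m
res
```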

Extracting nth element from a nested list following strsplit - R

﹥>﹥吖頭↗ submitted on 2019-11-29 19:51:35
Question: I've been trying to understand how to deal with the output of strsplit a bit better. I often have data such as this that I wish to split: mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90") #[1] "144/4/5" "154/2" "146/3/5" "142" "143/4" "DNB" "90" After splitting, the results are as follows: strsplit(mydata, "/") #[[1]] #[1] "144" "4" "5" #[[2]] #[1] "154" "2" #[[3]] #[1] "146" "3" "5" #[[4]] #[1] "142" #[[5]] #[1] "143" "4" #[[6]] #[1] "DNB" #[[7]] #[1] "90" I know

Speeding up function that uses which within a sapply call in R

醉酒当歌 submitted on 2019-11-29 14:35:04
Question: I have two vectors, e and g. For each element in e, I want to know the percentage of elements in g that are smaller than or equal to it. One way to implement this in R is: set.seed(21) e <- rnorm(1e4) g <- rnorm(1e4) mf <- function(p,v) {100*length(which(v<=p))/length(v)} mf.out <- sapply(X=e, FUN=mf, v=g) With large e or g, this takes a long time to run. How can I change or adapt this code to make it run faster? Note: The mf function above is based on code from the mess function in the dismo package. Answer 1:

Using "..." and "replicate"

杀马特。学长 韩版系。学妹 submitted on 2019-11-29 13:23:47
In the documentation of sapply and replicate, there is a warning about using "...". I can accept it as such, but I would like to understand what is behind it. So I've created this little contrived example: innerfunction<-function(x, extrapar1=0, extrapar2=extrapar1) { cat("x:", x, ", xp1:", extrapar1, ", xp2:", extrapar2, "\n") } middlefunction<-function(x,...) { innerfunction(x,...) } outerfunction<-function(x, ...) { cat("Run middle function:\n") replicate(2, middlefunction(x,...)) cat("Run inner function:\n") replicate(2, innerfunction(x,...)) } outerfunction(1,2,3) outerfunction(1
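A sketch of what is behind the warning, assuming base R's definition of replicate(), which is `sapply(integer(n), eval.parent(substitute(function(...) expr)), simplify = simplify)` (printing `replicate` shows the definition in your version). The expression is wrapped in a new `function(...)`, so a "..." written inside it refers to that wrapper's arguments, which sapply() fills with the elements of `integer(n)` (that is, 0), not to the dots of the function that called replicate(). The example is a variant of the question's functions that returns values instead of printing them. `outer_naive`, `outer_fixed` and `call_middle` are hypothetical names. The fix binds the dots in a zero-argument closure before replicate() wraps the expression.

```r
# Variant of the question's functions that returns the values it receives
innerfunction <- function(x, extrapar1 = 0, extrapar2 = extrapar1) {
  c(x = x, xp1 = extrapar1, xp2 = extrapar2)
}
middlefunction <- function(x, ...) innerfunction(x, ...)

# `...` inside replicate()'s expression is captured by the function(...) wrapper
# that replicate() builds, so middlefunction() receives 0 instead of 2, 3
outer_naive <- function(x, ...) replicate(2, middlefunction(x, ...))

# Fix: a closure with no `...` of its own looks up the caller's dots lexically
outer_fixed <- function(x, ...) {
  call_middle <- function() middlefunction(x, ...)
  replicate(2, call_middle())
}

outer_naive(1, 2, 3)  # xp1 and xp2 are 0 in both columns
outer_fixed(1, 2, 3)  # xp1 = 2, xp2 = 3 in both columns
```

Passing the values as named arguments, as the replicate() help page suggests, avoids the problem in the same way, because no "..." then appears inside the replicated expression.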