sapply

How do I count the number of words in a text (string)?

这一生的挚爱 submitted on 2019-11-29 09:55:20

I have this string vector (for example):

    str <- c("this is a string current trey", "feather rtttt", "tusla", "laq")

To count the number of words in each element, I used the approach from "Count the number of words in a string in R?" (a possible duplicate, but with a different issue):

    No_words <- sapply(gregexpr("\\W+", str), length) + 1

but it returns 6 2 2 2. The last two elements ("tusla" and "laq") contain only one word each, so it should return 6 2 1 1. How do I get around this problem?

Counting separators fails here because, for a string with no match, gregexpr() returns -1 (a vector of length 1), so adding 1 gives 2. You can try counting runs of non-whitespace characters instead:

    sapply(gregexpr("\\S+", str), length)
    ## [1] 6 2 1 1

Or, as suggested in the comments, you can try
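A minimal sketch building on the `\\S+` answer, assuming you also want an empty string to count as zero words. The plain `sapply(..., length)` version reports 1 for `""` because gregexpr() returns -1 when nothing matches. regmatches() returns the actual matches, so lengths() gives true counts. The names `words` and `word_counts` and the extra `""` element are illustrative additions, not part of the question.

```r
# Illustrative input: the question's vector plus an empty string.
words <- c("this is a string current trey", "feather rtttt", "tusla", "laq", "")

# gregexpr() locates every run of non-whitespace characters; regmatches()
# extracts those runs (character(0) when there are none), and lengths()
# counts them per element.
word_counts <- lengths(regmatches(words, gregexpr("\\S+", words)))
word_counts
# [1] 6 2 1 1 0
```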

Viewing all column names with any NA in R

Deadly submitted on 2019-11-29 07:00:49

I need to get the names of the columns that have at least one NA.

    df <- data.frame(a = 1:3, b = c(NA, 8, 6), c = c('t', NA, 7))

I need to get "b, c". I found this code:

    sapply(df, function(x) any(is.na(x)))

but I need only the variables that have any NA. I tried this:

    sapply(df, function(x) colnames(df[, any(is.na(x))]))

but I get all the column names.

Another acrobatic solution (just for fun):

    colnames(df)[!complete.cases(t(df))]
    [1] "b" "c"

The idea is: getting the columns of A that have at least one NA is equivalent to getting the rows of t(A) that have at least one NA. complete.cases by definition (very efficient
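A more direct route than the transpose trick is to test each column with anyNA() and use the resulting logical vector to index the names. This sketch uses vapply() so the result is guaranteed to be one logical per column. The variable names `has_na` and `na_cols` are illustrative.

```r
df <- data.frame(a = 1:3, b = c(NA, 8, 6), c = c("t", NA, 7))

# anyNA() is TRUE for a column containing at least one NA;
# vapply() collects one logical per column.
has_na <- vapply(df, anyNA, logical(1))

# Keep only the names of the flagged columns.
na_cols <- names(df)[has_na]
na_cols
# [1] "b" "c"

# The "b, c" form asked for in the question:
paste(na_cols, collapse = ", ")
```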

Geographical distance by group - Applying a function on each pair of rows

扶醉桌前 submitted on 2019-11-28 14:05:47

I want to calculate the average geographical distance between the houses in each province. Suppose I have the following data:

    df1 <- data.frame(province = c(1, 1, 1, 2, 2, 2),
                      house = c(1, 2, 3, 4, 5, 6),
                      lat = c(-76.6, -76.5, -76.4, -75.4, -80.9, -85.7),
                      lon = c(39.2, 39.1, 39.3, 60.8, 53.3, 40.2))

Using the geosphere library I can find the distance between two houses. For instance:

    library(geosphere)
    distm(c(df1$lon[1], df1$lat[1]), c(df1$lon[2], df1$lat[2]), fun = distHaversine)
    #11429.1

How do I calculate the distance between all the houses in the province and gather the mean distance
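A sketch of one approach: split by province, enumerate every unordered pair of houses with combn(), and average the pairwise distances. So that it runs without geosphere, it uses a hypothetical helper, `haversine()`, which implements the haversine formula with a radius of 6378137 m. That radius is assumed here to match geosphere's distHaversine default. A province with fewer than two houses would make combn() fail and needs separate handling.

```r
# Hypothetical helper: haversine great-circle distance in metres.
# r = 6378137 is assumed to match geosphere::distHaversine's default radius.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin() guards against rounding above 1
}

df1 <- data.frame(province = c(1, 1, 1, 2, 2, 2),
                  house = c(1, 2, 3, 4, 5, 6),
                  lat = c(-76.6, -76.5, -76.4, -75.4, -80.9, -85.7),
                  lon = c(39.2, 39.1, 39.3, 60.8, 53.3, 40.2))

# For each province, combn() gives a 2 x k matrix of row-index pairs;
# the distances for all pairs are computed in one vectorised call and averaged.
mean_dist <- sapply(split(df1, df1$province), function(d) {
  pairs <- combn(nrow(d), 2)
  mean(haversine(d$lon[pairs[1, ]], d$lat[pairs[1, ]],
                 d$lon[pairs[2, ]], d$lat[pairs[2, ]]))
})
mean_dist
```

With geosphere installed, `m <- distm(as.matrix(d[, c("lon", "lat")]), fun = distHaversine); mean(m[upper.tri(m)])` inside the same sapply() should give the equivalent per-province value from the library itself.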

weighted means by group and column

*爱你&永不变心* submitted on 2019-11-28 10:31:04

I wish to obtain weighted means by group for each of several (actually about 60) columns. This question is very similar to "repeatedly applying ave for computing group means in a data frame", which was just asked.

So far I have come up with two ways to obtain the weighted means: (1) use a separate sapply statement for each column, or (2) place an sapply statement inside a for-loop. However, I feel there must be a way to nest an apply statement inside the sapply statement, or vice versa, and so eliminate the for-loop. I have tried numerous permutations without success. I also looked at the sweep function. Here is
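One way to remove the for-loop is to nest two sapply() calls. The outer call runs once per value column and the inner call runs once per group. The data frame, its weight column `w`, and the `value_cols` vector below are hypothetical stand-ins, because the question's data is cut off.

```r
# Hypothetical data: a grouping column, a weight column and two value columns.
df <- data.frame(group = c("a", "a", "b", "b"),
                 w     = c(1, 3, 2, 2),
                 x     = c(1, 2, 3, 4),
                 y     = c(10, 20, 30, 50))
value_cols <- c("x", "y")  # in practice, the ~60 column names

# Outer sapply(): one iteration per value column.
# Inner sapply(): one weighted.mean() per group, using that group's weights.
# The result is a matrix with one row per group and one column per value column.
wmeans <- sapply(value_cols, function(col)
  sapply(split(df, df$group), function(d) weighted.mean(d[[col]], d$w)))
wmeans
```

Splitting once per column repeats work. For many columns, an alternative is a single pass over the groups, `t(sapply(split(df, df$group), function(d) colSums(d[value_cols] * d$w) / sum(d$w)))`, which should give the same numbers.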

Loop linear regression and saving coefficients

放肆的年华 submitted on 2019-11-28 00:31:01

This is part of the dataset (named "ME1") I'm using (all variables are numeric):

       Year AgeR       rateM
    1  1751 -1.0 0.241104596
    2  1751 -0.9 0.036093609
    3  1751 -0.8 0.011623734
    4  1751 -0.7 0.006670552
    5  1751 -0.6 0.006610552
    6  1751 -0.5 0.008510828
    7  1751 -0.4 0.009344041
    8  1751 -0.3 0.011729740
    9  1751 -0.2 0.010988005
    10 1751 -0.1 0.015896107
    11 1751  0.0 0.018190140
    12 1751  0.1 0.024588340
    13 1751  0.2 0.029801362
    14 1751  0.3 0.044515912
    15 1751  0.4 0.055240354
    16 1751  0.5 0.088476758
    17 1751  0.6 0.119045309
    18 1751  0.7 0.167866571
    19 1751  0.8 0.239244825
    20 1751  0.9 0.329683010
    21 1751  1.0 0
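A minimal loop-free sketch, assuming the goal is one regression of rateM on AgeR per Year. The question is cut off before it names the model. The two-year `ME1` below is hypothetical: rateM is made exactly linear in AgeR so the fitted coefficients are easy to check. On the real data, only the `coefs` line is needed.

```r
# Hypothetical stand-in for ME1: two years on the same AgeR grid,
# with rateM exactly linear in AgeR.
age <- seq(-1, 1, by = 0.1)
ME1 <- data.frame(Year  = rep(c(1751, 1752), each = length(age)),
                  AgeR  = rep(age, 2),
                  rateM = c(0.1 + 0.5 * age, 0.2 - 0.3 * age))

# Fit one lm() per Year and keep only its coefficients. sapply() simplifies
# the list of coefficient vectors into a matrix: rows "(Intercept)" and
# "AgeR", one column per Year.
coefs <- sapply(split(ME1, ME1$Year), function(d) coef(lm(rateM ~ AgeR, data = d)))

# Transpose so each row is a year.
t(coefs)
```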

R: loop over columns in data.table

自古美人都是妖i submitted on 2019-11-27 18:01:37

I want to determine the column classes of a large data.table.

    colClasses <- sapply(DT, FUN = function(x) class(x)[1])

works, but apparently local copies are stored in memory:

    > memory.size()
    [1] 687.59
    > colClasses <- sapply(DT, class)
    > memory.size()
    [1] 1346.21

A loop seems impossible, because subsetting a data.table with with=FALSE always returns a data.table. A quick and very dirty method is:

    DT1 <- DT[1, ]
    colClasses <- sapply(DT1, FUN = function(x) class(x)[1])

What is the most elegant and efficient
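One loop-based option iterates over column positions instead of over DT itself, pulling each column with `[[`. Because the loop runs over `seq_along(DT)`, DT is never converted with as.list(), which lapply()/sapply() would otherwise do on an object like a data.table. The sketch uses a small data.frame stand-in so it needs only base R. Whether this keeps memory flat on your data.table depends on your R and data.table versions, and it has not been measured here.

```r
# Hypothetical stand-in for the large data.table; the code below only uses
# seq_along(), [[ and names(), so it applies to a data.table as well.
DT <- data.frame(id   = 1:3,
                 name = c("a", "b", "c"),
                 when = as.POSIXct("2019-11-27", tz = "UTC") + 0:2,
                 stringsAsFactors = FALSE)

# Loop over column positions; DT[[j]] extracts column j as a plain vector,
# and class(...)[1L] keeps only the first class (e.g. "POSIXct").
colClasses <- vapply(seq_along(DT), function(j) class(DT[[j]])[1L], character(1L))
names(colClasses) <- names(DT)
colClasses
```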

Remove strings found in vector 1, from vector 2

匆匆过客 submitted on 2019-11-27 16:26:47

I have these two vectors:

    sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
    sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

I am trying to remove the sample1 strings that are found in sample2. The closest I got is by iterating with sapply, which gave me this:

    sapply(sample1, function(i) gsub(i, "", sample2))
         .aaa              .aarp              .abb               .abbott            .abogado
    [1,] "try1.aarp"       "try1"             "try1.aarp"        "try1.aarp"        "try1.aarp"
    [2,] "www.tryagain"    "www.tryagain.aaa" "www.tryagain.aaa" "www.tryagain.aaa" "www.tryagain.aaa"
    [3,] "255.255.255.255" "255.255.255.255" "255.255
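The sapply() attempt produces one column per pattern because each gsub() starts again from the original sample2. It also treats "." as a regex wildcard. A sketch that applies every pattern to the same vector in turn uses Reduce(), with `fixed = TRUE` so that "." is literal. This assumes each sample1 string should be removed wherever it occurs, not only as a suffix.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

# Longest patterns first, so ".abbott" is removed before ".abb" could strip
# its first four characters and leave "ott" behind.
patterns <- sample1[order(nchar(sample1), decreasing = TRUE)]

# Reduce() threads the result through one gsub() per pattern, starting from
# sample2; fixed = TRUE matches each pattern literally.
cleaned <- Reduce(function(s, p) gsub(p, "", s, fixed = TRUE), patterns, sample2)
cleaned
```

To remove a string only when it ends an element, you could instead escape the dots and anchor the pattern, for example `gsub(paste0(gsub(".", "\\.", p, fixed = TRUE), "$"), "", s)`.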

Loop linear regression and saving coefficients

扶醉桌前 submitted on 2019-11-26 23:25:54

This is part of the dataset (named "ME1") I'm using (all variables are numeric):

       Year AgeR       rateM
    1  1751 -1.0 0.241104596
    2  1751 -0.9 0.036093609
    3  1751 -0.8 0.011623734
    4  1751 -0.7 0.006670552
    5  1751 -0.6 0.006610552
    6  1751 -0.5 0.008510828
    7  1751 -0.4 0.009344041
    8  1751 -0.3 0.011729740
    9  1751 -0.2 0.010988005
    10 1751 -0.1 0.015896107
    11 1751  0.0 0.018190140
    12 1751  0.1 0.024588340
    13 1751  0.2 0.029801362
    14 1751  0.3 0.044515912
    15 1751  0.4 0.055240354
    16 1751  0.5 0.088476758
    17 1751  0.6 0

Means multiple columns by multiple groups

不羁的心 submitted on 2019-11-26 21:54:15

I am trying to find the means, excluding NAs, of multiple columns within a data frame by multiple groups:

    airquality <- data.frame(City = c("CityA", "CityA", "CityA", "CityB", "CityB", "CityB", "CityC", "CityC"),
                             year = c("1990", "2000", "2010", "1990", "2000", "2010", "2000", "2010"),
                             month = c("June", "July", "August", "June", "July", "August", "June", "August"),
                             PM10 = c(runif(3), rnorm(5)),
                             PM25 = c(runif(3), rnorm(5)),
                             Ozone = c(runif(3), rnorm(5)),
                             CO2 = c(runif(3), rnorm(5)))
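A base-R sketch, assuming the grouping variables are City and year. The question is cut off before naming them. The data-frame method of aggregate() applies FUN to every value column within each group, and extra arguments such as `na.rm = TRUE` are passed on to mean(). The small `aq` data frame is hypothetical: it replaces the random runif()/rnorm() values so the result is reproducible, and it repeats one City/year combination so the means are not trivial.

```r
# Hypothetical, deterministic data; CityA/1990 appears twice and PM10 has an NA.
aq <- data.frame(City  = c("CityA", "CityA", "CityA", "CityB", "CityB"),
                 year  = c("1990", "1990", "2000", "1990", "1990"),
                 PM10  = c(1, NA, 3, 4, 6),
                 Ozone = c(2, 4, 6, 8, 10))

value_cols <- c("PM10", "Ozone")

# One row per observed City x year combination; each column drops only
# its own NAs because na.rm = TRUE is forwarded to mean().
res <- aggregate(aq[value_cols], by = aq[c("City", "year")], FUN = mean, na.rm = TRUE)
res
```

This uses the data-frame method rather than the formula method on purpose. The formula method's default `na.action = na.omit` would drop the whole CityA/1990 row with the NA, so Ozone for that group would also lose a value.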

Apply a function to every row of a matrix or a data frame

亡梦爱人 submitted on 2019-11-26 19:31:48

Suppose I have an n-by-2 matrix and a function that takes a 2-vector as one of its arguments. I would like to apply the function to each row of the matrix and get an n-vector. How do I do this in R?

For example, I would like to compute the density of a bivariate standard normal distribution at three points:

    bivariate.density <- function(x = c(0, 0), mu = c(0, 0), sigma = c(1, 1), rho = 0) {
      z <- (x - mu) / sigma  # standardise each coordinate
      exp(-1 / (2 * (1 - rho^2)) * (z[1]^2 + z[2]^2 - 2 * rho * z[1] * z[2])) /
        (2 * pi * sigma[1] * sigma[2] * sqrt(1 - rho^2))
    }
    out <- rbind(c(1, 2), c(3, 4), c(5, 6))

How do I apply the function to each row of out?
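A minimal sketch using apply() with MARGIN = 1, which passes each row of the matrix as a numeric vector to the function's first argument. The function is redefined here so the block runs on its own. It uses the `<- function` form, and it centres x on mu, which the original formula leaves unused. The centring is an assumption about intent; with the default `mu = c(0, 0)` the result is the same.

```r
bivariate.density <- function(x = c(0, 0), mu = c(0, 0), sigma = c(1, 1), rho = 0) {
  z <- (x - mu) / sigma  # standardise each coordinate
  exp(-1 / (2 * (1 - rho^2)) * (z[1]^2 + z[2]^2 - 2 * rho * z[1] * z[2])) /
    (2 * pi * sigma[1] * sigma[2] * sqrt(1 - rho^2))
}
out <- rbind(c(1, 2), c(3, 4), c(5, 6))

# MARGIN = 1: each row of out (a length-2 vector) becomes x;
# the three scalar results come back as a length-3 numeric vector.
dens <- apply(out, 1, bivariate.density)

# Named arguments after FUN are forwarded to every call, e.g. a correlation.
dens_corr <- apply(out, 1, bivariate.density, rho = 0.5)
dens
```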