data.table

How to replace NA values in a data.table with na.spline

末鹿安然 submitted on 2021-02-16 20:09:06
Question: I'm trying to prepare some demographic data retrieved from Eurostat for further processing, which includes replacing any missing data with corresponding approximated values. At first I used only data.frames, but then I became convinced that data.tables might offer some advantages over regular data.frames, so I migrated to data.tables. One thing I observed while doing so was getting different results when using "na.spline" in combination with "apply" versus "na.spline" as part of the data
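The excerpt above is cut off, so the asker's actual data and calls are not known. As a minimal sketch, assuming the goal is to interpolate each numeric column independently, `na.spline` from the zoo package can be applied column-wise with `lapply(.SD, ...)`. The table `dt` and its column names are hypothetical and chosen only for illustration.

```r
library(data.table)
library(zoo)

# Hypothetical toy data; these column names are not from the question.
dt <- data.table(
  year  = 2001:2006,
  pop_a = c(10, NA, 14, 17, NA, 26),
  pop_b = c(5, 6, NA, 9, 11, NA)
)
value_cols <- c("pop_a", "pop_b")

# Spline-interpolate every listed column on its own, updating by reference.
# na.spline() also extrapolates, so the trailing NA in pop_b is filled too.
dt[, (value_cols) := lapply(.SD, na.spline), .SDcols = value_cols]
print(dt)
```

One thing worth checking when comparing this with an `apply` call: `apply(X, 1, f)` works on rows rather than columns and returns its result transposed, which may be one source of differing output. Whether that is the cause in the asker's case cannot be confirmed from the excerpt.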

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

旧街凉风 submitted on 2021-02-16 20:07:29
Question: I have a dataframe df.sample like this:
id <- c("A","A","A","A","A","A","A","A","A","A","A")
date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12", "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14", "2018-11-12")
hour <- c(8,8,9,9,13,13,16,6,7,19,7)
min <- c(47,59,6,18,22,36,12,32,12,21,47)
value <- c(70,70,86,86,86,74,81,77,79,83,91)
df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F)
df.sample$date <- as.Date(df.sample$date,format="%Y-%m-

Ranking multiple columns by different orders using data table

99封情书 submitted on 2021-02-16 14:32:08
Question: Using my example below, how can I rank multiple columns using different orders, for example ranking y as descending and z as ascending?
require(data.table)
dt <- data.table(x = c(rep("a", 5), rep("b", 5)), y = abs(rnorm(10)) * 10, z = abs(rnorm(10)) * 10)
cols <- c("y", "z")
dt[, paste0("rank_", cols) := lapply(.SD, function(x) frankv(x, ties.method = "min")), .SDcols = cols, by = .(x)]
Answer 1: data.table's frank() function has some useful features which aren't available in base R's rank()
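The answer above is truncated, so the approach it goes on to describe is not known. One possible sketch, assuming a per-column direction is wanted, pairs each column with an `order` value (`1L` ascending, `-1L` descending) and passes it to `frankv()` through `Map()`. The vector `ords` and the fixed sample values are assumptions made so the result is reproducible.

```r
library(data.table)

# Fixed values instead of rnorm() so the ranks are reproducible.
dt <- data.table(
  x = c("a", "a", "a", "b", "b", "b"),
  y = c(3, 1, 2, 10, 30, 20),
  z = c(3, 1, 2, 10, 30, 20)
)
cols <- c("y", "z")
ords <- c(-1L, 1L)  # hypothetical helper: y descending, z ascending

# Map() walks over the columns of .SD and the matching order in parallel.
dt[, paste0("rank_", cols) := Map(
  function(v, o) frankv(v, order = o, ties.method = "min"),
  .SD, ords
), by = x, .SDcols = cols]
print(dt)
```

Within each group of `x`, `rank_y` gives rank 1 to the largest `y`, while `rank_z` gives rank 1 to the smallest `z`.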

Count consecutive days by group

人走茶凉 submitted on 2021-02-11 17:51:09
Question: I am looking to add a field that counts the number of consecutive days within each group (captured by the id field). I start with this: dt <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), date = c("1/01/2000", "2/01/2000", "2/01/2000", "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000", "13/01/2000", "14/01/2000", "18/01/2000", "19/01/2000", "21/01/2000", "25/01/2000", "26/01/2000", "30/01/2000", "31/01/2000")), .Names = c("id", "date"), row.names = c(NA,
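The desired output is not shown in the excerpt, so the following is a sketch under stated assumptions: dates are in day/month/year format, a new run starts whenever the gap to the previous date within an id is not exactly one day, and the counter restarts at 1 for each run. It uses a smaller hypothetical table without duplicate dates, since how duplicates should be counted is not specified in the visible text. The column names `run` and `consec_days` are illustrative.

```r
library(data.table)

# Hypothetical subset in the same shape as the question's data.
dt <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  date = c("1/01/2000", "2/01/2000", "5/01/2000", "6/01/2000",
           "13/01/2000", "14/01/2000", "18/01/2000")
)

# Parse day/month/year strings and sort within each id.
dt[, date := as.IDate(as.Date(date, format = "%d/%m/%Y"))]
setorder(dt, id, date)

# A run begins at the first row of an id or after a gap other than one day.
dt[, run := cumsum(c(TRUE, diff(as.integer(date)) != 1L)), by = id]

# Running count of consecutive days inside each run.
dt[, consec_days := seq_len(.N), by = .(id, run)]
print(dt)
```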

Automisation of creating new variables based on the distribution of other variables in the data

佐手、 submitted on 2021-02-11 14:34:05
Question: I have data as follows: EDIT: Sample of Original Data DT <- structure(list(Abbreviation = "AK", date = "1/31/2011", month = "01", year = "2011", c1 = "P", male = 12288, female = 6107, c4 = 2, upto22 = 870, from22to24 = 1441, from25to34 = 5320, from35to44 = 3568, from45to54 = 4322, from55to59 = 1539, from60to64 = 886, over65 = 451, c20 = 0, hispanic = 771, non_hispanic = 17458, c42 = 168, native = 4856, asian = 791, black = 611, hawaii = 289, white = 11209, c48 = 641), row.names = c(NA, -1L),

Reshaping a table in R while parsing information from column names and using it to collect information from specific columns

冷暖自知 submitted on 2021-02-11 13:00:22
Question: I was given a badly organized data table with hundreds of columns (a subset is given below). The column names are dot-delimited, where the first field holds information about a type of object (e.g. Item123, object_AB etc.) without any naming convention. The columns are not in any specific order either. Other columns share the type-of-object field and also carry the name of some property of that object (e.g. color, manufacturer etc.). Item123.type.value Item123.mass

Passing multiple arguments to data.table inside a function

回眸只為那壹抹淺笑 submitted on 2021-02-11 10:30:07
Question: Here is the output that I want from data.table.
library(data.table)
dt_mtcars <- as.data.table(mtcars)
## desired output ----
dt_mtcars[mpg >20 , .(mean_mpg = mean(mpg) ,median_mpg = median(mpg)) , .(cyl, gear)]
   cyl gear mean_mpg median_mpg
1:   6    4   21.000      21.00
2:   4    4   26.925      25.85
3:   6    3   21.400      21.40
4:   4    3   21.500      21.50
5:   4    5   28.200      28.20
I want to get the output by passing arguments to a function.
processFUN <- function(dt, where, select, group){ out <- dt[i=eval(parse(text = where)) ,j
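The question's `processFUN` is cut off above, so what follows is not the asker's function but one possible sketch of the same idea. It assumes the caller passes the filter and the summary as unevaluated expressions built with `quote()`, and the grouping columns as a character vector. data.table then evaluates those expressions inside the table's scope when they are wrapped in `eval()` in `i` and `j`.

```r
library(data.table)

dt_mtcars <- as.data.table(mtcars)

# Sketch only: `where` and `select` are quoted expressions,
# `group` is a character vector of column names.
processFUN <- function(dt, where, select, group) {
  dt[eval(where), eval(select), by = group]
}

out <- processFUN(
  dt     = dt_mtcars,
  where  = quote(mpg > 20),
  select = quote(.(mean_mpg = mean(mpg), median_mpg = median(mpg))),
  group  = c("cyl", "gear")
)
print(out)
```

Passing quoted expressions avoids building and parsing code from strings. If the inputs must arrive as character strings, they could be converted first with `parse(text = ...)[[1]]`, which is closer to what the question's fragment appears to do.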