data.table

Should I use mget(), .. or with=FALSE to select columns of a data.table?

梦想的初衷 submitted on 2020-05-10 09:28:49
Question: There are multiple ways to select columns of a data.table using a variable that holds the desired column names (with=FALSE, .., mget, ...). Is there a consensus on which to use, and when? Is one more data.table-y than the others? I could come up with the following arguments:
- with=FALSE and .. are almost equally fast, while mget is slower
- .. can't select concatenated column names "on the fly" (EDIT: current CRAN version 1.12.8 definitely can; I was using an old version, which could not, so this
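A minimal sketch comparing the approaches on a made-up toy table (dt and cols are illustrative names, not from the question). In data.table versions that support the .. prefix, all four selections below should yield the same two columns; .SDcols is included as a further common option. Timing is not measured here.

```r
library(data.table)

# Toy table and column-name vector (illustrative only)
dt <- data.table(a = 1:3, b = letters[1:3], c = c(2.5, 3.5, 4.5))
cols <- c("a", "c")

by_with   <- dt[, cols, with = FALSE]    # j is treated as a vector of column names
by_dotdot <- dt[, ..cols]                # '..' looks up cols in the calling scope
by_mget   <- dt[, mget(cols)]            # mget() returns a named list of columns
by_sdcols <- dt[, .SD, .SDcols = cols]   # .SD restricted to the chosen columns

# Building the name vector on the fly works with with = FALSE
extended <- dt[, c(cols, "b"), with = FALSE]
```

with = FALSE and .. state the intent (select by name) directly, whereas mget() is a general lookup that data.table must evaluate per call, which is one plausible reason it is reported as slower.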

remove duplicates and collapse near duplicates based on time difference

你离开我真会死。 submitted on 2020-05-08 05:33:38
Question: I have a data frame like the one shown below:
DF = structure(list(Age_visit = c(48, 48, 48, 49, 49, 77), Date_1 = c("8/6/2169 9:40", "8/6/2169 9:40", "8/6/2169 9:41", "8/6/2169 9:42", "24/7/2169 8:31", "12/9/2169 10:30", "19/6/2237 12:15"), Date_2 = c("NA-NA-NA NA:NA:NA", "NA-NA-NA NA:NA:NA", "NA-NA-NA NA:NA:NA", "NA-NA-NA NA:NA:NA", "NA-NA-NA NA:NA:NA", "NA-NA-NA NA:NA:NA", "NA-NA-NA NA:NA:NA"), person_id = c("21", "21", "21", "21", "21", "21", "31" ), enc_id = c("A21BC","A21BC", "A22BC", "A23BC",
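A sketch of one common approach, under stated assumptions: Date_1 is day/month/year hour:minute, rows for the same person_id belong together when consecutive timestamps are at most threshold_mins apart (the question's actual rule is cut off, so the 1-minute value is hypothetical), and the toy data is a trimmed subset of the question's columns.

```r
library(data.table)

# Hypothetical toy data shaped like part of the question's DF
DF <- data.table(
  person_id = c("21", "21", "21", "21", "21", "31"),
  Date_1 = c("8/6/2169 9:40", "8/6/2169 9:40", "8/6/2169 9:41",
             "8/6/2169 9:42", "24/7/2169 8:31", "19/6/2237 12:15")
)
threshold_mins <- 1  # assumed cut-off, not taken from the question

# Parse day/month/year hour:minute strings (assumed format)
DF[, time := as.POSIXct(Date_1, format = "%d/%m/%Y %H:%M", tz = "UTC")]
DF <- unique(DF)                # 1) drop exact duplicate rows
setorder(DF, person_id, time)

# 2) start a new cluster whenever the gap to the previous visit exceeds the threshold
DF[, cluster := cumsum(c(TRUE, diff(as.numeric(time)) / 60 > threshold_mins)),
   by = person_id]

# 3) collapse each cluster of near duplicates into one row
collapsed <- DF[, .(first_time = min(time), last_time = max(time), n_rows = .N),
                by = .(person_id, cluster)]
```

The cumsum-over-gaps trick gives each run of close timestamps a shared id without any explicit loop, which keeps it fast on large tables.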

Create new data.table columns based on other columns

空扰寡人 submitted on 2020-04-30 10:07:44
Question: I have a data.table containing some state name abbreviations and county names. I want to get approximate coordinates from ggplot2::map_data('county') for each row. I can do this sequentially with multiple lines of code using :=, but I would like to make only one function call. Below is what I've tried:
Data:
library(data.table)
library(ggplot2)
> dput(dt[1:20, .(state, county, prime_mover)])
structure(list(state = c("AZ", "AZ", "CA", "CA", "CA", "CT", "FL", "IN", "MA", "MA", "MA", "MN", "NJ", "NJ"
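A sketch of adding both coordinate columns in a single update join, assuming a precomputed lookup with one coordinate pair per county. The lookup values are placeholders, not real map_data() output; in practice such a table could be built by averaging long/lat per (region, subregion) of ggplot2::map_data("county"), which needs the maps package installed.

```r
library(data.table)

# Hypothetical input: state abbreviations plus lower-case county names
dt <- data.table(state  = c("AZ", "CA", "CA", "CT"),
                 county = c("maricopa", "los angeles", "orange", "hartford"))

# Hypothetical lookup with placeholder coordinates, keyed like map_data("county")
lookup <- data.table(region    = c("arizona", "california", "california"),
                     subregion = c("maricopa", "los angeles", "orange"),
                     lon = c(-112.5, -118.2, -117.8),
                     lat = c(33.3, 34.3, 33.7))

# map_data() uses lower-case full state names; translate with base R's state.abb/state.name
lookup[, state := state.abb[match(region, tolower(state.name))]]

# One update join adds both columns by reference; unmatched rows get NA
dt[lookup, on = .(state, county = subregion), c("lon", "lat") := .(i.lon, i.lat)]
```

Because := modifies dt in place, no reassignment is needed, and rows without a match (here CT) are kept with NA coordinates.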

nested loops through a structured list in R

余生颓废 submitted on 2020-04-17 20:37:47
Question: I have an example dataset, garden, as shown below. The real thing is thousands of rows. I also have an example list, productFruit. I want to know the calories of every fruit, considering the usage reported in garden. I basically want to loop through all the rows in my table, check if the usage is recorded in the productFruit list, and then return either the calories or one of the following error messages:
- "usage out of scope" if no usage has been found in the productFruit list
- "fruit out of
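A sketch that swaps the nested loop for a flatten-then-join approach. The question's garden and productFruit are not shown, so both structures below are hypothetical (productFruit is assumed to be a list of fruits, each a named list of usage to calories), the calorie numbers are placeholders, and the "fruit not in list" wording stands in for the question's truncated second message. fifelse() needs data.table 1.12.4 or later.

```r
library(data.table)

# Hypothetical stand-ins for the question's garden and productFruit
garden <- data.table(fruit = c("apple", "apple", "pear", "pear", "kiwi"),
                     usage = c("raw", "juice", "raw", "juice", "raw"))
productFruit <- list(apple = list(raw = 100, juice = 200),   # placeholder calories
                     pear  = list(raw = 300))

# Flatten the nested list once into a long lookup table (fruit, usage, calories)
lookup <- rbindlist(lapply(names(productFruit), function(f)
  data.table(fruit    = f,
             usage    = names(productFruit[[f]]),
             calories = unlist(productFruit[[f]], use.names = FALSE))))

# One join replaces the row-by-row loop; row order follows garden
res <- lookup[garden, on = .(fruit, usage)]

# Label misses (second message wording is hypothetical)
res[, status := fifelse(!fruit %in% names(productFruit), "fruit not in list",
                        fifelse(is.na(calories), "usage out of scope", "ok"))]
```

Flattening once costs one pass over the list, after which every garden row is resolved by a single keyed join instead of a lookup per row.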

Assigning/Referencing a column name in data.table dynamically (in i, j and by)

霸气de小男生 submitted on 2020-04-10 05:24:49
Question:
A) Instead of this (where cars <- data.table(cars)):
    cars[ , .(`Totals:`=.N), by=speed]
I need this:
    strColumnName <- "Totals:"
    cars[ , strColumnName = .N, by=speed]
How do I do it?
B) Similarly (the more general case), instead of this:
    cars[ dist > 50, .(`Totals:`=.N, x=dist*100), by=speed]
I need this:
    strFactor <- "dist"
    cars[ strFactor > 50, .(`Totals:`=.N, x=strFactor*100), by=speed]
This question is about the GENERAL way of assigning/referencing column name variables in data.table, i.e. in 'j'
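A minimal sketch of one widely used pattern, not the only one: name output columns by returning a list and naming it with setNames(), refer to a column held in a variable with get() (in i and j), and pass a character vector to by. cars_dt and byVar are illustrative names; setnames() on the result afterwards is an equally valid alternative for part A.

```r
library(data.table)

cars_dt <- data.table(cars)   # built-in dataset with columns speed and dist
strColumnName <- "Totals:"
strFactor <- "dist"

# A) output column named from a variable: return a list and name it with setNames()
a <- cars_dt[, setNames(list(.N), strColumnName), by = speed]

# B) refer to a column whose name is stored in a variable with get(), in i and j
b <- cars_dt[get(strFactor) > 50,
             setNames(list(.N, get(strFactor) * 100), c(strColumnName, "x")),
             by = speed]

# by also accepts a character vector of column names held in a variable
byVar <- "speed"
d <- cars_dt[, .N, by = byVar]
```

In B, .N is recycled to the group length, so x keeps one value per original row, matching the intent of x=dist*100 in the question.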

How to replace column with strings with look-up codes in R

风格不统一 submitted on 2020-04-07 05:25:49
Question: Imagine that I have a data frame or data table with a string column where one row looks like this:
    a1; b: b1, b2, b3; c: c1, c2, c3; d: d1, d2, d3, d4
and a look-up table with codes for mapping each of these strings. For example:
    string  code
    a1      10
    b1      20
    b2      30
    b3      40
    c1      50
    c2      60
    ...
I would like to have a mapping function that maps this string to codes:
    10; b: 20, 30, 40; c: 50, 60, 70; d: 80, 90, 100
I have a column of these strings in a data.table/data.frame (more than 100k), so any quick solution
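A base-R sketch using gregexpr() plus the regmatches<- replacement form. It assumes every code-able token is letters followed by digits (so labels such as "b:" stay untouched) and that tokens absent from the lookup should be kept as-is; map_codes is a hypothetical helper name, and the lookup holds only the pairs shown in the question.

```r
# Hypothetical lookup using only the pairs shown in the question
lookup <- data.frame(string = c("a1", "b1", "b2", "b3", "c1", "c2"),
                     code   = c(10, 20, 30, 40, 50, 60),
                     stringsAsFactors = FALSE)

map_codes <- function(x, lookup) {
  # Assumed token shape: letters followed by digits
  m <- gregexpr("[A-Za-z]+[0-9]+", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tok) {
    code <- lookup$code[match(tok, lookup$string)]
    hit <- !is.na(code)
    tok[hit] <- as.character(code[hit])  # unknown tokens are kept unchanged
    tok
  })
  x
}

map_codes("a1; b: b1, b2, b3; c: c1, c2", lookup)
# "10; b: 20, 30, 40; c: 50, 60"
```

The function is vectorised over x, so on a data.table it can be applied to the whole column at once, e.g. dt[, coded := map_codes(strings, lookup)] where strings is a hypothetical column name.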