data.table

Match data to nearest time value by id

Submitted by 别说谁变了你拦得住时间么 on 2020-01-02 07:18:08
Question: I have generated a series of hourly time stamps with:

    intervals <- seq(as.POSIXct("2018-01-20 00:00:00", tz = 'America/Los_Angeles'),
                     as.POSIXct("2018-01-20 03:00:00", tz = 'America/Los_Angeles'),
                     by = "hour")
    > intervals
    [1] "2018-01-20 00:00:00 PST" "2018-01-20 01:00:00 PST" "2018-01-20 02:00:00 PST"
    [4] "2018-01-20 03:00:00 PST"

Given a dataset with messy and unevenly spaced timestamps, how would one match the time values from that dataset to the closest hourly timestamp by id, and remove other …
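The excerpt is cut off before any answer. As a minimal sketch (not the original answer), one way to snap messy timestamps to the nearest hourly stamp per id is a rolling join with roll = "nearest"; the obs table and its column names below are invented for illustration:

    library(data.table)

    intervals <- seq(as.POSIXct("2018-01-20 00:00:00", tz = "America/Los_Angeles"),
                     as.POSIXct("2018-01-20 03:00:00", tz = "America/Los_Angeles"),
                     by = "hour")

    # hypothetical messy, unevenly spaced observations
    obs <- data.table(
      id   = c(1L, 1L, 2L),
      time = as.POSIXct(c("2018-01-20 00:07:12", "2018-01-20 01:58:30",
                          "2018-01-20 02:03:05"), tz = "America/Los_Angeles")
    )

    # rolling join: for each observation, pick the hourly stamp nearest in time
    lookup <- data.table(interval = intervals)
    obs[, nearest_hour := lookup[obs, on = .(interval = time), roll = "nearest", x.interval]]

    # keep only the single closest observation per id and hour
    obs[, gap := abs(as.numeric(time) - as.numeric(nearest_hour))]
    closest <- obs[order(gap), .SD[1L], by = .(id, nearest_hour)]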

How can I reshape a list of list from wide to long

Submitted by 只谈情不闲聊 on 2020-01-02 07:06:15
Question: I have a list of lists with a common structure:

    require(data.table)
    l <- list(a1 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))),
              a2 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))))

Sometimes it is easier for lapply to change the structure, going from a 2x3 list to a 3x2 list like:

    +a1---b         +b---a1
       ---c            ---a2
       ---d         +c---a1
    +a2---b    to      ---a2
       ---c         +d---a1
       ---d            ---a2

Is there an idiomatic way to do this? Can it be done without copying over …
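As a sketch of one possible approach (not from the original thread): assuming every inner list has the same names, the nesting can be flipped with two lapply calls; purrr::transpose(l) does the same thing in one step. Only the outer list skeleton is rebuilt; the data.tables themselves are not duplicated.

    require(data.table)
    l <- list(a1 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))),
              a2 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))))

    inner <- names(l[[1L]])
    # for each inner name, pull that element out of every outer element
    flipped <- setNames(lapply(inner, function(nm) lapply(l, `[[`, nm)), inner)

    str(flipped, max.level = 2)  # b, c, d at the top level; a1, a2 inside each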

How can I extract the rows from a large data set by common IDs and take the means of these rows and make a column having these IDs

Submitted by 邮差的信 on 2020-01-02 06:56:09
Question: I know it is a very silly question, but I could not sort it out, which is why I am asking... How can I extract the rows from a large data set by common IDs, take the means of these rows, and make a column having these IDs as row names? e.g.

    IDs  Var2
    Ae4  2
    Ae4  4
    Ae4  6
    Bc3  3
    Bc3  5
    Ad2  8
    Ad2  7

    Output:
         Var(x)
    Ae4  4
    Bc3  4
    Ad2  7.5

Answer 1: These kinds of things can easily be done using the plyr function ddply:

    dat = data.frame(ID = rep(LETTERS[1:5], each = 20), value = runif(100))
    > head(dat)
      ID value
    1  A 0…
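For completeness, a data.table sketch of the same aggregation (the answer in the excerpt continues with plyr::ddply; this is just the equivalent grouped mean, typed from the sample data above):

    library(data.table)
    dat <- data.table(IDs  = c("Ae4", "Ae4", "Ae4", "Bc3", "Bc3", "Ad2", "Ad2"),
                      Var2 = c(2, 4, 6, 3, 5, 8, 7))
    dat[, .(mean_Var2 = mean(Var2)), by = IDs]
    #    IDs mean_Var2
    # 1: Ae4       4.0
    # 2: Bc3       4.0
    # 3: Ad2       7.5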

R data.table: using fread on all .csv files in folder skipping the last line of each

Submitted by 余生颓废 on 2020-01-02 06:25:34
Question: I have hundreds of .csv files that I need to read in using fread and save as one data table. The basic structure is the same for each .csv. There is header info that needs to be skipped (easy using skip =). I am having difficulty with skipping the last line of each .csv file. Each .csv file has a different number of rows. If I have only one file in the Test folder, this script perfectly skips the first rows (using skip =) and the last row (using nrows =):

    file <- list.files("Q:/Test/", full…
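The excerpt stops before the full script. A minimal sketch of one way to handle an unknown row count per file: read each file, drop its last row, then bind everything together. The skip value is a placeholder and the folder path is taken from the question:

    library(data.table)

    files <- list.files("Q:/Test/", pattern = "\\.csv$", full.names = TRUE)

    read_one <- function(f) {
      dt <- fread(f, skip = 5)   # placeholder: number of header lines to skip
      head(dt, -1L)              # drop the last row, whatever the row count is
    }

    all_data <- rbindlist(lapply(files, read_one))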

R - Group data but apply different functions to different columns

Submitted by 若如初见. on 2020-01-02 05:47:06
Question: I'd like to group this data but apply different functions to some columns when grouping.

    ID type isDesc isImage
    1  1    1      0
    1  1    0      1
    1  1    0      1
    4  2    0      1
    4  2    1      0
    6  1    1      0
    6  1    0      1
    6  1    0      0

I want to group by ID; the columns isDesc and isImage can be summed, but I would like to get the value of type as it is (type will be the same through the whole dataset). The result should look like this:

    ID type isDesc isImage
    1  1    1      2
    4  2    1      1
    6  1    1      1

Currently I am using:

    library(plyr)
    summarized = ddply(data, .(ID), …
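The plyr call in the excerpt is truncated; as an illustrative alternative in data.table, mixed per-column aggregations can be written directly in j (taking type[1L] keeps the constant value while the flags are summed):

    library(data.table)
    dt <- data.table(ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
                     type    = c(1, 1, 1, 2, 2, 1, 1, 1),
                     isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
                     isImage = c(0, 1, 1, 1, 0, 0, 1, 0))

    dt[, .(type = type[1L], isDesc = sum(isDesc), isImage = sum(isImage)), by = ID]
    #    ID type isDesc isImage
    # 1:  1    1      1       2
    # 2:  4    2      1       1
    # 3:  6    1      1       1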

data.table: lapply a function with multicolumn output

Submitted by ╄→尐↘猪︶ㄣ on 2020-01-02 05:00:47
Question: I'm using the function smean.cl.normal from the Hmisc package, which returns a vector with 3 values: the mean and the lower and upper CI. When I use it on a data.table with 2 groups, I obtain 2 columns and 6 rows. Is there a way to obtain the result with two rows corresponding to the 2 groups and separate columns for each of the function's outputs, i.e. the mean and CIs?

    require(Hmisc)
    require(data.table)
    dt = data.table(x = rnorm(100), gr = rep(c('A', 'B'), each = 50))
    dt[, lapply(.SD, smean.cl.normal), by…
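The call in the excerpt is cut off. A common idiom for this situation (offered here as a sketch, not necessarily the accepted answer) is to wrap the returned vector in as.list(), so each named element of smean.cl.normal's output becomes its own column and each group keeps a single row:

    require(Hmisc)
    require(data.table)

    dt <- data.table(x = rnorm(100), gr = rep(c('A', 'B'), each = 50))

    # as.list() spreads the named vector (Mean, Lower, Upper) across columns
    dt[, as.list(smean.cl.normal(x)), by = gr]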

Assignment via `:=` in a for loop (R data.table)

Submitted by 断了今生、忘了曾经 on 2020-01-02 04:54:10
Question: I'm trying to assign some new variables within a for loop (I'm trying to create some variables with a common structure, but which are subsample-dependent). I've tried for the life of me to reproduce this error on sample data and I can't. Here's code that works and gets the gist of what I want to do:

    DT <- data.table(
      id = rep(1:100, each = 20L),
      period = rep(-9:10, 100L),
      grp = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
      y = runif(2000, min = 0, max = 5),
      key = c("id", "period")
    )
    DT[…
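The loop itself is missing from the excerpt. A hedged sketch of the usual pattern for assigning loop-dependent columns by reference: the column names and the y - mean(y) computation below are invented for illustration; the key detail is the (newcol) := form, which tells data.table that newcol holds a column name rather than being the column itself.

    library(data.table)
    DT <- data.table(
      id     = rep(1:100, each = 20L),
      period = rep(-9:10, 100L),
      grp    = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
      y      = runif(2000, min = 0, max = 5),
      key    = c("id", "period")
    )

    for (g in sort(unique(DT$grp))) {
      newcol <- paste0("y_centered_grp", g)            # hypothetical subsample-dependent name
      DT[grp == g, (newcol) := y - mean(y), by = id]   # rows outside grp g are left NA
    }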

Is it possible to use the data.table index-join-assignment idiom to do a left join and assign NAs in the non-matching rows of i to x?

Submitted by 眉间皱痕 on 2020-01-02 02:48:07
Question: Yesterday I gave this answer: Matching Data Tables by five columns to change a value in another column. In the comments, the OP asked if we could effectively achieve a left join of the two tables and thereby get the NAs that would result in the right table assigned to the left table. It seems to me that data.table does not provide any means of doing this. Here's the example case I used in that question:

    set.seed(1L)
    dt1 <- data.table(id = 1:12, expand.grid(V1 = 1:3, V2 = 1:4), blah1 = rnorm(12L))…
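The example in the excerpt is truncated before the second table, so dt2 below is hypothetical. As a sketch of the behaviour being discussed: an update join writes only the matched rows, which gives NAs for free in a new column but leaves old values untouched in an existing one; forcing NAs into the non-matching rows means materialising the left join instead.

    library(data.table)
    set.seed(1L)
    dt1 <- data.table(id = 1:12, expand.grid(V1 = 1:3, V2 = 1:4), blah1 = rnorm(12L))
    dt2 <- data.table(V1 = 1:3, V2 = 1L, blah2 = rnorm(3L))   # hypothetical right table

    # update join: unmatched rows of dt1 are not written, so a brand-new column
    # is NA there automatically
    dt1[dt2, on = .(V1, V2), blah2 := i.blah2]

    # an existing column would keep its old values in unmatched rows; to get NAs
    # there, build the left join as a new table instead of assigning by reference
    res <- dt2[dt1, on = .(V1, V2)]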

Why is stringr changing encoding when manipulating strings?

Submitted by 你说的曾经没有我的故事 on 2020-01-02 02:31:07
Question: There is this strange behavior of stringr which is really annoying me. stringr changes, without a warning, the encoding of some strings that contain exotic characters, in my case ø, å, æ, é and some others... If you str_trim a vector of characters, those with exotic letters will be converted to a new encoding.

    letter1 <- readline('Gimme an ASCII character!')     # try q or a
    letter2 <- readline('Gimme an non-ASCII character!') # try ø or é
    Letters <- c(letter1, letter2)
    Encoding(Letters) # …
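The excerpt ends before the Encoding() output. A small non-interactive sketch of the behaviour (the latin1 stand-in below replaces the readline() input, and the exact Encoding() output can vary by platform): stringr functions are backed by stringi, which returns results in UTF-8, so a native-encoded input comes back UTF-8-marked after str_trim; converting back explicitly restores the native encoding if downstream code needs it.

    library(stringr)

    # stand-ins for the interactive readline() input: one ASCII string and one
    # latin1-encoded string (roughly what native Windows input looks like)
    Letters <- c("a", iconv("\u00e9", from = "UTF-8", to = "latin1"))
    Encoding(Letters)        # e.g. "unknown" "latin1"

    trimmed <- str_trim(Letters)
    Encoding(trimmed)        # the non-ASCII element comes back marked "UTF-8"

    enc2native(trimmed)      # convert back if the native encoding is required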

subsetting by multi-column index/key in dplyr (have data.table soln)

Submitted by 狂风中的少年 on 2020-01-02 02:14:05
Question: I'm looking for a way to subset (or rethink how I handle the task) in the following situation, so as to stay in dplyr rather than "resort" to data.table, since much of my analysis before/after this chunk is done in dplyr. Situation: given a simulated dataset with multiple replications, I would like to subset/dplyr::filter based on a two-column key (ID and REP).

    libs <- c("dplyr", "data.table")
    lapply(libs, require, character.only = T)

    # minimally reproducible example
    # dataset
    dat <- expand.grid(ID = 1…
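The example is cut off mid-call. To stay entirely in dplyr, a semi_join against a small key table keeps every row whose (ID, REP) pair appears in the key; the dat and keys objects below are illustrative stand-ins for the truncated example:

    library(dplyr)

    # illustrative dataset in the shape the question sketches
    dat <- expand.grid(ID = 1:3, REP = 1:2, period = 1:5)

    # hypothetical two-column key of the replications to keep
    keys <- data.frame(ID = c(1, 3), REP = c(1, 2))

    dat %>% semi_join(keys, by = c("ID", "REP"))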