data.table

Match data to nearest time value by id

Submitted by 别说谁变了你拦得住时间么 on 2020-01-02 07:18:08
Question: I have generated a series of hourly time stamps with:

    intervals <- seq(as.POSIXct("2018-01-20 00:00:00", tz = 'America/Los_Angeles'),
                     as.POSIXct("2018-01-20 03:00:00", tz = 'America/Los_Angeles'),
                     by = "hour")
    > intervals
    [1] "2018-01-20 00:00:00 PST" "2018-01-20 01:00:00 PST" "2018-01-20 02:00:00 PST"
    [4] "2018-01-20 03:00:00 PST"

Given a dataset with messy and unevenly spaced timestamps, how would one match the time values from that dataset to the closest hourly timestamp by id, and remove other …
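The excerpt is cut off before any answer. As a minimal sketch (not the original answer), one way to snap messy timestamps to the nearest hourly stamp per id is a rolling join with roll = "nearest"; the obs table and its column names below are invented for illustration:

    library(data.table)

    intervals <- seq(as.POSIXct("2018-01-20 00:00:00", tz = "America/Los_Angeles"),
                     as.POSIXct("2018-01-20 03:00:00", tz = "America/Los_Angeles"),
                     by = "hour")

    # hypothetical messy, unevenly spaced observations
    obs <- data.table(
      id   = c(1L, 1L, 2L),
      time = as.POSIXct(c("2018-01-20 00:07:12", "2018-01-20 01:58:30",
                          "2018-01-20 02:03:05"), tz = "America/Los_Angeles")
    )

    # rolling join: for each observation, pick the hourly stamp nearest in time
    lookup <- data.table(interval = intervals)
    obs[, nearest_hour := lookup[obs, on = .(interval = time), roll = "nearest", x.interval]]

    # keep only the single closest observation per id and hour
    obs[, gap := abs(as.numeric(time) - as.numeric(nearest_hour))]
    closest <- obs[order(gap), .SD[1L], by = .(id, nearest_hour)]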

How can I reshape a list of list from wide to long

Submitted by 只谈情不闲聊 on 2020-01-02 07:06:15
Question: I have a list of lists with a common structure:

    require(data.table)
    l <- list(a1 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))),
              a2 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))))

Sometimes it is easier for lapply to change the structure, going from a 2x3 list to a 3x2 list like:

    +a1---b         +b---a1
       ---c            ---a2
       ---d         +c---a1
    +a2---b    to      ---a2
       ---c         +d---a1
       ---d            ---a2

Is there an idiomatic way to do this? Can it be done without copying over …
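As a sketch of one possible approach (not from the original thread): assuming every inner list has the same names, the nesting can be flipped with two lapply calls; purrr::transpose(l) does the same thing in one step. Only the outer list skeleton is rebuilt; the data.tables themselves are not duplicated.

    require(data.table)
    l <- list(a1 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))),
              a2 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))))

    inner <- names(l[[1L]])
    # for each inner name, pull that element out of every outer element
    flipped <- setNames(lapply(inner, function(nm) lapply(l, `[[`, nm)), inner)

    str(flipped, max.level = 2)  # b, c, d at the top level; a1, a2 inside each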

How can I extract the rows from a large data set by common IDs and take the means of these rows and make a column having these IDs

Submitted by 邮差的信 on 2020-01-02 06:56:09
Question: I know it is a very silly question, but I could not sort it out, which is why I am asking... How can I extract the rows from a large data set by common IDs, take the means of these rows, and make a column having these IDs as row names? e.g.

    IDs  Var2
    Ae4  2
    Ae4  4
    Ae4  6
    Bc3  3
    Bc3  5
    Ad2  8
    Ad2  7

    Output:
         Var(x)
    Ae4  4
    Bc3  4
    Ad2  7.5

Answer 1: These kinds of things can easily be done using the plyr function ddply:

    dat = data.frame(ID = rep(LETTERS[1:5], each = 20), value = runif(100))
    > head(dat)
      ID value
    1  A 0…
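For completeness, a data.table sketch of the same aggregation (the answer in the excerpt continues with plyr::ddply; this is just the equivalent grouped mean, typed from the sample data above):

    library(data.table)
    dat <- data.table(IDs  = c("Ae4", "Ae4", "Ae4", "Bc3", "Bc3", "Ad2", "Ad2"),
                      Var2 = c(2, 4, 6, 3, 5, 8, 7))
    dat[, .(mean_Var2 = mean(Var2)), by = IDs]
    #    IDs mean_Var2
    # 1: Ae4       4.0
    # 2: Bc3       4.0
    # 3: Ad2       7.5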

R data.table: using fread on all .csv files in folder skipping the last line of each

Submitted by 余生颓废 on 2020-01-02 06:25:34
Question: I have hundreds of .csv files that I need to read in using fread and save as one data table. The basic structure is the same for each .csv. There is header info that needs to be skipped (easy using skip =). I am having difficulty with skipping the last line of each .csv file. Each .csv file has a different number of rows. If I have only one file in the Test folder, this script perfectly skips the first rows (using skip =) and the last row (using nrows =):

    file <- list.files("Q:/Test/", full…
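The excerpt stops before the full script. A minimal sketch of one way to handle an unknown row count per file: read each file, drop its last row, then bind everything together. The skip value is a placeholder and the folder path is taken from the question:

    library(data.table)

    files <- list.files("Q:/Test/", pattern = "\\.csv$", full.names = TRUE)

    read_one <- function(f) {
      dt <- fread(f, skip = 5)   # placeholder: number of header lines to skip
      head(dt, -1L)              # drop the last row, whatever the row count is
    }

    all_data <- rbindlist(lapply(files, read_one))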

R - Group data but apply different functions to different columns

Submitted by 若如初见. on 2020-01-02 05:47:06
Question: I'd like to group this data but apply different functions to some columns when grouping.

    ID type isDesc isImage
    1  1    1      0
    1  1    0      1
    1  1    0      1
    4  2    0      1
    4  2    1      0
    6  1    1      0
    6  1    0      1
    6  1    0      0

I want to group by ID; the columns isDesc and isImage can be summed, but I would like to get the value of type as it is (type will be the same through the whole dataset). The result should look like this:

    ID type isDesc isImage
    1  1    1      2
    4  2    1      1
    6  1    1      1

Currently I am using:

    library(plyr)
    summarized = ddply(data, .(ID), …
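The plyr call in the excerpt is truncated; as an illustrative alternative in data.table, mixed per-column aggregations can be written directly in j (taking type[1L] keeps the constant value while the flags are summed):

    library(data.table)
    dt <- data.table(ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
                     type    = c(1, 1, 1, 2, 2, 1, 1, 1),
                     isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
                     isImage = c(0, 1, 1, 1, 0, 0, 1, 0))

    dt[, .(type = type[1L], isDesc = sum(isDesc), isImage = sum(isImage)), by = ID]
    #    ID type isDesc isImage
    # 1:  1    1      1       2
    # 2:  4    2      1       1
    # 3:  6    1      1       1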

data.table: lapply a function with multicolumn output

Submitted by ╄→尐↘猪︶ㄣ on 2020-01-02 05:00:47
Question: I'm using the function smean.cl.normal from the Hmisc package, which returns a vector with 3 values: the mean and the lower and upper CI. When I use it on a data.table with 2 groups, I obtain 2 columns and 6 rows. Is there a way to obtain the result with two rows corresponding to the 2 groups and separate columns for each of the function's outputs, i.e. the mean and CIs?

    require(Hmisc)
    require(data.table)
    dt = data.table(x = rnorm(100), gr = rep(c('A', 'B'), each = 50))
    dt[, lapply(.SD, smean.cl.normal), by…
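The call in the excerpt is cut off. A common idiom for this situation (offered here as a sketch, not necessarily the accepted answer) is to wrap the returned vector in as.list(), so each named element of smean.cl.normal's output becomes its own column and each group keeps a single row:

    require(Hmisc)
    require(data.table)

    dt <- data.table(x = rnorm(100), gr = rep(c('A', 'B'), each = 50))

    # as.list() spreads the named vector (Mean, Lower, Upper) across columns
    dt[, as.list(smean.cl.normal(x)), by = gr]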

Assignment via `:=` in a for loop (R data.table)

Submitted by 断了今生、忘了曾经 on 2020-01-02 04:54:10
Question: I'm trying to assign some new variables within a for loop (I'm trying to create some variables with a common structure, but which are subsample-dependent). I've tried for the life of me to reproduce this error on sample data and I can't. Here's code that works and gets the gist of what I want to do:

    DT <- data.table(
      id = rep(1:100, each = 20L),
      period = rep(-9:10, 100L),
      grp = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
      y = runif(2000, min = 0, max = 5),
      key = c("id", "period")
    )
    DT[…
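The loop itself is missing from the excerpt. A hedged sketch of the usual pattern for assigning loop-dependent columns by reference: the column names and the y - mean(y) computation below are invented for illustration; the key detail is the (newcol) := form, which tells data.table that newcol holds a column name rather than being the column itself.

    library(data.table)
    DT <- data.table(
      id     = rep(1:100, each = 20L),
      period = rep(-9:10, 100L),
      grp    = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
      y      = runif(2000, min = 0, max = 5),
      key    = c("id", "period")
    )

    for (g in sort(unique(DT$grp))) {
      newcol <- paste0("y_centered_grp", g)            # hypothetical subsample-dependent name
      DT[grp == g, (newcol) := y - mean(y), by = id]   # rows outside grp g are left NA
    }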

Is it possible to use the data.table index-join-assignment idiom to do a left join and assign NAs in the non-matching rows of i to x?

Submitted by 眉间皱痕 on 2020-01-02 02:48:07
Question: Yesterday I gave this answer: Matching Data Tables by five columns to change a value in another column. In the comments, the OP asked if we could effectively achieve a left join of the two tables and thereby get the NAs that would result in the right table assigned to the left table. It seems to me that data.table does not provide any means of doing this. Here's the example case I used in that question:

    set.seed(1L)
    dt1 <- data.table(id = 1:12, expand.grid(V1 = 1:3, V2 = 1:4), blah1 = rnorm(12L))…
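The example in the excerpt is truncated before the second table, so dt2 below is hypothetical. As a sketch of the behaviour being discussed: an update join writes only the matched rows, which gives NAs for free in a new column but leaves old values untouched in an existing one; forcing NAs into the non-matching rows means materialising the left join instead.

    library(data.table)
    set.seed(1L)
    dt1 <- data.table(id = 1:12, expand.grid(V1 = 1:3, V2 = 1:4), blah1 = rnorm(12L))
    dt2 <- data.table(V1 = 1:3, V2 = 1L, blah2 = rnorm(3L))   # hypothetical right table

    # update join: unmatched rows of dt1 are not written, so a brand-new column
    # is NA there automatically
    dt1[dt2, on = .(V1, V2), blah2 := i.blah2]

    # an existing column would keep its old values in unmatched rows; to get NAs
    # there, build the left join as a new table instead of assigning by reference
    res <- dt2[dt1, on = .(V1, V2)]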

Why is stringr changing encoding when manipulating strings?

Submitted by 你说的曾经没有我的故事 on 2020-01-02 02:31:07
Question: There is this strange behavior of stringr which is really annoying me. stringr changes, without a warning, the encoding of some strings that contain exotic characters, in my case ø, å, æ, é and some others... If you str_trim a vector of characters, those with exotic letters will be converted to a new encoding.

    letter1 <- readline('Gimme an ASCII character!')     # try q or a
    letter2 <- readline('Gimme an non-ASCII character!') # try ø or é
    Letters <- c(letter1, letter2)
    Encoding(Letters) # …
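The excerpt ends before the Encoding() output. A small non-interactive sketch of the behaviour (the latin1 stand-in below replaces the readline() input, and the exact Encoding() output can vary by platform): stringr functions are backed by stringi, which returns results in UTF-8, so a native-encoded input comes back UTF-8-marked after str_trim; converting back explicitly restores the native encoding if downstream code needs it.

    library(stringr)

    # stand-ins for the interactive readline() input: one ASCII string and one
    # latin1-encoded string (roughly what native Windows input looks like)
    Letters <- c("a", iconv("\u00e9", from = "UTF-8", to = "latin1"))
    Encoding(Letters)        # e.g. "unknown" "latin1"

    trimmed <- str_trim(Letters)
    Encoding(trimmed)        # the non-ASCII element comes back marked "UTF-8"

    enc2native(trimmed)      # convert back if the native encoding is required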

subsetting by multi-column index/key in dplyr (have data.table soln)

Submitted by 狂风中的少年 on 2020-01-02 02:14:05
Question: I'm looking for a way to subset (or rethink how I handle the task) in the following situation, so as to stay in dplyr rather than "resort" to data.table, since much of my analysis before/after this chunk is done in dplyr. Situation: given a simulated dataset with multiple replications, I would like to subset/dplyr::filter based on a two-column key (ID and REP).

    libs <- c("dplyr", "data.table")
    lapply(libs, require, character.only = T)

    # minimally reproducible example
    # dataset
    dat <- expand.grid(ID = 1…
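The example is cut off mid-call. To stay entirely in dplyr, a semi_join against a small key table keeps every row whose (ID, REP) pair appears in the key; the dat and keys objects below are illustrative stand-ins for the truncated example:

    library(dplyr)

    # illustrative dataset in the shape the question sketches
    dat <- expand.grid(ID = 1:3, REP = 1:2, period = 1:5)

    # hypothetical two-column key of the replications to keep
    keys <- data.frame(ID = c(1, 3), REP = c(1, 2))

    dat %>% semi_join(keys, by = c("ID", "REP"))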