data.table

Fastest way to extract hour from time (HH:MM)

本秂侑毒 submitted on 2020-01-02 00:53:09
Question: I wish fastPOSIXct worked here, but it does not in this case. Here is my time data (which has no dates); I need to get the hour part from each value.
    times <- c("9:46","11:06", "14:17", "19:53", "0:03", "3:56")
Here is the wrong output from fastPOSIXct:
    fastPOSIXct(times, "GMT")
    [1] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
    [3] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
    [5] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
It does not recognize the times without the
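Because the strings hold only hours and minutes, one possible approach is to skip date-time parsing and treat this as a string problem. The sketch below is an illustration in base R, not an answer taken from the original thread, and it has not been benchmarked, so it makes no claim about being the fastest option. The variable name hours is introduced here for illustration.

```r
# Sample data from the question: times of day with no date part.
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Drop everything from the first colon onward, then convert the
# remaining hour text to integer.
hours <- as.integer(sub(":.*$", "", times))
hours
```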

fuzzyjoin two data frames using data.table

旧时模样 submitted on 2020-01-01 18:18:15
Question: I have been using fuzzyjoin to join two data frames, but the join runs out of memory and fails with "cannot allocate memory of…". So I am trying to join the data using data.table instead. A sample of the data is below. df1 looks like:
           ID     f_date               ACCNUM   flmNUM start_date   end_date
    1   50341 2002-03-08 0001104659-02-000656  2571187 2002-09-07 2003-08-30
    2 1067983 2009-11-25 0001047469-09-010426 91207220 2010-05-27 2011-05-19
    3  804753 2004-05-14 0001193125-04-088404  4805453 2004-11-13 2005-11-05
    4

“recursive” self join in data.table

荒凉一梦 submitted on 2020-01-01 14:10:10
Question: I have a component list made of 3 columns: product, component, and quantity of component used:
    a <- structure(list(prodName = c("prod1", "prod1", "prod2", "prod3", "prod3", "int1", "int1", "int2", "int2"), component = c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d"), qty = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), row.names = c(NA, -9L), class = c("data.table", "data.frame"))
      prodName component qty
    1    prod1         a   1
    2    prod1      int1   2
    3    prod2         b   3
    4    prod3         b   4
    5    prod3      int2   5
    6     int1         a   6
    7     int1         b   7
    8

“recursive” self join in data.table

扶醉桌前 submitted on 2020-01-01 14:09:27
Question: I have a component list made of 3 columns: product, component, and quantity of component used:
    a <- structure(list(prodName = c("prod1", "prod1", "prod2", "prod3", "prod3", "int1", "int1", "int2", "int2"), component = c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d"), qty = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), row.names = c(NA, -9L), class = c("data.table", "data.frame"))
      prodName component qty
    1    prod1         a   1
    2    prod1      int1   2
    3    prod2         b   3
    4    prod3         b   4
    5    prod3      int2   5
    6     int1         a   6
    7     int1         b   7
    8

Assign unique ID per multiple columns of data table

痴心易碎 submitted on 2020-01-01 12:16:29
Question: I would like to assign unique IDs to rows of a data table based on the values in multiple columns. Let's consider a simple example:
    library(data.table)
    DT = data.table(a=c(4,2,NA,2,NA), b=c("a","b","c","b","c"), c=1:5)
        a b c
    1:  4 a 1
    2:  2 b 2
    3: NA c 3
    4:  2 b 4
    5: NA c 5
I'd like to generate IDs based on columns a and b, and I expect to get three IDs, where the 2nd and 4th rows share one ID and the 3rd and 5th rows share another. I have seen two solutions, but each is slightly incomplete: 1) Solution
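One way to get the grouping described above is data.table's .GRP symbol, which holds a running group counter when by is used. This is a minimal sketch of that technique; it may or may not be one of the two solutions the truncated question goes on to discuss. The column name id is an assumption introduced here.

```r
library(data.table)

# Example table from the question.
DT <- data.table(a = c(4, 2, NA, 2, NA),
                 b = c("a", "b", "c", "b", "c"),
                 c = 1:5)

# .GRP numbers the groups in order of first appearance; rows with the
# same (a, b) pair, including pairs where a is NA, get the same id.
DT[, id := .GRP, by = .(a, b)]
DT
```

Rows 2 and 4 share one ID and rows 3 and 5 share another, which matches the three IDs the question asks for.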

Grouping in data.table: how to get more than 1 column of results?

扶醉桌前 submitted on 2020-01-01 10:02:22
Question: I have a data.table object like this one:
    library(data.table)
    a <- structure(list(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L), SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L), PRC = c(6.5, 6.125, 0.75, 0.5, 3, 4), RET = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)), .Names = c("PERMNO", "SHROUT", "PRC", "RET"), class = c("data.table", "data.frame"), row.names = c(NA, -6L))
    setkey(a,PERMNO)
and I need to perform a number of calculations by PERMNO, but here in this example
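The question is cut off before it names the calculations, so the sketch below only illustrates the general mechanism: when j returns a list, each element becomes its own result column. The two summaries (mean_prc and total_ret) are assumptions chosen for illustration and are not taken from the original question; the table is rebuilt with data.table() rather than structure() for simplicity.

```r
library(data.table)

# Same values as in the question.
a <- data.table(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
                SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
                PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
                RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031))
setkey(a, PERMNO)

# .() is an alias for list(); each named element is returned as a
# separate column, computed once per PERMNO group.
res <- a[, .(mean_prc = mean(PRC), total_ret = sum(RET)), by = PERMNO]
res
```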

How to find the last or next entry using R package data.table and rolling joins

风流意气都作罢 submitted on 2020-01-01 09:20:09
Question: Let's say I have a data table like this:
       customer_id time_stamp value
    1:           1        223     4
    2:           1        252     1
    3:           1        456     3
    4:           2        455     5
    5:           2        632     2
Here customer_id and time_stamp together form a unique key. I want to add some new columns indicating the previous and next values of "value". That is, I want output like this:
       customer_id time_stamp value value_PREV value_NEXT
    1:           1        223     4         NA          1
    2:           1        252     1          4          3
    3:           1        456     3          1         NA
    4:           2        455     5         NA          2
    5:           2        632     2          5         NA
I want this to be fast and work with sparse, irregular times. I
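The expected output above can be reproduced with data.table's shift() applied per customer. Note that this swaps in a lag/lead technique in place of the rolling joins named in the title, and it assumes a data.table version that provides shift(). It is a sketch, not the answer from the original thread.

```r
library(data.table)

# Example table from the question.
DT <- data.table(customer_id = c(1L, 1L, 1L, 2L, 2L),
                 time_stamp  = c(223L, 252L, 456L, 455L, 632L),
                 value       = c(4L, 1L, 3L, 5L, 2L))

# Sort so that "previous" and "next" follow time order within a customer.
setorder(DT, customer_id, time_stamp)

# Lag and lead the value column within each customer; the first and
# last rows of each group have no neighbour and are filled with NA.
DT[, `:=`(value_PREV = shift(value, type = "lag"),
          value_NEXT = shift(value, type = "lead")),
   by = customer_id]
DT
```

Because shift() works on row position after sorting, gaps between time stamps do not matter.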

different results for standard form and functional form of data.table assign-by-reference `:=`

南笙酒味 submitted on 2020-01-01 09:17:43
Question: There seems to be a minor difference between the standard form and the functional form of data.table's assignment by reference `:=`. The standard form coerces the RHS to a vector; the functional form does not. It is a detail, but as far as I can tell it is not documented.
    library(data.table)
    dt <- data.table(a = c('a','b','c'))
    v <- c('A','B','C')
    l <- list(v)
    all.equal(copy(dt)[, new := v], copy(dt)[, `:=` (new = v)])
    # [1] TRUE
    all.equal(copy(dt)[, new := l], copy(dt)[, `:=` (new = l)])
    # [1] "Datasets have different column

different results for standard form and functional form of data.table assign-by-reference `:=`

末鹿安然 submitted on 2020-01-01 09:17:22
Question: There seems to be a minor difference between the standard form and the functional form of data.table's assignment by reference `:=`. The standard form coerces the RHS to a vector; the functional form does not. It is a detail, but as far as I can tell it is not documented.
    library(data.table)
    dt <- data.table(a = c('a','b','c'))
    v <- c('A','B','C')
    l <- list(v)
    all.equal(copy(dt)[, new := v], copy(dt)[, `:=` (new = v)])
    # [1] TRUE
    all.equal(copy(dt)[, new := l], copy(dt)[, `:=` (new = l)])
    # [1] "Datasets have different column

NA in data.table

做~自己de王妃 submitted on 2020-01-01 08:44:31
Question: I have a data.table that contains some groups. I operate on each group; some groups return numbers, others return NA. For some reason data.table has trouble putting everything back together. Is this a bug, or am I misunderstanding something? Here is an example:
    dtb <- data.table(a=1:10)
    f <- function(x) {if (x==9) {return(NA)} else { return(x)}}
    dtb[,f(a),by=a]
    Error in `[.data.table`(dtb, , f(a), by = a) : columns of j don't evaluate to consistent types for each group: result for group 9 has column
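A bare NA in R is of type logical, while the other groups in this example return integer, which is consistent with the type mismatch the error reports. One way to keep the types aligned is to return the typed constant NA_integer_ instead. The sketch below changes only that one value; the name res is introduced here for illustration.

```r
library(data.table)

dtb <- data.table(a = 1:10)

# Return an integer NA so that every group yields the same column type.
f <- function(x) {
  if (x == 9) NA_integer_ else x
}

res <- dtb[, f(a), by = a]
res
```

For a numeric (double) column the matching constant would be NA_real_, and for character it would be NA_character_.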