data.table

Efficiently merging two data frames on a non-trivial criteria

爷,独闯天下 submitted on 2020-01-09 07:48:13
Question: Answering this question last night, I spent a good hour trying to find a solution that didn't grow a data.frame in a for loop, without any success, so I'm curious if there's a better way to go about this problem. The general case of the problem boils down to this: Merge two data.frames. Entries in either data.frame can have 0 or more matching entries in the other. We only care about entries that have 1 or more matches across both. The match function is complex involving multiple columns in

Subsetting data.table set by date range in R

谁都会走 submitted on 2020-01-09 06:25:34
Question: I have a large dataset in data.table that I'd like to subset by a date range. My data set looks like this: testset <- data.table(date=as.Date(c("2013-07-02","2013-08-03","2013-09-04", "2013-10-05","2013-11-06")), yr = c(2013,2013,2013,2013,2013), mo = c(07,08,09,10,11), da = c(02,03,04,05,06), plant = LETTERS[1:5], product = as.factor(letters[26:22]), rating = runif(25)) I'd like to be able to choose a date range directly from the as.Date column without using the yr, mo, or da columns.
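One way to pick a date range straight from the `date` column is to compare it against `as.Date()` bounds in `i`. This is a minimal sketch, not the accepted answer: the bounds `from` and `to` are hypothetical, and the table is a shortened stand-in for the question's `testset` with a fixed five-value `rating` column so that all columns have the same length.

```r
library(data.table)

# Shortened stand-in for the question's testset (assumption: `date` is a
# Date column, as in the original; rating values are made up).
testset <- data.table(
  date   = as.Date(c("2013-07-02", "2013-08-03", "2013-09-04",
                     "2013-10-05", "2013-11-06")),
  plant  = LETTERS[1:5],
  rating = c(0.1, 0.2, 0.3, 0.4, 0.5)
)

# Hypothetical bounds chosen for illustration.
from <- as.Date("2013-08-01")
to   <- as.Date("2013-10-31")

# Plain comparison on the Date column, evaluated inside i.
in_range <- testset[date >= from & date <= to]

# Same selection with data.table's %between% operator (bounds inclusive).
in_range2 <- testset[date %between% c(from, to)]
```

Both forms keep the rows for plants B, C, and D here. Converting the bounds with `as.Date()` first avoids relying on implicit character-to-Date coercion.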

Any way to force fread() of data.table not to stop on empty lines?

廉价感情. submitted on 2020-01-09 05:20:06
Question: (question is not relevant anymore, since new version of data.table of 25-NOV-2016 - see accepted answer below) So, I have a table with some empty lines in the middle. When I try to open it with fread, it stops, saying Stopped reading at empty line 10006, but text exists afterwards (discarded). Is there any way to avoid this without changing the data file?
Answer 1: Version 1.9.8 of data.table, released 25-NOV-2016, has a new blank.lines.skip option to skip blank lines. text <- "1,a\n\n2,b\n3,c
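The `blank.lines.skip` option mentioned in the answer can be tried on a small in-memory input. The sketch below assumes a data.table version recent enough to accept the `text=` argument of `fread`; the input string `txt` is made up for illustration and is not the answer's full example.

```r
library(data.table)

# Made-up input with an empty line between the first and second record.
txt <- "1,a\n\n2,b\n3,c\n"

# blank.lines.skip = TRUE asks fread to skip empty lines rather than
# stop reading at the first one.
dt <- fread(text = txt, header = FALSE, blank.lines.skip = TRUE)
```

With `header = FALSE` the columns get the default names `V1` and `V2`, and all three records are read.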

Using data.table i and j arguments in functions

假装没事ソ submitted on 2020-01-09 04:23:28
Question: I am trying to write some wrapper functions to reduce code duplication with data.table. Here is an example using mtcars. First, set up some data: library(data.table) data(mtcars) mtcars$car <- factor(gsub("(.*?) .*", "\\1", rownames(mtcars)), ordered=TRUE) mtcars <- data.table(mtcars) Now, here is what I would usually write to get a summary of counts by group. In this case I am grouping by car: mtcars[, list(Total=length(mpg)), by="car"][order(car)] car Total AMC 1 Cadillac 1 Camaro 1 ...
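The rest of the question is cut off, so the following is only a sketch of the grouped count shown above wrapped in a function. The helper name `count_by` and the small table are hypothetical, and the wrapper covers only the `by` argument passed as a character string, not arbitrary `i` or `j` expressions.

```r
library(data.table)

# Made-up table standing in for the question's mtcars-based data.
cars <- data.table(
  car = c("Mazda", "AMC", "Fiat", "Mazda", "Fiat"),
  mpg = c(21.0, 15.2, 32.4, 21.0, 27.3)
)

# Hypothetical wrapper: count rows per group, with the grouping column
# given as a character string, then sort by that column.
count_by <- function(dt, by_col) {
  out <- dt[, list(Total = .N), by = by_col]
  setorderv(out, by_col)  # sorts the new result table, not the input
  out[]
}

res <- count_by(cars, "car")
```

Using `.N` gives the group size directly, which matches `length(mpg)` in the original one-liner.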

Extract row corresponding to minimum value of a variable by group

♀尐吖头ヾ submitted on 2020-01-08 08:52:31
Question: I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it. Here is a sample data set:
> data
  State Company Employees
1    AK       A        82
2    AK       B       104
3    AK       C        37
4    AK       D        24
5    RI       E        19
6    RI       F       118
7    RI       G        88
8    RI       H        42
data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("AK", "RI")
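Steps (1) to (3) can be sketched in data.table as follows. The table is rebuilt from the printed sample with character columns instead of factors, and the result names `min_rows` and `min_rows_ties` are illustrative.

```r
library(data.table)

# Sample data re-created from the printed table in the question.
dt <- data.table(
  State     = rep(c("AK", "RI"), each = 4),
  Company   = LETTERS[1:8],
  Employees = c(82, 104, 37, 24, 19, 118, 88, 42)
)

# For each State, keep the whole row where Employees is smallest.
# which.min() returns the first minimum, so ties yield a single row.
min_rows <- dt[, .SD[which.min(Employees)], by = State]

# Variant that keeps ties: collect row numbers with .I, then subset.
min_rows_ties <- dt[dt[, .I[Employees == min(Employees)], by = State]$V1]
```

On this sample both forms return company D for AK and company E for RI. The `.I` variant avoids building `.SD` for every group, which can matter on large tables.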

left join in data.table [duplicate]

╄→尐↘猪︶ㄣ submitted on 2020-01-07 10:04:04
Question: This question already has answers here: Left join using data.table (2 answers). Closed 8 months ago. I am trying to do left join in data.table, I want to join panelFull and panel on the basis of OutletID. From panel I want CellID column to be inserted in panelFull:
> panel[1:15,]
    Period CellID OutletID      ACV
 1:    215   1268   M44600  9563317
 2:    215   1268   M44800  8966339
 3:    215   1268   M45100  7043924
 4:    215   1268   M45200  9013918
 5:    215   1268   M45300 10009468
 6:    215   1268   M46900 22148703
 7:    215   1268   M48400
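Since `panelFull` is not shown in the excerpt, the sketch below uses small made-up stand-ins for both tables; only the two `OutletID` values and the `CellID` of 1268 come from the printed rows. It shows two common ways to get a left join on `OutletID`: `merge()` with `all.x = TRUE`, and an update join that adds `CellID` to `panelFull` by reference.

```r
library(data.table)

# Made-up stand-ins for the question's tables (panelFull is hypothetical).
panel <- data.table(
  OutletID = c("M44600", "M44800"),
  CellID   = c(1268L, 1268L)
)
panelFull <- data.table(
  OutletID = c("M44600", "M44800", "X00001"),  # "X00001" has no match
  Period   = c(215L, 215L, 215L)
)

# Left join via merge(): every row of panelFull is kept (all.x = TRUE).
joined <- merge(panelFull, panel, by = "OutletID", all.x = TRUE)

# Update join: add CellID to panelFull in place; unmatched rows get NA.
panelFull[panel, on = "OutletID", CellID := i.CellID]
```

If `panel` holds several rows per `OutletID` (for example one per `Period`), `merge()` returns one output row per match, while the update join keeps a single value per `panelFull` row, so the lookup table may need deduplicating first.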
