data.table

Create count per item by year/decade

╄→尐↘猪︶ㄣ submitted on 2020-01-11 11:37:08
Question: I have data in a data.table that is as follows:

    > x <- df[sample(nrow(df), 10), ]
    > x
             Importer       Exporter       Date
     1:       Ecuador United Kingdom 2004-01-13
     2:        Mexico  United States 2013-11-19
     3:     Australia  United States 2006-08-11
     4: United States  United States 2009-05-04
     5:         India  United States 2007-07-16
     6:     Guatemala      Guatemala 2014-07-02
     7:        Israel         Israel 2000-02-22
     8:         India  United States 2014-02-11
     9:          Peru           Peru 2007-03-26
    10:        Poland         France 2014-09-15

I am trying to create summaries so that given a time
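The excerpt is cut off before the desired summary is spelled out, so the following is only a minimal sketch of what the title describes. It assumes that "item" means the `Importer` column and that `Date` is stored as a `Date`; the toy rows and the names `by_year` and `by_decade` are illustrative, not from the question.

```r
library(data.table)

# Toy data standing in for the question's table (rows are illustrative).
x <- data.table(
  Importer = c("India", "India", "India", "Peru", "Mexico"),
  Exporter = c("United States", "United States", "United States",
               "Peru", "United States"),
  Date     = as.Date(c("2007-07-16", "2008-03-01", "2014-02-11",
                       "2007-03-26", "2013-11-19"))
)

# Count rows per importer and calendar year (year() ships with data.table).
by_year <- x[, .N, by = .(Importer, Year = year(Date))]

# Count rows per importer and decade, e.g. 2007 -> 2000 and 2014 -> 2010.
by_decade <- x[, .N, by = .(Importer, Decade = 10L * (year(Date) %/% 10L))]
```

Grouping on an expression inside `by` avoids adding a helper column to the table first.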

R data.table multi column conversion by names [duplicate]

混江龙づ霸主 submitted on 2020-01-11 11:34:32
Question: This question already has answers here: How to apply same function to every specified column in a data.table (6 answers). Closed 4 years ago.

Let DT be a data.table:

    DT <- data.table(V1 = factor(1:10), V2 = factor(1:10), ... V9 = factor(1:10),)

Is there a better/simpler method to do multicolumn factor conversion like this:

    DT[, `:=`(
      Vn1 = as.numeric(V1),
      Vn2 = as.numeric(V2),
      Vn3 = as.numeric(V3),
      Vn4 = as.numeric(V4),
      Vn5 = as.numeric(V5),
      Vn6 = as.numeric(V6),
      Vn7 = as.numeric(V7),
      Vn8 = as.numeric(V8),
      Vn9=as
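One common way to avoid writing out each column is to build the source and target names as character vectors and assign through `.SDcols`. This is a minimal sketch with three columns instead of nine; `cols` and `new_cols` are names introduced here for illustration.

```r
library(data.table)

DT <- data.table(V1 = factor(1:10), V2 = factor(10:1), V3 = factor(1:10))

cols     <- paste0("V", 1:3)   # columns to convert
new_cols <- paste0("Vn", 1:3)  # names for the converted copies

# Apply as.numeric() to every column listed in .SDcols and assign by reference.
DT[, (new_cols) := lapply(.SD, as.numeric), .SDcols = cols]
```

As in the question's own code, `as.numeric()` on a factor returns the internal level codes. If the labels themselves are wanted as numbers, `function(x) as.numeric(as.character(x))` can be passed to `lapply()` instead.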

How do I select a subset of rows after group by a specific column in R Data table [duplicate]

别等时光非礼了梦想. submitted on 2020-01-11 10:28:48
Question: This question already has answers here: Subset data frame based on number of rows per group (3 answers). Closed 2 years ago.

I want to select a subset of rows based upon a condition after grouping by a specific column in an R data.table. Take the mtcars data as an example.

    dt_mtcars <- as.data.table(mtcars)
    dt_mtcars[, .N, by = .(hp)]
         hp N
     1: 110 3
     2:  93 1
     3: 175 3
     4: 105 1
     5: 245 2
     6:  62 1
     7:  95 1
     8: 123 2
     9: 180 3
    10: 205 1
    11: 215 1
    12: 230 1
    13:  66 2
    14:  52 1
    15:  65 1
    16:  97 1
    17: 150 2
    18:  91 1
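The exact condition is not visible in the truncated excerpt. Going by the linked duplicate (subsetting on the number of rows per group), a minimal sketch could keep only the `hp` groups that have more than one row; the threshold of 1 and the name `res` are assumptions.

```r
library(data.table)

dt_mtcars <- as.data.table(mtcars)

# Return the group's rows (.SD) only when the group size .N passes the test;
# groups that fail return NULL and are dropped from the result.
res <- dt_mtcars[, if (.N > 1L) .SD, by = hp]
```

Any other per-group test can replace `.N > 1L`, as long as it evaluates to a single TRUE or FALSE for each group.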

how can I mutate in dplyr without losing order?

谁都会走 submitted on 2020-01-11 08:38:30
Question: Using data.table I can do the following:

    library(data.table)
    dt = data.table(a = 1:2, b = c(1,2,NA,NA))
    #   a  b
    #1: 1  1
    #2: 2  2
    #3: 1 NA
    #4: 2 NA
    dt[, b := b[1], by = a]
    #   a b
    #1: 1 1
    #2: 2 2
    #3: 1 1
    #4: 2 2

Attempting the same operation in dplyr, however, the data gets scrambled/sorted by a:

    library(dplyr)
    dt = data.table(a = 1:2, b = c(1,2,NA,NA))
    dt %.% group_by(a) %.% mutate(b = b[1])
    #  a b
    #1 1 1
    #2 1 1
    #3 2 2
    #4 2 2

(as an aside the above also sorts the original dt, which is somewhat

Values of the wrong group are used when using plot() within a data.table() in RStudio

南笙酒味 submitted on 2020-01-11 08:23:33
Question: I want to generate a divided diagram. The upper section of the diagram should use the values of group a, the lower section the values of group b. I am using data.table() to do this. Here is the code I used to generate an example and set up the graphical output:

    library(data.table)
    set.seed(23)
    Example <- data.table('group' = rep(c('a', 'b'), each = 5), 'value' = runif(10))
    layout(1:2)
    par('mai' = rep(.5, 4))

When running the following lines in the usual R console the correct values are

Creating a data partition using caret and data.table

一笑奈何 submitted on 2020-01-11 05:32:05
Question: I have a data.table in R which I want to use with the caret package:

    set.seed(42)
    trainingRows <- createDataPartition(DT$variable, p = 0.75, list = FALSE)
    head(trainingRows) # view the samples of row numbers

However, I am not able to select the rows with data.table. Instead I had to convert to a data.frame:

    DT_df <- as.data.frame(DT)
    DT_train <- DT_df[trainingRows, ]
    dim(DT_train)

The data.table alternative

    DT_train <- DT[.(trainingRows), ]

requires the keys to be set. Any better option other than converting
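With `list = FALSE`, `createDataPartition()` returns a one-column matrix of row numbers rather than a plain vector. One option that needs neither keys nor a data.frame conversion is to take that column as a vector before indexing. The sketch below avoids a dependency on caret by building a stand-in matrix by hand, so the index values and the `id` column are made up.

```r
library(data.table)

DT <- data.table(id = 1:8, variable = rnorm(8))

# Stand-in for createDataPartition(DT$variable, p = 0.75, list = FALSE):
# a one-column integer matrix of row numbers (these indices are illustrative).
trainingRows <- matrix(c(1L, 2L, 4L, 5L, 7L, 8L), ncol = 1)

# Extract the column so data.table receives plain row numbers.
DT_train <- DT[trainingRows[, 1]]

# Negative row numbers select the complement, i.e. the held-out rows.
DT_test <- DT[-trainingRows[, 1]]
```

`as.vector(trainingRows)` would work equally well in place of `trainingRows[, 1]`.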

r - data.table and testthat package

谁说我不能喝 submitted on 2020-01-11 04:54:29
Question: I am building a package which works with data.table and which should be tested using the testthat package. While the code works fine when called from the command line, I run into issues when calling it from a test case. It seems that the [] function from the base package, i.e. the function for data.frames, is used when running the tests. I have created a minimal example which can be found here: https://github.com/utalo/test_datatable_testthat

The package contains a single function:

    test <- function(

Subset by multiple ranges [duplicate]

冷暖自知 submitted on 2020-01-10 04:25:45
Question: This question already has answers here: Efficient way to filter one data frame by ranges in another (3 answers). Closed 2 years ago.

I want to get a list of values that fall in between multiple ranges.

    library(data.table)
    values <- data.table(value = c(1:100))
    range <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))

I need the results to include only the values that fall in between those ranges:

    results <- c(6, 7, 8, 9, 10, 29, 30, 31, 32, 33, 34, 35, 87, 88, 89, 90, 91, 92)

I am
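A non-equi join is one way to express this kind of range filter in data.table. It is shown here as a sketch using the question's own data, not as the answer the original thread settled on. The `x.` prefix requests the column from `values` rather than the join bounds.

```r
library(data.table)

values <- data.table(value = 1:100)
range  <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))

# For each row of `range`, match every row of `values` whose value lies in
# [start, end]; x.value returns the matched value itself.
results <- values[range, on = .(value >= start, value <= end), x.value]
```

Because the join iterates over the rows of `range`, overlapping ranges would return a value once per matching range; `unique()` can be applied if that matters.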

Why does median trip up data.table (integer versus double)?

最后都变了- submitted on 2020-01-09 09:10:08
Question: I have a data.table called enc.per.day for encounters per day. It has 2403 rows, each specifying a date of service and the number of patients seen on that day. I wanted to see the median number of patients seen on each type of weekday.

    enc.per.day[,list(patient.encounters=median(n)),by=list(weekdays(DOS))]

That line gives an error:

    Error in `[.data.table`(enc.per.day, , list(patient.encounters = median(n)), : columns of j don't evaluate to consistent types for each group: result for group 4
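The title points at the likely cause. For an integer vector, `median()` returns an integer when the group has an odd number of values and a double when it has to average the two middle values, so groups can disagree on type. A minimal sketch of one workaround is to coerce the result to double in every group. The toy table below is an assumption standing in for the real `enc.per.day`.

```r
library(data.table)

# Ten consecutive days, so some weekdays occur twice and others once.
enc.per.day <- data.table(
  DOS = as.Date("2020-01-06") + 0:9,
  n   = c(3L, 4L, 5L, 2L, 6L, 1L, 0L, 4L, 7L, 5L)
)

# Coerce each group's median to double so all groups share one type.
res <- enc.per.day[, list(patient.encounters = as.numeric(median(n))),
                   by = list(weekdays(DOS))]
```

Storing `n` as a double column in the first place would have the same effect.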