data.table | 易学教程

Column referencing: [[i]] vs [,i] for matrix, dataframe, and data.table

Question: Could someone please explain the difference in column referencing between matrix, data.frame, and data.table? I'm getting my head around which syntax to use for each class, but I don't understand how or why they differ. Take a 10x10 matrix:

    foo <- matrix(nrow = 10, ncol = 10)

I'll fill just the 2nd column to demonstrate:

    foo[, 2] <- rnorm(10)
    head(foo, 3)
         [,1]       [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]   NA -0.4688874   NA   NA   NA   NA   NA   NA   NA    NA
    [2,]   NA -1.0273370   NA   NA   NA   NA
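To make the comparison in this question concrete, here is a minimal sketch of how `[, i]` and `[[i]]` behave on the three classes. The small objects `m`, `df`, and `dt` are illustrative and not from the original post, and the `dt[, 2]` result assumes data.table 1.9.8 or later.

```r
library(data.table)

# Illustrative 3x2 objects holding the same values in each class.
m  <- matrix(1:6, nrow = 3, ncol = 2)
df <- as.data.frame(m)
dt <- as.data.table(m)

m_col   <- m[, 2]    # second column as a plain vector
m_elem  <- m[[2]]    # a matrix is a vector with dims, so this is the 2nd element
df_col  <- df[, 2]   # single column is dropped to a vector by default
df_elem <- df[[2]]   # list-style extraction: the column as a vector
dt_col  <- dt[, 2]   # stays a one-column data.table (assumes data.table >= 1.9.8)
dt_elem <- dt[[2]]   # list-style extraction: the column as a vector
```

Under these assumptions, `[[i]]` is the form that returns a plain column vector for both data.frame and data.table, while on a matrix it indexes a single element.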

Applying calculation per groups within R dataframe

Question: I have data like this:

    object category country
    495647        1     RUS
    477462        2     GER
    431567        3     USA
    449136        1     RUS
    367260        1     USA
    495649        1     RUS
    477461        2     GER
    431562        3     USA
    449133        2     RUS
    367264        2     USA
    ...

where one object appears in various (category, country) pairs and countries share a single list of categories. I'd like to add another column to that, which would be a category weight per country: the number of objects appearing in a category, normalized to sum up to 1 within a country (summation
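One possible reading of the request above can be sketched with data.table grouping. The excerpt is cut off, so the exact definition of the weight is an assumption here: the weight of a category is taken to be its row count within a country divided by that country's total row count. The column names `n_cat` and `weight` are hypothetical.

```r
library(data.table)

# The ten rows shown in the question.
dt <- data.table(
  object   = c(495647, 477462, 431567, 449136, 367260,
               495649, 477461, 431562, 449133, 367264),
  category = c(1, 2, 3, 1, 1, 1, 2, 3, 2, 2),
  country  = c("RUS", "GER", "USA", "RUS", "USA",
               "RUS", "GER", "USA", "RUS", "USA")
)

# Rows per (country, category) pair.
dt[, n_cat := .N, by = .(country, category)]

# Assumed definition: share of the country's rows that fall in the category.
dt[, weight := n_cat / .N, by = country]

# One row per (country, category); weights sum to 1 within each country.
weights <- unique(dt[, .(country, category, weight)])
```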

Efficiently counting numbers falling within each range of numbers

Question: I'm looking for a faster solution to the problem below. I'll illustrate the problem with a small example and then provide the code to simulate large data, as that's the point of this question. My actual problem size is a list of length 1 million. Say I have two lists as shown below:

    x <- list(c(82, 18), c(35, 50, 15))
    y <- list(c(1, 2, 3, 55, 90), c(37, 38, 95))

Properties of x and y: Each element of the list x always sums up to 100. Each element of y will always be sorted and will be

Convert a list of sf objects into one sf

Question: I have a list of sf objects that I would like to row-bind to create a single sf object. I'm looking for a function similar to data.table::rbindlist that would stack the individual objects efficiently. Data for a reproducible example:

    my_list <- structure(list(structure(list(idhex = 4L, geometry = structure(list( structure(c(664106.970004623, 6524137.38910266), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT", "sfc"), precision = 0, bbox = structure(c(xmin = 664106.970004623
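As a baseline for the question above, a minimal sketch of stacking a list of sf objects with the `rbind` method that the sf package provides. The two point features below are hypothetical stand-ins, since the data in the excerpt is truncated, and this approach is not claimed to be the fastest option for long lists.

```r
library(sf)

# Hypothetical stand-ins for the elements of my_list.
my_list <- list(
  st_sf(idhex = 4L, geometry = st_sfc(st_point(c(664106.97, 6524137.39)))),
  st_sf(idhex = 5L, geometry = st_sfc(st_point(c(664200.00, 6524200.00))))
)

# Row-bind every element into one sf object.
combined <- do.call(rbind, my_list)
```

Because `do.call(rbind, ...)` dispatches on the sf class, the geometry column and its bounding box are rebuilt for the combined object.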

Conditional NA filling by group

Question: Edit: The question was originally asked for data.table, but a solution with any package would be interesting. I am a little stuck with a particular variation of a more general problem. I have panel data that I am using with data.table, and I would like to fill in some missing values using data.table's group-by functionality. Unfortunately the values are not numeric, so I can't simply interpolate, and they should only be filled in based on a condition. Is it possible to perform a kind of conditional

using eval in data.table

Question: I'm trying to understand the behaviour of eval in a data.table as a "frame". With the following data.table:

    set.seed(1)
    foo = data.table(var1 = sample(1:3, 1000, replace = TRUE), var2 = rnorm(1000), var3 = sample(letters[1:5], 1000, replace = TRUE))

I'm trying to replicate this instruction

    foo[var1 == 1, sum(var2), by = var3]

using a function built on eval:

    eval1 = function(s) eval(parse(text = s), envir = sys.parent())

As you can see, tests 1 and 3 are working, but I don't understand which is the "correct" envir to set in eval
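For comparison with the wrapper in the question, here is a sketch of a related pattern: storing the `j` expression with `quote()` and calling `eval()` directly inside `[.data.table`. It does not settle which `envir` the `eval1` wrapper should use; it only shows a form in which data.table evaluates the expression against the table's columns. The name `j_expr` is illustrative.

```r
library(data.table)

set.seed(1)
foo <- data.table(
  var1 = sample(1:3, 1000, replace = TRUE),
  var2 = rnorm(1000),
  var3 = sample(letters[1:5], 1000, replace = TRUE)
)

# The instruction from the question, written directly.
direct <- foo[var1 == 1, sum(var2), by = var3]

# The same j expression stored as an unevaluated call, then evaluated in j.
j_expr   <- quote(sum(var2))
via_eval <- foo[var1 == 1, eval(j_expr), by = var3]
```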

multicore and data.table in R

Question: I am attempting to use the multicore function parallel with data.table and can't quite come up with the right way to do this. Code:

    require(multicore)
    require(data.table)
    dtb = data.table(a = 1:10, b = 1:2)
    x = dtb[, parallel(a + 1), by = b]
    > x
       b   pid fd
    1: 1 12243  3
    2: 1 12243  6
    3: 2 12247  4
    4: 2 12247  8

I would like to call collect() on this, but these are no longer parallel objects. How should one do this?

Answer 1: I think this is along the lines of what you want:

    collect(dtb[, list(jobs = list

Using `on` and `by` to compute a new variable from two data.tables

Question: How come I cannot use by when computing a new variable from two data.tables following a merge? Example datasets:

    library(data.table)
    set.seed(1)
    # Example datasets.
    dt1 <- data.table(id = 1:10, var = rnorm(10))
    dt2 <- data.table(id = c(2, 4, 5, 6, 8), color = sample(1:2, 5, replace = TRUE), group = sample(c("a", "b"), 5, replace = TRUE))
    # Join on ID.
    dt1[dt2, on = "id"]
    #    id       var    newVar color group
    # 1:  2 0.1836433 0.3672866     2     a
    # 2:  4 1.5952808 1.5952808     1     a
    # 3:  5 0.3295078 0.6590155     2     a
    # 4:  6 -0
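One pattern that keeps the join and the grouping separate is to chain them: join first, then compute and aggregate in later steps. The sketch below uses fixed, hypothetical values rather than the random data from the question, and it assumes `newVar` is `var * color`, which matches the printed output above. The `total` column is illustrative.

```r
library(data.table)

# Hypothetical fixed data so the result is reproducible by eye.
dt1 <- data.table(id = 1:10, var = as.numeric(1:10))
dt2 <- data.table(
  id    = c(2L, 4L, 5L, 6L, 8L),
  color = c(2L, 1L, 2L, 1L, 2L),
  group = c("a", "a", "a", "b", "b")
)

# Step 1: join on id; the result has one row per row of dt2.
joined <- dt1[dt2, on = "id"]

# Step 2: compute the new variable on the joined table (assumed definition).
joined[, newVar := var * color]

# Step 3: aggregate by a column that came from dt2.
by_group <- joined[, .(total = sum(newVar)), by = group]
```

Splitting the steps this way means `by` only ever refers to columns that already exist in the table being grouped.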