data.table

Column referencing: [[i]] vs [,i] for matrix, dataframe, and data.table

倾然丶 夕夏残阳落幕 submitted on 2020-01-13 06:32:00
Question: Could someone please explain to me the difference in column referencing between matrix, data.frame, and data.table? I'm getting my head around which syntax to use for each class, but I don't understand how/why they're different.

Take a 10x10 matrix:

    foo <- matrix( nrow = 10, ncol = 10 )

I'll just fill the 2nd column to demonstrate:

    foo[,2] <- rnorm(10)
    head( foo, 3 )
         [,1]       [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]   NA -0.4688874   NA   NA   NA   NA   NA   NA   NA    NA
    [2,]   NA -1.0273370   NA   NA   NA   NA
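The excerpt above is cut off, so the following is only a minimal sketch of the distinction being asked about, not the original answer. It assumes the data.table package is installed (version 1.9.8 or later for the `dt[, 2]` behaviour); the objects `m`, `df`, and `dt` are hypothetical and hold the same small set of values in each of the three classes.

```r
library(data.table)

# The same 3x2 block of values stored in three different classes
m  <- matrix(1:6, nrow = 3, ncol = 2)
df <- as.data.frame(m)
dt <- as.data.table(m)

# Matrix: an atomic vector with dimensions.
# [, i] selects column i; [[i]] picks the i-th element in column-major order.
m_col  <- m[, 2]    # the whole 2nd column as a vector
m_elem <- m[[2]]    # a single element, not a column

# data.frame: a list of columns.
# [, i] drops to a vector for a single column; [[i]] extracts list element i.
df_col_a <- df[, 2]
df_col_b <- df[[2]]

# data.table: also a list of columns, but [, i] keeps a one-column data.table.
dt_sub <- dt[, 2]   # still a data.table
dt_col <- dt[[2]]   # plain vector
```

In short, `[[i]]` means "the i-th list element" for data.frame and data.table, which is a column, whereas a matrix has no list structure, so `[[i]]` falls back to plain element indexing.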

Applying calculation per groups within R dataframe

て烟熏妆下的殇ゞ submitted on 2020-01-13 05:07:49
Question: I have data like this:

    object category country
    495647        1     RUS
    477462        2     GER
    431567        3     USA
    449136        1     RUS
    367260        1     USA
    495649        1     RUS
    477461        2     GER
    431562        3     USA
    449133        2     RUS
    367264        2     USA
    ...

where one object appears in various (category, country) pairs and countries share a single list of categories. I'd like to add another column to that, which would be a category weight per country - the number of objects appearing in a category for a country, normalized to sum up to 1 within a country (summation
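The question text is truncated, so the sketch below rests on one reading of it: the weight of a category within a country is the share of that country's rows that fall in the category. It uses data.table (assumed installed) with the ten rows shown above; the column names `n` and `weight` are hypothetical.

```r
library(data.table)

dt <- data.table(
  object   = c(495647, 477462, 431567, 449136, 367260,
               495649, 477461, 431562, 449133, 367264),
  category = c(1, 2, 3, 1, 1, 1, 2, 3, 2, 2),
  country  = c("RUS", "GER", "USA", "RUS", "USA",
               "RUS", "GER", "USA", "RUS", "USA")
)

# Number of objects in each (country, category) pair
dt[, n := .N, by = .(country, category)]

# Divide by the number of rows in the country, so that the weights of the
# distinct categories sum to 1 within each country
dt[, weight := n / .N, by = country]
```

With this reading, the two grouped `:=` calls add the columns by reference, so no merge back onto the original table is needed.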

Efficiently counting numbers falling within each range of numbers

最后都变了- submitted on 2020-01-13 03:44:47
Question: I'm looking for a faster solution to the problem below. I'll illustrate the problem with a small example and then provide the code to simulate a large data set, as that's the point of this question. My actual problem size is a list of length 1 million.

Say I have two lists as shown below:

    x <- list(c(82, 18), c(35, 50, 15))
    y <- list(c(1,2,3,55,90), c(37,38,95))

Properties of x and y:

Each element of the list x always sums up to 100.
Each element of y will always be sorted and will be

Convert a list of sf objects into one sf

妖精的绣舞 submitted on 2020-01-13 02:28:13
Question: I have a list of sf objects that I would like to row bind to create a single sf object. I'm looking for a function similar to data.table::rbindlist, that would stack the individual objects in an efficient manner.

Data for reproducible example:

    my_list <- structure(list(structure(list(idhex = 4L, geometry = structure(list(
        structure(c(664106.970004623, 6524137.38910266), class = c("XY", "POINT", "sfg"))),
        class = c("sfc_POINT", "sfc"), precision = 0,
        bbox = structure(c(xmin = 664106.970004623

Conditional NA filling by group

蓝咒 submitted on 2020-01-12 14:32:23
Question: Edit: The question was originally asked for data.table. A solution with any package would be interesting.

I am a little stuck with a particular variation of a more general problem. I have panel data that I am using with data.table, and I would like to fill in some missing values using the group-by functionality of data.table. Unfortunately they are not numeric, so I can't simply interpolate, but they should only be filled in based on a condition. Is it possible to perform a kind of conditional

using eval in data.table

冷暖自知 submitted on 2020-01-12 09:56:06
Question: I'm trying to understand the behaviour of eval in a data.table as a "frame".

With the following data.table:

    set.seed(1)
    foo = data.table(var1=sample(1:3,1000,r=T), var2=rnorm(1000), var3=sample(letters[1:5],1000,replace = T))

I'm trying to replicate this instruction

    foo[var1==1 , sum(var2) , by=var3]

using a function of eval:

    eval1 = function(s) eval( parse(text=s) ,envir=sys.parent() )

As you can see, test 1 and 3 are working, but I don't understand which is the "correct" envir to set in eval

multicore and data.table in R

人走茶凉 submitted on 2020-01-12 05:33:07
Question: I am attempting to use the multicore function parallel with data.table and am unable to come up with the right way to do this.

Code:

    require(multicore)
    require(data.table)
    dtb = data.table(a=1:10, b=1:2)
    x = dtb[,parallel(a+1),by=b]
    > x
       b   pid fd
    1: 1 12243  3
    2: 1 12243  6
    3: 2 12247  4
    4: 2 12247  8

I would like to call collect() on this, but these are no longer parallel objects. How should one do this?

Answer 1: I think this is along the lines of what you want:

    collect(dtb[, list(jobs = list

Using `on` and `by` to compute a new variable from two data.tables

血红的双手。 submitted on 2020-01-11 12:40:24
Question: How come I cannot use by when computing a new variable from two data.tables following a merge?

Example datasets:

    library(data.table)
    set.seed(1)
    # Example datasets.
    dt1 <- data.table(id=1:10, var=rnorm(10))
    dt2 <- data.table(id=c(2, 4, 5, 6, 8), color=sample(1:2, 5, replace=TRUE), group=sample(c("a", "b"), 5, replace=TRUE))
    # Join on ID.
    dt1[dt2, on="id"]
    #    id       var    newVar color group
    # 1:  2 0.1836433 0.3672866     2     a
    # 2:  4 1.5952808 1.5952808     1     a
    # 3:  5 0.3295078 0.6590155     2     a
    # 4:  6 -0
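The excerpt ends before the failing call is shown, so the following is only a sketch of one common pattern, not the asker's code or the accepted answer: perform the join first, then compute and group on the joined result. The small tables and the names `joined`, `newVar`, and `total` are hypothetical; `newVar` is assumed to be `var * color`, which matches the printed rows above.

```r
library(data.table)

# Small deterministic stand-ins for dt1 and dt2
dt1 <- data.table(id = 1:4, var = c(1, 2, 3, 4))
dt2 <- data.table(id = c(2L, 4L), color = c(2L, 1L), group = c("a", "b"))

# Step 1: join on id; the result has one row per row of dt2
joined <- dt1[dt2, on = "id"]

# Step 2: compute the new variable on the joined table
joined[, newVar := var * color]

# Step 3: aggregate by a column that came from dt2
res <- joined[, .(total = sum(newVar)), by = group]
```

Splitting the work this way keeps `by` operating on an ordinary data.table whose columns all exist, which sidesteps any question of how `by` interacts with the join itself.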