data.table

Data Table Solution To New Structured Variable

不羁岁月 submitted on 2020-03-05 01:32:36
Question: data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,4,4,4,4), "score"=c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1), "drop"=c(0,0,0,0,0,0,0,1,0,1,0,0,0,0), "WANT"=c(1,2,1,1,2,3,3,4,3,4,1,3,3,3)) I have a data frame 'data' without 'WANT', which is the column I hope to create using a data.table solution. The rules are: if score = 1, WANT = 1; if score = 2, WANT = 2; if score = 3, WANT = 3; if drop = 1, WANT = 4; if score at t = 2 and score at t+1 = 1, that is ok, but if score at t = 3 and any later scores are less than 3,
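One possible data.table sketch for the rules listed in this question. The last rule is cut off in the excerpt, so the code assumes, judging only from the WANT column, that once a student reaches score 3 every later row stays at 3 unless drop = 1. The column name `WANT_dt` is made up for illustration.

```r
library(data.table)

data <- data.table(
  student = c(1,1,1,1,2,2,2,2,3,3,4,4,4,4),
  score   = c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1),
  drop    = c(0,0,0,0,0,0,0,1,0,1,0,0,0,0),
  WANT    = c(1,2,1,1,2,3,3,4,3,4,1,3,3,3)
)

# Per student: drop = 1 wins; otherwise, once the running maximum of score
# has reached 3 the value is held at 3 (assumed rule); otherwise keep score.
# fcoalesce() treats a missing score as 0 so cummax() does not propagate NA.
data[, WANT_dt := fifelse(
  drop == 1, 4,
  fifelse(cummax(fcoalesce(score, 0)) >= 3, 3, score)
), by = student]
```

Comparing `WANT_dt` with the supplied `WANT` column is a quick way to check whether the assumed rule matches what the asker intended.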

How to add a index by set of data when using rbindlist?

假装没事ソ submitted on 2020-03-03 13:05:49
Question: I have several different csv files with the same structure. I read them into R using fread, and then union them into a bigger dataset using rbindlist() . files <- list.files( pattern = "*.csv" ); x2csv <- rbindlist( lapply(files, fread, stringsAsFactors=FALSE), fill = TRUE ) The code works well. However, I would like to add a column filled with numbers to indicate which csv file each observation came from. For example, the output should be: V1 V2 V3 C1 1: 0 0.2859163 0.55848521 1 2: 1 1
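A minimal sketch of one way to get such an index column: `rbindlist()` has an `idcol` argument that records which list element each row came from. The two temporary CSV files and their contents are invented here so the example runs on its own; only the column name `C1` comes from the question.

```r
library(data.table)

# Two throwaway CSV files stand in for the real ones (hypothetical data)
dir <- tempfile("csvs_")
dir.create(dir)
fwrite(data.table(V1 = 0:1, V2 = c(0.1, 0.2)), file.path(dir, "a.csv"))
fwrite(data.table(V1 = 2:3, V2 = c(0.3, 0.4)), file.path(dir, "b.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# With an unnamed list, idcol numbers the rows by list position (1, 2, ...)
x2csv <- rbindlist(lapply(files, fread), fill = TRUE, idcol = "C1")

# idcol is placed first; move it to the end to match the layout in the question
setcolorder(x2csv, c(setdiff(names(x2csv), "C1"), "C1"))
```

If the list is named, for example with `setNames(lapply(files, fread), basename(files))`, the same argument stores the file names instead of numbers.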

Using shift and data table to update the value of an inventory

蹲街弑〆低调 submitted on 2020-03-01 05:14:50
Question: I want to use data tables to compute the running value of an inventory. For example, take the following ledger: ledger <- data.table(Date = c('2017-04-05','2017-06-12','2017-08-12','2017-10-27','2017-11-01'), Op = c('Purchase','Sale','Purchase','Purchase','Sale'), Prod = c('ProdA','ProdA','ProdA','ProdA','ProdA'), Qty = c(27,-20,15,10,-22), Prc = c(36.47,41.64,40.03,40.95,40.82)) I want to compute the running stock of that product and the running average value. Stock is easy: ledger[,Stock :=
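A hedged sketch of the two running quantities. The excerpt is cut off before it defines "running average value", so this assumes a moving weighted average cost: a purchase re-weights the average by quantity and a sale leaves it unchanged. The helper `running_avg` and the column `AvgCost` are names invented for this example, and a plain loop is used instead of `shift()` because each value depends on the previous result.

```r
library(data.table)

ledger <- data.table(
  Date = c('2017-04-05','2017-06-12','2017-08-12','2017-10-27','2017-11-01'),
  Op   = c('Purchase','Sale','Purchase','Purchase','Sale'),
  Prod = rep('ProdA', 5),
  Qty  = c(27,-20,15,10,-22),
  Prc  = c(36.47,41.64,40.03,40.95,40.82)
)

# Running stock per product
ledger[, Stock := cumsum(Qty), by = Prod]

# Assumed rule: purchases (Qty > 0) re-weight the average cost,
# sales (Qty < 0) keep the previous average.
running_avg <- function(qty, prc) {
  avg   <- numeric(length(qty))
  stock <- 0
  cur   <- 0
  for (i in seq_along(qty)) {
    if (qty[i] > 0) {
      cur <- (stock * cur + qty[i] * prc[i]) / (stock + qty[i])
    }
    stock  <- stock + qty[i]
    avg[i] <- cur
  }
  avg
}

ledger[, AvgCost := running_avg(Qty, Prc), by = Prod]
```

Grouping by `Prod` keeps the sketch usable if the ledger later holds more than one product.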

How to pass “everything possible” to by in a function?

大兔子大兔子 submitted on 2020-02-24 01:13:22
Question: I am trying to use data.table within a user-facing function in a package I'm working on. I would like this function to behave as data.table-like as possible. This means, for example, that my function also features a by argument, which is passed to the underlying data.table call within the function. The user should be free to pass anything into "my" by that is possible directly in a data.table . Citing from ?data.table, this includes: A single unquoted column name: e.g., DT[, .(sa=sum(a)), by=x
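One way to forward a `by` argument untouched is to capture the user's expression with `substitute()` and splice it into the data.table call before evaluating it. This is only a sketch: `my_sum` is a hypothetical wrapper, the aggregation `sum(a)` mirrors the `?data.table` example quoted in the question, and variables from the caller's environment that appear inside `by` are not handled.

```r
library(data.table)

# Hypothetical wrapper: sums column `a` by whatever the user passes as `by`
my_sum <- function(dt, by) {
  by_expr <- substitute(by)          # what the user typed, unevaluated
  cl <- substitute(
    dt[, list(sa = sum(a)), by = BY],
    list(BY = by_expr)               # splice the captured expression in
  )
  eval(cl)                           # evaluated here, where `dt` is defined
}

DT <- data.table(x = c("a", "a", "b"), y = c(1, 2, 2), a = 1:3)

r1 <- my_sum(DT, x)          # unquoted column name
r2 <- my_sum(DT, "x")        # character column name
r3 <- my_sum(DT, .(x, y))    # list of columns
```

Because the call is rebuilt exactly as if it had been typed at the prompt, data.table sees the same `by` expression the user wrote.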

Pass variable name as argument inside data.table

夙愿已清 submitted on 2020-02-21 10:16:05
Question: I'm trying to create a function that modifies a data.table and wanted to use some non-standard evaluation, but I realised that I don't really know how to work with it inside data.tables. My function is basically something like this: do_stuff <- function(dt, col) { copy(dt)[, new_col := some_fun(col)][] } and I want to call it thus: do_stuff(data, column) where "column" is the name of a column that exists inside "data". If I run that function I get an error: #> Error in some_fun(col) : object
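A minimal sketch of one workaround: turn the unquoted argument into a string with `deparse(substitute(col))` and look the column up by name. The body of `some_fun` is a placeholder invented here, since the question does not show it, and `set()` is swapped in for the `:=` call in the original function.

```r
library(data.table)

# Placeholder for the question's some_fun (hypothetical body)
some_fun <- function(x) x * 2

do_stuff <- function(dt, col) {
  col_name <- deparse(substitute(col))   # unquoted argument -> "column"
  out <- copy(dt)                        # leave the caller's table untouched
  set(out, j = "new_col", value = some_fun(out[[col_name]]))
  out[]
}

data <- data.table(column = 1:3)
res <- do_stuff(data, column)
```

The copy keeps the behaviour of the original function, which returns a modified table without changing `data` by reference.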

Remove row based on two factor levels

依然范特西╮ submitted on 2020-02-05 09:12:33
Question: I had a problem that is very similar to this question; however, my data is grouped by two levels. str(dt) 'data.frame': 202206 obs. of 4 variables: $ cros : int -205 -200 -195 -190 -185 -180 -175 -170 -165 -160 ... $ along: Factor w/ 113 levels "100","101","102",..: 1 1 1 1 1 1 1 1 1 1 ... $ alti : num 1.61 1.6 1.6 1.6 1.6 1.59 1.59 1.59 1.59 1.58 ... $ year : Factor w/ 6 levels "1979","1983",..: 1 1 1 1 1 1 1 1 1 1 ... head(dt) cros along alti year -205 100 1.61 1979 -200 100 1.60 1979 -195

Speeding up dplyr pipe including checks with mutate_if and if_else on larger tables

醉酒当歌 submitted on 2020-02-05 06:19:37
Question: I wrote some code to perform oversampling, meaning that I replicate the observations in a data.frame and add noise to the replicates so that they are no longer exactly the same. I'm quite happy that it now works as intended, but it is too slow. I'm just learning dplyr and have no clue about data.table, but I hope there is a way to improve my function. I'm running this code in a function for 100s of data.frames which may contain about 10,000 columns and 400 rows. This is some toy data: