data.table | 易学教程

Data Table Solution To New Structured Variable

Question: data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,4,4,4,4), "score"=c(1,2,1,1,2,3,2,NA,3,NA,1,3,2,1), "drop"=c(0,0,0,0,0,0,0,1,0,1,0,0,0,0), "WANT"=c(1,2,1,1,2,3,3,4,3,4,1,3,3,3)) I have a data frame 'data' without the column 'WANT', which is what I hope to create using a data.table solution. The rules are: if score = 1, WANT = 1; if score = 2, WANT = 2; if score = 3, WANT = 3; if drop = 1, WANT = 4. If score at t = 2 and score at t+1 = 1, that is OK, but if score at t = 3 and any later scores are less than 3,

How to add a index by set of data when using rbindlist?

Question: I have several different CSV files with the same structure. I read them into R using fread, and then combine them into a bigger dataset using rbindlist(). files <- list.files( pattern = "*.csv" ); x2csv <- rbindlist( lapply(files, fread, stringsAsFactors=FALSE), fill = TRUE ) The code works well. However, I would like to add a column filled with numbers to indicate which CSV file each observation came from. For example, the output should be: V1 V2 V3 C1 1: 0 0.2859163 0.55848521 1 2: 1 1
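One possible approach, sketched here under assumptions rather than taken from the original thread: rbindlist() accepts an idcol argument that adds a column identifying which list element each row came from. The temporary directory, the two toy files, and their contents below are hypothetical stand-ins for the real CSV files; C1 is the column name used in the question.

```r
library(data.table)

# Hypothetical stand-ins for the real CSV files: two small tables in a temp dir.
tmp <- tempfile("csv_demo_")
dir.create(tmp)
fwrite(data.table(V1 = 0:1, V2 = c(0.1, 0.2)), file.path(tmp, "a.csv"))
fwrite(data.table(V1 = 2:3, V2 = c(0.3, 0.4)), file.path(tmp, "b.csv"))

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# With an unnamed list, idcol = "C1" stores the position of the source
# table in the list (1 for the first file, 2 for the second, and so on).
x2csv <- rbindlist(lapply(files, fread), fill = TRUE, idcol = "C1")

# The id column is placed first; move it to the end to match the layout
# shown in the question.
setcolorder(x2csv, c(setdiff(names(x2csv), "C1"), "C1"))
print(x2csv)
```

If the file names are more useful than positions, naming the list first (for example with setNames(lapply(files, fread), basename(files))) should make idcol store those names instead.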

Using shift and data table to update the value of an inventory

Question: I want to use data tables to compute the running value of an inventory. For example, take the following ledger: ledger <- data.table(Date = c('2017-04-05','2017-06-12','2017-08-12','2017-10-27','2017-11-01'), Op = c('Purchase','Sale','Purchase','Purchase','Sale'), Prod = c('ProdA','ProdA','ProdA','ProdA','ProdA'), Qty = c(27,-20,15,10,-22), Prc = c(36.47,41.64,40.03,40.95,40.82)) I want to compute the running stock of that product and the running average value. Stock is easy: ledger[,Stock :=

How to pass “everything possible” to by in a function?

Question: I am trying to use data.table within a user-facing function in a package I'm working on. I would like this function to behave as data.table-like as possible. This means, for example, that my function also features a by argument, which is passed to the underlying data.table call within the function. The user should be free to pass anything into "my" by that is possible directly in a data.table. Citing from ?data.table, this includes: A single unquoted column name: e.g., DT[, .(sa=sum(a)), by=x

Pass variable name as argument inside data.table

Question: I'm trying to create a function that modifies a data.table and wanted to use some non-standard evaluation, but I realised that I don't really know how to work with it inside data.tables. My function is basically something like this: do_stuff <- function(dt, col) { copy(dt)[, new_col := some_fun(col)][] } and I want to call it like this: do_stuff(data, column), where "column" is the name of a column that exists inside "data". If I run that function I get an error: #> Error in some_fun(col) : object
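A minimal sketch of one way this might be handled, not necessarily the answer given in the original thread: capture the unquoted argument as a string with deparse(substitute()) and look the column up with get() inside j. The body of some_fun and the toy table are assumptions made only for illustration.

```r
library(data.table)

# Hypothetical placeholder for the asker's some_fun().
some_fun <- function(x) x * 2

do_stuff <- function(dt, col) {
  # Turn the unquoted argument (e.g. column) into the string "column".
  col_name <- deparse(substitute(col))
  # get() resolves that string to the column inside the data.table;
  # copy() keeps the caller's table unmodified.
  copy(dt)[, new_col := some_fun(get(col_name))][]
}

# Toy data, assumed for the example.
data <- data.table(column = 1:3)
res <- do_stuff(data, column)
print(res)
```

This keeps the call syntax from the question, do_stuff(data, column). Accepting the column name as a character string instead would avoid non-standard evaluation entirely, at the cost of a slightly different interface.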

Remove row based on two factor levels

Question: I had a problem that is very similar to this question; however, my data is grouped by two levels.

str(dt)
'data.frame': 202206 obs. of 4 variables:
 $ cros : int -205 -200 -195 -190 -185 -180 -175 -170 -165 -160 ...
 $ along: Factor w/ 113 levels "100","101","102",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ alti : num 1.61 1.6 1.6 1.6 1.6 1.59 1.59 1.59 1.59 1.58 ...
 $ year : Factor w/ 6 levels "1979","1983",..: 1 1 1 1 1 1 1 1 1 1 ...
head(dt)
cros along alti year
-205 100 1.61 1979
-200 100 1.60 1979
-195

Speeding up dplyr pipe including checks with mutate_if and if_else on larger tables

Question: I wrote some code to perform oversampling, meaning that I replicate the observations in a data.frame and add noise to the replicates so that they are no longer exactly the same. I'm quite happy that it now works as intended, but it is too slow. I'm just learning dplyr and have no clue about data.table, but I hope there is a way to improve my function. I'm running this code in a function for hundreds of data.frames, which may contain about 10,000 columns and 400 rows. This is some toy data: