data.table | 易学教程

Speeding up dplyr pipe including checks with mutate_if and if_else on larger tables

Question: I wrote some code to perform oversampling: I replicate the observations in a data.frame and add noise to the replicates so that they are no longer identical. I'm quite happy that it now works as intended, but it is too slow. I'm just learning dplyr and have no clue about data.table, but I hope there is a way to improve my function. I run this code inside a function for hundreds of data.frames, each of which may contain about 10,000 columns and 400 rows. This is some toy data:
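One possible data.table direction, offered only as a sketch: replicate the rows by index and then add noise to the numeric columns with set() in a plain loop, which updates each column by reference and tends to scale well to thousands of columns. The helper name oversample_dt, its parameters n_rep and noise_sd, and the noise model (additive Gaussian noise on numeric columns only) are all hypothetical; the checks the original code performs with mutate_if and if_else are not reproduced here.

```r
library(data.table)

# Hypothetical helper: append n_rep noisy copies of every row.
# Assumption: noise is Gaussian with standard deviation noise_sd and is
# added only to numeric columns; other columns are copied unchanged.
oversample_dt <- function(df, n_rep = 2L, noise_sd = 0.01) {
  dt <- as.data.table(df)
  num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

  # Repeat the original row indices n_rep times to build the replicates
  reps <- dt[rep(seq_len(nrow(dt)), n_rep)]

  # Add noise column by column, by reference, on the replicates only
  for (col in num_cols) {
    set(reps, j = col, value = reps[[col]] + rnorm(nrow(reps), sd = noise_sd))
  }

  # Original rows first, then the noisy replicates
  rbindlist(list(dt, reps))
}
```

For example, oversample_dt(df, n_rep = 2L, noise_sd = 0.1) on a 3-row data.frame returns 9 rows, the first 3 of which are the untouched originals.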

fread() error and strange behaviour when reading csv

Question: I used fread() from the data.table library to try to read a 540 MB CSV file. It returned an error message saying: ' ends field 36 on line 4 when detecting types: 20.00,8/25/2006 0:00:00,"07:05:00 PM","CST",143.00,"OTTAWA","KS","HAIL",1.00,"S","MINNEAPOLIS",8/25/2006 0:00:00,"07:05:00 PM",0.00,,1.00,"S","MINNEAPOLIS",0.00,0.00,,88.00,0.00,0.00,0.00,,0.00,,"TOP","KANSAS, East",,3907.00,9743.00,3907.00,9743.00,"Dime to nickel sized hail. I have no idea what caused the error and want to track down if it …
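A minimal sketch for locating malformed lines before retrying fread: base R's count.fields() reports the number of fields on each line, so lines whose count differs from the header's are likely candidates. The helper name find_bad_lines is hypothetical, and the assumption is that fields are quoted with double quotes only (the default quote set also includes the apostrophe, which would misfire on free-text remarks). Per the count.fields documentation, a quoted field spanning several lines yields NA on its starting line, which the sketch also flags.

```r
# Hypothetical helper: return line numbers whose field count differs
# from the first (header) line, or that start a multi-line quoted field.
find_bad_lines <- function(path, sep = ",") {
  n <- count.fields(path, sep = sep, quote = "\"", comment.char = "",
                    blank.lines.skip = FALSE)
  which(is.na(n) | n != n[1])
}
```

The flagged lines can then be inspected with readLines(path)[find_bad_lines(path)]; fread's own verbose = TRUE argument may also help show where type detection stops.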

decimal point setting in fread, data.table

Question: I would like to use fread from data.table, but I get a warning related to the decimal point (here a ',' instead of a '.'). Normally I use '.', but in some cases I have to import files that use ',' as the decimal point. In read.csv I can set the decimal separator:

df <- read.csv("mydata.csv", sep=";", dec=",")

How can I do this with the fread function in data.table? With

df=fread('mydata.csv',sep=';')

I get a warning message: Warning message: In fread("mydata.csv", : Bumped column 7 to type …
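fread() accepts a dec argument, analogous to read.csv's, so the separator and decimal mark can both be stated explicitly. A minimal sketch, using a small temporary file whose column names and values are made up for illustration:

```r
library(data.table)

# Illustrative file: ';' separates fields, ',' is the decimal mark
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;value", "1;3,14", "2;2,5"), tmp)

# State both the field separator and the decimal mark explicitly
dt <- fread(tmp, sep = ";", dec = ",")
str(dt)
```

With dec = "," the value column is read as numeric rather than being bumped to character.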

R: Multiplying columns by a constant in a data table

Question: I am trying to correct my data table so that my columns have the same units. Here's an example of what I have:

hh:mm  A      V       W        kA     V       kW       A      kV      kW
11:00  13.84  470.16  6509.88  14.89  467.85  6964.38  15.74  464.01  7303.13
11:05  12.54  475.17  5959.22  13.40  474.52  6358.89  13.34  473.13  6311.80
11:10   9.73  476.20  4632.14  10.36  473.38  4905.86  10.38  472.73  4907.14
11:15   9.20  479.30  4410.89   9.65  482.79  4659.67   9.73  479.09  4659.33
11:20  11.28  482.22  5437.78  12.03  484.95  5835.33  12.24  476.36  5829.44
11:25  11.66  481.64  …
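A common data.table idiom for scaling several columns at once is to apply the multiplication over .SD restricted by .SDcols and assign the result back by reference with :=. The sketch below assumes unique column names (the table above repeats A, V and W, which makes column selection ambiguous); the names P_kW and I_kA and their values are hypothetical, and the factor 1000 stands in for whatever conversion the real data needs.

```r
library(data.table)

# Hypothetical table: names and values are illustrative only
DT <- data.table(time = c("11:00", "11:05"),
                 P_kW = c(1.5, 2.25),
                 I_kA = c(0.5, 0.25))

cols <- c("P_kW", "I_kA")

# Multiply each selected column by 1000, updating DT by reference
DT[, (cols) := lapply(.SD, `*`, 1000), .SDcols = cols]

# Rename so the column names reflect the new units
setnames(DT, cols, c("P_W", "I_A"))
```

An equivalent loop form is for (col in cols) set(DT, j = col, value = DT[[col]] * 1000).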

Why does “..” work to pass column names in a character vector variable?

Question: The following code works, but I cannot find any documentation about the ".." (dot dot) operator in the data.table help or vignettes:

library(data.table)
cols <- c("mpg", "gear")
DT <- as.data.table(mtcars)
DT[ , ..cols]

The output is:

    mpg gear
1: 21.0    4
2: 21.0    4
3: 22.8    4
4: 21.4    3
5: 18.7    3
...

Why does this work, and is there any documentation for it? PS: Normally I would use mget etc. Edit 1: This is not a plain R feature of the reserved names ..., ..1, ..2 etc., which are used to …
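As far as I recall, the description of the with argument in ?data.table states that x[, ..cols] is equivalent to x[, cols, with=FALSE] and to x[, .SD, .SDcols=cols]: a symbol in j prefixed with .. is looked up in the calling scope rather than among the columns. The sketch below simply checks that the three forms select the same columns; treat the documentation pointer as a hedged recollection and confirm it against the installed version's help page.

```r
library(data.table)

DT <- as.data.table(mtcars)
cols <- c("mpg", "gear")

a <- DT[, ..cols]               # '..' prefix: take `cols` from the calling scope
b <- DT[, cols, with = FALSE]   # older, explicit form
d <- DT[, .SD, .SDcols = cols]  # the same selection via .SD
```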

Want to remove duplicated rows unless NA value exists in columns

Question: I have a data table with 4 columns: ID, Name, Rate1, Rate2. I want to remove duplicates where ID, Rate1 and Rate2 are the same, but if Rate1 and Rate2 are both NA, I would like to keep both rows. Basically, I want to remove duplicates conditionally, only when the values are not NA. For example, I would like this:

ID  Name  Rate1  Rate2
1   Xyz   1      2
1   Abc   1      2
2   Def   NA     NA
2   Lmn   NA     NA
3   Hij   3      5
3   Qrs   3      7

to become this:

ID  Name  Rate1  Rate2
1   Xyz   1      2
2   Def   NA     NA
2   Lmn   NA     NA
3   Hij   3      5
3   Qrs   3      7

Thanks in…
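One way to express this, sketched under the assumption that "both NA" means Rate1 and Rate2 are both missing: duplicated() on a data.table accepts a by argument and treats NA values as equal, so the NA rows are flagged as duplicates too; keeping a row when it is either not a duplicate or has both rates missing restores them. The data below is the example from the question.

```r
library(data.table)

DT <- data.table(ID    = c(1, 1, 2, 2, 3, 3),
                 Name  = c("Xyz", "Abc", "Def", "Lmn", "Hij", "Qrs"),
                 Rate1 = c(1, 1, NA, NA, 3, 3),
                 Rate2 = c(2, 2, NA, NA, 5, 7))

# Flag repeats of (ID, Rate1, Rate2); duplicated() treats NA as equal to NA
dup <- duplicated(DT, by = c("ID", "Rate1", "Rate2"))

# Rows where both rates are missing are always kept
both_na <- is.na(DT$Rate1) & is.na(DT$Rate2)

keep <- !dup | both_na
res <- DT[keep]
```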

data.table conditional Inequality join

Question: There are two sample datasets:

> aDT
   col1 col2 ExtractDate
1:    1    A  2017-01-01
2:    1    A  2016-01-01
3:    2    B  2015-01-01
4:    2    B  2014-01-01

> bDT
   col1 col2   date_pol Value
1:    1    A 2017-05-20     1
2:    1    A 2016-05-20     2
3:    1    A 2015-05-20     3
4:    2    B 2014-05-20     4

And I need:

> cDT
   col1 col2 ExtractDate   date_pol Value
1:    1    A  2017-01-01 2016-05-20     2
2:    1    A  2016-01-01 2015-05-20     3
3:    2    B  2015-01-01 2014-05-20     4
4:    2    B  2014-01-01         NA    NA

Basically, aDT is left-joined to bDT on col1, col2 and ExtractDate >= date_pol, only …
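The desired cDT above matches what a rolling join produces: with roll = TRUE, each aDT row is matched to the bDT row in the same (col1, col2) group with the latest date_pol not after ExtractDate, and gets NA when there is none. This is offered as one possible approach, not necessarily what the truncated requirement intends. Because the join column in the result takes its values from aDT, the sketch copies date_pol into a helper column (hypothetically named join_date) so the matched policy date survives.

```r
library(data.table)

aDT <- data.table(col1 = c(1, 1, 2, 2), col2 = c("A", "A", "B", "B"),
                  ExtractDate = as.Date(c("2017-01-01", "2016-01-01",
                                          "2015-01-01", "2014-01-01")))
bDT <- data.table(col1 = c(1, 1, 1, 2), col2 = c("A", "A", "A", "B"),
                  date_pol = as.Date(c("2017-05-20", "2016-05-20",
                                       "2015-05-20", "2014-05-20")),
                  Value = c(1, 2, 3, 4))

# Copy the date so date_pol keeps bDT's values after the join
b <- copy(bDT)
b[, join_date := date_pol]

# Rolling join: last join column rolls to the latest join_date <= ExtractDate
res <- b[aDT, on = .(col1, col2, join_date = ExtractDate), roll = TRUE]

# join_date now holds aDT's ExtractDate values
setnames(res, "join_date", "ExtractDate")
setcolorder(res, c("col1", "col2", "ExtractDate", "date_pol", "Value"))
```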

Convert data.table to data.frame (i.e. undo setDT)

Question: I have a 20x2 data frame. I converted it to a data.table to perform some operations (I have left out the explanation of the operations and the goal as out of scope). The conversion allowed me to avoid a for loop, but it causes some issues further down the line. I need to convert the df data.table back into a data.frame. How can I do that? Thanks very much for your help.

df <- data.frame(LastPrice = c( 1221, 1220, 1220, 1217, 1216, 1218 , 1216, 1216, 1217, 1220, 1219, 1218, 1220, …
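Two standard options exist: as.data.frame() returns a converted copy, while setDF() converts the object in place, by reference, which mirrors setDT(). A minimal sketch using a few of the LastPrice values from the question (the rest of the original data is not reproduced):

```r
library(data.table)

dt <- as.data.table(data.frame(LastPrice = c(1221, 1220, 1220, 1217)))

df <- as.data.frame(dt)  # returns a new data.frame; dt is unchanged
setDF(dt)                # converts dt itself to a data.frame, by reference
```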