data.table

Speeding up dplyr pipe including checks with mutate_if and if_else on larger tables

天涯浪子 submitted on 2020-02-05 06:18:32
Question: I wrote some code to perform oversampling, meaning that I replicate the observations in a data.frame and add noise to the replicates so that they are no longer exactly the same. I'm quite happy that it now works as intended, but it is too slow. I'm just learning dplyr and have no clue about data.table, but I hope there is a way to improve my function. I run this code in a function for hundreds of data.frames, each of which may contain about 10,000 columns and 400 rows. This is some toy data:

fread() error and strange behaviour when reading csv

白昼怎懂夜的黑 submitted on 2020-02-03 09:51:29
Question: I used fread() from the data.table library to try to read a 540 MB CSV file. It returned an error message saying: ' ends field 36 on line 4 when detecting types: 20.00,8/25/2006 0:00:00,"07:05:00 PM","CST",143.00,"OTTAWA","KS","HAIL",1.00,"S","MINNEAPOLIS",8/25/2006 0:00:00,"07:05:00 PM",0.00,,1.00,"S","MINNEAPOLIS",0.00,0.00,,88.00,0.00,0.00,0.00,,0.00,,"TOP","KANSAS, East",,3907.00,9743.00,3907.00,9743.00,"Dime to nickel sized hail. I have no idea what caused the error and want to track down if it

decimal point setting in fread, data.table

我与影子孤独终老i submitted on 2020-02-03 05:23:40
Question: I would like to use fread from data.table, but I get a warning related to the decimal point (here a ',' instead of a '.'). Normally I use '.', but in some cases I have to import files that use ',' as the decimal point. In read.csv I can set the decimal separator: df <- read.csv("mydata.csv", sep=";", dec=",") How can I do this with the fread function in data.table? With df=fread('mydata.csv',sep=';') I get a warning message: Warning message: In fread("mydata.csv", : Bumped column 7 to type
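For reference, fread accepts a dec argument that plays the same role as in read.csv. The following is a minimal sketch, assuming a reasonably recent data.table version; the temporary file and its contents are made up for illustration and are not the asker's data.

```r
library(data.table)

# Hypothetical sample file: semicolon-separated, ',' as the decimal mark.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;price", "1;3,14", "2;2,50"), tmp)

# dec = "," makes fread parse 3,14 as the number 3.14
# instead of reading the column as character.
dt <- fread(tmp, sep = ";", dec = ",")
str(dt)
```

The separator and the decimal mark must differ, which is why such files normally use ';' as the field separator.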

R: Multiplying columns by a constant in a data table

半城伤御伤魂 submitted on 2020-02-02 06:48:06
Question: I am trying to correct my data table so my columns have the same units. Here's an example of what I have.
    hh:mm   A      V       W        kA     V       kW       A      kV      kW
    11:00   13.84  470.16  6509.88  14.89  467.85  6964.38  15.74  464.01  7303.13
    11:05   12.54  475.17  5959.22  13.40  474.52  6358.89  13.34  473.13  6311.80
    11:10   9.73   476.20  4632.14  10.36  473.38  4905.86  10.38  472.73  4907.14
    11:15   9.20   479.30  4410.89  9.65   482.79  4659.67  9.73   479.09  4659.33
    11:20   11.28  482.22  5437.78  12.03  484.95  5835.33  12.24  476.36  5829.44
    11:25   11.66  481.64
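One common data.table idiom for this kind of unit correction is to update a chosen set of columns by reference with := and .SDcols. The sketch below is illustrative only: the table, the column names kA and kW, and the factor of 1000 are assumptions, since the excerpt does not say which columns need which factor.

```r
library(data.table)

# Hypothetical table with unique column names (the excerpt's headers repeat).
dt <- data.table(
  time = c("11:00", "11:05"),
  kA   = c(14.89, 13.40),
  kW   = c(6964.38, 6358.89)
)

# Columns assumed to need scaling, and the assumed constant.
k_cols <- c("kA", "kW")
factor <- 1000

# Multiply every selected column in place; other columns are untouched.
dt[, (k_cols) := lapply(.SD, function(x) x * factor), .SDcols = k_cols]
print(dt)
```

Wrapping k_cols in parentheses on the left of := tells data.table to use the names stored in the variable rather than create a column called k_cols.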

R: Multiplying columns by a constant in a data table

青春壹個敷衍的年華 submitted on 2020-02-02 06:47:33
Question: I am trying to correct my data table so my columns have the same units. Here's an example of what I have.
    hh:mm   A      V       W        kA     V       kW       A      kV      kW
    11:00   13.84  470.16  6509.88  14.89  467.85  6964.38  15.74  464.01  7303.13
    11:05   12.54  475.17  5959.22  13.40  474.52  6358.89  13.34  473.13  6311.80
    11:10   9.73   476.20  4632.14  10.36  473.38  4905.86  10.38  472.73  4907.14
    11:15   9.20   479.30  4410.89  9.65   482.79  4659.67  9.73   479.09  4659.33
    11:20   11.28  482.22  5437.78  12.03  484.95  5835.33  12.24  476.36  5829.44
    11:25   11.66  481.64

Why does “..” work to pass column names in a character vector variable?

╄→尐↘猪︶ㄣ submitted on 2020-01-31 07:07:24
Question: The following code does work, but I cannot find any documentation about the ".." (dot dot) operator in the data.table help and vignette:
    library(data.table)
    cols <- c("mpg", "gear")
    DT <- as.data.table(mtcars)
    DT[ , ..cols]
The output is:
        mpg gear
     1: 21.0    4
     2: 21.0    4
     3: 22.8    4
     4: 21.4    3
     5: 18.7    3
     ...
Why does this work, and is there any documentation for it? PS: Normally I would use mget etc... Edit 1: This is not a plain R feature of the reserved names ... , ..1 , ..2 etc., which are used to
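As far as I know, the `..` prefix in j asks data.table to look the symbol up in the calling scope (one level up, by analogy with `..` in a file path) instead of treating it as a column name. A small sketch comparing it with two older spellings that should select the same columns:

```r
library(data.table)

DT   <- as.data.table(mtcars)
cols <- c("mpg", "gear")

# Three ways to select the columns named in the character vector `cols`.
a <- DT[, ..cols]               # `..` prefix: use the variable from the calling scope
b <- DT[, cols, with = FALSE]   # older spelling of the same request
d <- DT[, .SD, .SDcols = cols]  # via .SD restricted to those columns

# all.equal compares contents; the three results should match.
print(all.equal(a, b))
print(all.equal(a, d))
```

Without the prefix, DT[, cols] would be evaluated with cols as an ordinary expression in j and would simply return the character vector.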

Want to remove duplicated rows unless NA value exists in columns

家住魔仙堡 submitted on 2020-01-30 08:38:33
Question: I have a data table with 4 columns: ID, Name, Rate1, Rate2. I want to remove duplicates where ID, Rate1, and Rate2 are the same, but if both rates are NA, I would like to keep both rows. Basically, I want to remove duplicates conditionally, only when the rates are not NA. For example, I would like this:
    ID  Name  Rate1  Rate2
    1   Xyz   1      2
    1   Abc   1      2
    2   Def   NA     NA
    2   Lmn   NA     NA
    3   Hij   3      5
    3   Qrs   3      7
to become this:
    ID  Name  Rate1  Rate2
    1   Xyz   1      2
    2   Def   NA     NA
    2   Lmn   NA     NA
    3   Hij   3      5
    3   Qrs   3      7
Thanks in
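One possible approach, sketched here on the example data from the question, is to combine duplicated() (restricted to the key columns with by) with an explicit exemption for rows where both rates are NA. This is an illustration under those assumptions, not necessarily the accepted answer.

```r
library(data.table)

# Example data from the question.
dt <- data.table(
  ID    = c(1, 1, 2, 2, 3, 3),
  Name  = c("Xyz", "Abc", "Def", "Lmn", "Hij", "Qrs"),
  Rate1 = c(1, 1, NA, NA, 3, 3),
  Rate2 = c(2, 2, NA, NA, 5, 7)
)

# TRUE for the second and later occurrences of an (ID, Rate1, Rate2) combination.
is_dup <- duplicated(dt, by = c("ID", "Rate1", "Rate2"))

# Rows where both rates are missing are always kept.
both_na <- is.na(dt$Rate1) & is.na(dt$Rate2)

res <- dt[!is_dup | both_na]
print(res)
```

The by argument matters because Name differs between the rows that should count as duplicates.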

Want to remove duplicated rows unless NA value exists in columns

血红的双手。 submitted on 2020-01-30 08:38:06
Question: I have a data table with 4 columns: ID, Name, Rate1, Rate2. I want to remove duplicates where ID, Rate1, and Rate2 are the same, but if both rates are NA, I would like to keep both rows. Basically, I want to remove duplicates conditionally, only when the rates are not NA. For example, I would like this:
    ID  Name  Rate1  Rate2
    1   Xyz   1      2
    1   Abc   1      2
    2   Def   NA     NA
    2   Lmn   NA     NA
    3   Hij   3      5
    3   Qrs   3      7
to become this:
    ID  Name  Rate1  Rate2
    1   Xyz   1      2
    2   Def   NA     NA
    2   Lmn   NA     NA
    3   Hij   3      5
    3   Qrs   3      7
Thanks in

data.table conditional Inequality join

谁说我不能喝 submitted on 2020-01-30 06:00:26
Question: There are two sample datasets:
    > aDT
       col1 col2 ExtractDate
    1:    1    A  2017-01-01
    2:    1    A  2016-01-01
    3:    2    B  2015-01-01
    4:    2    B  2014-01-01
    > bDT
       col1 col2   date_pol Value
    1:    1    A 2017-05-20     1
    2:    1    A 2016-05-20     2
    3:    1    A 2015-05-20     3
    4:    2    B 2014-05-20     4
And I need:
    > cDT
       col1 col2 ExtractDate   date_pol Value
    1:    1    A  2017-01-01 2016-05-20     2
    2:    1    A  2016-01-01 2015-05-20     3
    3:    2    B  2015-01-01 2014-05-20     4
    4:    2    B  2014-01-01         NA    NA
Basically, aDT left join bDT based on col1, col2 and ExtractDate >= date_pol, only
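The desired cDT looks like "for each row of aDT, take the most recent bDT row with date_pol on or before ExtractDate", which is what a rolling join produces. The sketch below is one reading of the truncated requirement, not a confirmed answer; the helper column join_date is hypothetical and exists only so that date_pol survives the join.

```r
library(data.table)

# Sample data from the question.
aDT <- data.table(
  col1 = c(1, 1, 2, 2),
  col2 = c("A", "A", "B", "B"),
  ExtractDate = as.Date(c("2017-01-01", "2016-01-01", "2015-01-01", "2014-01-01"))
)
bDT <- data.table(
  col1 = c(1, 1, 1, 2),
  col2 = c("A", "A", "A", "B"),
  date_pol = as.Date(c("2017-05-20", "2016-05-20", "2015-05-20", "2014-05-20")),
  Value = 1:4
)

# Copy of date_pol used as the join column (hypothetical helper name).
bDT[, join_date := date_pol]

# For every aDT row, roll forward the last bDT row whose join_date is
# <= ExtractDate within the same (col1, col2) group; unmatched rows get NA.
cDT <- bDT[aDT, on = .(col1, col2, join_date = ExtractDate), roll = TRUE]
setnames(cDT, "join_date", "ExtractDate")
print(cDT)
```

After the join, the join column carries the values from aDT, which is why it is renamed to ExtractDate at the end.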

Convert data.table to data.frame (i.e. undo setDT)

寵の児 submitted on 2020-01-30 05:24:52
Question: I have a 20x2 data frame. I converted it to a data.table to perform some operations (I have left out the explanation of the operations and the goal as out of scope). The conversion allowed me to avoid using a for loop, but it causes some issues down the line. I need to convert the df data.table back into a data.frame. How can I do that? Thanks very much for your help. df <- data.frame(LastPrice = c( 1221, 1220, 1220, 1217, 1216, 1218 , 1216, 1216, 1217, 1220, 1219, 1218, 1220,
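For reference, data.table provides setDF() as the by-reference counterpart of setDT(). A minimal sketch using only the first few values from the excerpt (the full data is truncated there):

```r
library(data.table)

# Shortened version of the question's data.
df <- data.frame(LastPrice = c(1221, 1220, 1220, 1217, 1216, 1218))

setDT(df)   # df is now a data.table, converted in place
class_as_dt <- class(df)

setDF(df)   # converted back to a plain data.frame, again in place
class_as_df <- class(df)

print(class_as_dt)
print(class_as_df)
```

as.data.frame(df) is an alternative that returns a converted copy and leaves the original data.table unchanged.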