ffbase

ff package write error

不羁岁月 posted on 2019-12-01 23:22:14
I'm trying to work with a 1909x139352 dataset in R. My computer has only 2GB of RAM, so the dataset (500MB) is too big for the conventional methods, and I decided to use the ff package. However, I've run into trouble: the function read.table.ffdf is unable to read the first chunk of data. It crashes with the following error:

txtdata <- read.table.ffdf(file="/directory/myfile.csv", FUN="read.table", header=FALSE, sep=",", colClasses=c("factor",rep("integer",139351)), first.rows=100, next.rows=100, VERBOSE=TRUE)
read.table.ffdf 1..100 (100) csv-read=77.253sec
Error en ff

Functions for creating and reshaping big data in R using the ff package

那年仲夏 posted on 2019-12-01 11:01:30
I'm new to R and the ff package, and am trying to better understand how ff allows users to work with large datasets (>4GB). I have spent a considerable amount of time trawling the web for tutorials, but the ones I could find generally go over my head. I learn best by doing, so as an exercise, I would like to know how to create a long-format time-series dataset, similar to R's built-in "Indometh" dataset, using arbitrary values, then reshape it into wide format, and finally save the output as a CSV file. With small datasets this is simple, and can be achieved using the

Replace NAs in a ffdf object

馋奶兔 posted on 2019-11-30 14:23:55
Question: I'm working with an ffdf object which has NAs in some of the columns. The NAs are the result of a left outer merge using merge.ffdf. I would like to replace the NAs with 0s, but I have not managed to do it. Here is the code I am running:

library(ffbase)
deals <- merge(deals,rk,by.x=c("DEALID","STICHTAG"),by.y=c("ID","STICHTAG"),all.x=TRUE)
attributes(deals)
$names
[1] "virtual" "physical" "row.names"
$class
[1] "ffdf"
vmode(deals$CREDIT_R)
[1] "double"
idx <- ffwhich(deals,is.na(CREDIT_R)) # CREDIT_R

aggregation using ffdfdply function in R

*爱你&永不变心* posted on 2019-11-28 11:49:54
Question: I tried to aggregate a large dataset with the ffdfdply function from the 'ffbase' package in R. Let's say I have three variables called Date, Item and sales. I want to aggregate sales over Date and Item using the sum function. Could you please guide me to the proper syntax in R? Here is what I tried:

grp_qty <- ffdfdply(x=data[c("sales","Date","Item")], split=as.character(data$sales),FUN = function(data) summaryBy(Date+Item~sales, data=data, FUN=sum))

I would appreciate for your