ff

delete rows ff package

狂风中的少年 · Submitted on 2019-12-03 03:46:13
For a while now I've been using the ff package to work with big data. The R object I've been working with has about 130,000,000 rows and 14 columns. Two of those columns, Temperature and Precipitation, contain missing values ("NA"), so I need to delete those rows before moving on with my work. I've been trying to do it the way I would with a normal R object:

    data <- data[!is.na(data$temp), ]

But I keep getting an error:

    Error: vmode(index) == "integer" is not TRUE

Has anyone been able to delete rows in an ffdf object? I'd appreciate any help. Indexing based on a logical ff_vector is not …
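The excerpt cuts off at the key point: indexing an ffdf with a logical ff vector is not supported, which is exactly what the vmode(index) error is saying. A minimal sketch of the usual ffbase workaround (assuming the column really is named temp and that the ffbase package is installed):

    library(ff)
    library(ffbase)                       # provides ffwhich() and a subset() method for ffdf

    # Turn the logical condition into an integer index, then subset by it:
    idx  <- ffwhich(data, !is.na(temp))   # integer positions of the rows to keep
    data <- data[idx, ]

    # Equivalent shortcut via ffbase's subset() method for ffdf:
    # data <- subset(data, !is.na(temp))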

Still struggling with handling large data set

你说的曾经没有我的故事 · Submitted on 2019-12-02 02:13:56
Question: I have been reading around on this website and haven't been able to find an exact answer; if it already exists, I apologize for the repost. I am working with data sets that are extremely large (600 million rows, 64 columns, on a computer with 32 GB of RAM). I really only need much smaller subsets of this data, but I am struggling to perform any operation beyond simply importing one data set with fread and selecting the 5 columns I need. After that, I try to overwrite my dataset with the …
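Since only 5 of the 64 columns are needed, one option is to never read the rest at all: fread's select argument parses just the named columns. A minimal sketch, with a hypothetical file name and column names standing in for the real ones:

    library(data.table)

    # select= makes fread read only these columns, so the other 59
    # never occupy RAM (all names here are placeholders).
    dt <- fread("bigdata.csv",
                select = c("id", "date", "temp", "precip", "site"))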

ff package write error

不羁岁月 · Submitted on 2019-12-01 23:22:14
I'm trying to work with a 1909 x 139352 dataset in R. Since my computer has only 2 GB of RAM, the dataset (500 MB) turns out to be too big for the conventional methods, so I decided to use the ff package. However, I've been having some trouble: read.table.ffdf is unable to read even the first chunk of data. It crashes with the following error:

    txtdata <- read.table.ffdf(file = "/directory/myfile.csv", FUN = "read.table",
                               header = FALSE, sep = ",",
                               colClasses = c("factor", rep("integer", 139351)),
                               first.rows = 100, next.rows = 100, VERBOSE = TRUE)
    read.table.ffdf 1..100 (100)  csv-read=77.253sec
    Error en ff …
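A detail worth knowing here: an ffdf stores every column as its own ff file on disk, so a 139,352-column table means roughly 139 thousand backing files plus per-column overhead, which can fail long before RAM runs out. For a wide, almost all-integer table like this, one alternative is to read the CSV in chunks into a single on-disk ff matrix instead. A sketch under the layout assumed by the colClasses above (first column the lone factor, the remaining 139,351 integers):

    library(ff)

    ncols <- 139351                                     # integer columns after the factor
    m   <- ff(vmode = "integer", dim = c(1909, ncols))  # one backing file, not ~139k
    con <- file("/directory/myfile.csv", open = "r")
    i   <- 0
    repeat {
      chunk <- tryCatch(
        read.table(con, header = FALSE, sep = ",", nrows = 100,
                   colClasses = c("character", rep("integer", ncols))),
        error = function(e) NULL)                       # read.table errors at EOF
      if (is.null(chunk)) break
      m[(i + 1):(i + nrow(chunk)), ] <- as.matrix(chunk[, -1])
      i <- i + nrow(chunk)
    }
    close(con)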

What is the meaning of this error, 'Error in if (any(B < 1)) stop("B too small")', while using the tabplot package

本秂侑毒 · Submitted on 2019-12-01 17:19:15
I found the tabplot package for visualizing a large database. I ran it using the code below, but I get this error on different data frames:

    Error in if (any(B < 1)) stop("B too small") : missing value where TRUE/FALSE needed
    In addition: Warning message:
    In bbatch(n, as.integer(BATCHBYTES/theobytes)) : NAs introduced by coercion

Here is an example:

    dat <- read.table(text = "
    birds wolfs snakes
    3 9 7
    3 8 4
    1 2 8
    1 2 3
    1 8 3
    6 1 2
    6 7 1
    6 1 5
    5 9 7
    3 8 7
    4 2 7
    1 2 3
    7 6 3
    6 1 1
    6 3 9
    6 1 1
    ", header = TRUE)
    install.packages("tabplot")
    package 'ff' successfully unpacked and MD5 sums checked …
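The excerpt omits the plotting call itself, which is presumably the package's main entry point, roughly:

    library(tabplot)
    tableplot(dat)   # the traceback above points into bbatch(), tabplot's
                     # batch-size computation, where B ends up NA / too small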

R could not allocate memory on ff procedure. How come?

痞子三分冷 · Submitted on 2019-12-01 11:23:11
I'm working on a 64-bit Windows Server 2008 machine with an Intel Xeon processor and 24 GB of RAM. I'm having trouble reading a particular 11 GB TSV (tab-delimited) file (>24 million rows, 20 columns). My usual companion, read.table, has failed me. I'm currently trying the ff package, through this procedure:

    > df <- read.delim.ffdf(file = "data.tsv",
    +                       header = TRUE,
    +                       VERBOSE = TRUE,
    +                       first.rows = 1e3,
    +                       next.rows = 1e6,
    +                       na.strings = c("", NA),
    +                       colClasses = c("NUMERO_PROCESSO" = "factor"))

This works fine for about 6 million records, but then I get an error, as you can see: …
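One hypothesis worth testing (speculative, not from the question): ff writes only a factor's integer codes to disk and keeps the levels themselves in RAM, so a high-cardinality column like NUMERO_PROCESSO can exhaust memory as new levels accumulate chunk after chunk. A quick check on an in-memory slice:

    # If NUMERO_PROCESSO has (near-)unique values per row, its level set
    # grows into the millions and must all be held in RAM by ff.
    x <- read.delim("data.tsv", nrows = 1e6)
    length(unique(x$NUMERO_PROCESSO))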

Functions for creating and reshaping big data in R using the FF package

那年仲夏 · Submitted on 2019-12-01 11:01:30
I'm new to R and the ff package, and am trying to better understand how ff lets users work with large datasets (>4 GB). I have spent a considerable amount of time trawling the web for tutorials, but the ones I could find generally go over my head. I learn best by doing, so as an exercise I would like to know how to create a long-format time-series dataset, similar to R's built-in "Indometh" dataset, using arbitrary values; then reshape it into wide format; then save the output as a CSV file. With small datasets this is simple, and can be achieved using the …
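For the small-data version of the exercise, a minimal base-R sketch (values arbitrary, as requested): build a long-format panel shaped like Indometh (6 subjects, 11 time points), cast it wide with reshape(), and write it out. For the ff version, as.ffdf() and write.csv.ffdf() should slot in for the data frame and the final write; reshaping an ffdf is where it gets harder.

    # Long format: 66 rows, like Indometh.
    long <- data.frame(
      Subject = rep(1:6, each = 11),
      time    = rep(c(0.25, 0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8), times = 6),
      conc    = runif(66)
    )

    # Wide format: one row per subject, one conc.<time> column per time point.
    wide <- reshape(long, idvar = "Subject", timevar = "time",
                    direction = "wide")

    write.csv(wide, "indometh_wide.csv", row.names = FALSE)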

set ff=unix in Linux

生来就可爱ヽ(ⅴ<●) · Submitted on 2019-12-01 02:26:46
set ff=unix : tells the vi editor to use Unix line endings. Steps: 1. Open the file with the vi command. 2. Type :set ff=unix directly. Source: https://www.cnblogs.com/lwcode6/p/11647955.html
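A quick end-to-end illustration (Vim ex commands; :set ff? just queries the current value):

    vi script.sh
    :set ff?          " show the current fileformat, e.g. fileformat=dos
    :set ff=unix      " switch the buffer to Unix (LF) line endings
    :wq               " save and quit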

Replace NAs in an ffdf object

馋奶兔 · Submitted on 2019-11-30 14:23:55
Question: I'm working with an ffdf object which has NAs in some of the columns. The NAs are the result of a left outer merge using merge.ffdf. I would like to replace the NAs with 0s but am not managing to do it. Here is the code I am running:

    library(ffbase)
    deals <- merge(deals, rk, by.x = c("DEALID", "STICHTAG"),
                   by.y = c("ID", "STICHTAG"), all.x = TRUE)
    attributes(deals)
    $names
    [1] "virtual"   "physical"  "row.names"
    $class
    [1] "ffdf"
    vmode(deals$CREDIT_R)
    [1] "double"
    idx <- ffwhich(deals, is.na(CREDIT_R))  # CREDIT_R …
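The code cuts off right at the solution. A sketch of the usual ffbase pattern (names taken from the question; I am assuming ff recycles the scalar 0 in the assignment): locate the NA positions with ffwhich(), then assign at those integer positions.

    library(ffbase)

    idx <- ffwhich(deals, is.na(CREDIT_R))  # integer index of the NA rows
    deals$CREDIT_R[idx] <- 0                # write 0 at those positions on disk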