I have a large number of csv files that I want to read into R. All the column headings in the csvs are the same. But I want to import only those rows from each file into the
If you are really short on memory, the following solution might work. It uses LaF to read only the column needed for filtering, calculates the total number of lines that will be read, initializes the complete data.frame, and then reads the required lines from the files. (It's probably not faster than the other solutions.)
library("LaF")
colnames <- c("v1","v2","v3")
colclasses <- c("character", "character", "numeric")
# pattern is a regular expression, not a shell glob
fileNames <- list.files(pattern = "\\.csv$")
# First determine which lines to read from each file and the total number of lines
# to be read
lines <- list()
for (fn in fileNames) {
  laf <- laf_open_csv(fn, column_types=colclasses, column_names=colnames, skip=1)
  d <- laf$v3[]
  lines[[fn]] <- which(d > 2 & d < 7)
}
nlines <- sum(sapply(lines, length))
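# Initialize data.frame: for each column class, call its constructor
# (e.g. character(nlines), numeric(nlines)) to preallocate a full-length vector
df <- as.data.frame(lapply(colclasses, do.call, list(nlines)),
                    stringsAsFactors=FALSE)
names(df) <- colnames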
# Read the lines from the files
i <- 0
for (fn in names(lines)) {
  n <- length(lines[[fn]])
  # Skip files without matching rows so no empty selection is read or assigned
  if (n == 0) next
  laf <- laf_open_csv(fn, column_types=colclasses, column_names=colnames, skip=1)
  df[seq_len(n) + i, ] <- laf[lines[[fn]], ]
  i <- i + n
}
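If you want to try the two-pass bookkeeping (find matching rows, preallocate, fill) before installing LaF, here is a minimal base R sketch on generated sample files. It is not the LaF method itself: it swaps in read.csv with colClasses = "NULL" to skip columns in the first pass, and read.csv has to parse each whole file in the second pass, so it only illustrates the logic and will not save as much memory. The scratch directory, the sample files, and the names demo_dir, demo_files, keep, result and offset are all made up for illustration.

```r
# Sketch only: base R stand-in for the LaF approach, using generated sample files.
demo_dir <- tempfile("csv_demo_")   # hypothetical scratch directory
dir.create(demo_dir)

set.seed(42)
for (k in 1:3) {
  sample_rows <- data.frame(
    v1 = sample(letters, 10, replace = TRUE),
    v2 = sample(LETTERS, 10, replace = TRUE),
    v3 = as.numeric(sample(1:10, 10, replace = TRUE)),
    stringsAsFactors = FALSE
  )
  write.csv(sample_rows, file.path(demo_dir, sprintf("part%d.csv", k)),
            row.names = FALSE)
}

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Pass 1: read only v3 (colClasses = "NULL" skips a column) and record
# which rows of each file satisfy the filter
keep <- lapply(demo_files, function(fn) {
  v3 <- read.csv(fn, colClasses = c("NULL", "NULL", "numeric"))$v3
  which(v3 > 2 & v3 < 7)
})
total <- sum(lengths(keep))

# Preallocate the result once instead of growing it file by file
result <- data.frame(v1 = character(total), v2 = character(total),
                     v3 = numeric(total), stringsAsFactors = FALSE)

# Pass 2: copy the selected rows of each file into their slot
offset <- 0
for (j in seq_along(demo_files)) {
  n <- length(keep[[j]])
  if (n == 0) next
  rows <- read.csv(demo_files[j],
                   colClasses = c("character", "character", "numeric"))
  result[offset + seq_len(n), ] <- rows[keep[[j]], ]
  offset <- offset + n
}
```

After running it, nrow(result) should equal total, and every value in result$v3 should lie strictly between 2 and 7.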