Filtering multiple csv files while importing into data frame

攒了一身酷 2021-01-03 11:18

I have a large number of csv files that I want to read into R. All the column headings in the csvs are the same. But I want to import only those rows from each file into the

3 answers
  •  臣服心动
    2021-01-03 11:30

    If you are really short on memory, the following solution might work. It uses LaF to read only the column needed for filtering, calculates the total number of lines that will be kept, initializes the complete data.frame, and then reads only the required lines from the files. (It is probably not faster than the other solutions.)

    library("LaF")

    colnames <- c("v1","v2","v3")
    # The type names are also R constructor functions (character(), numeric()),
    # which the preallocation step below relies on
    colclasses <- c("character", "character", "numeric")

    # pattern is a regular expression, not a shell glob
    fileNames <- list.files(pattern = "\\.csv$")

    # First determine which lines to read from each file and the total number of lines
    # to be read
    lines <- list()
    for (fn in fileNames) {
      laf <- laf_open_csv(fn, column_types=colclasses, column_names=colnames, skip=1)
      d   <- laf$v3[]   # reads only the filter column into memory
      lines[[fn]] <- which(d > 2 & d < 7)
    }
    nlines <- sum(lengths(lines))

    # Initialize data.frame: one empty vector of length nlines per column,
    # e.g. character(nlines) or numeric(nlines)
    df <- as.data.frame(lapply(colclasses, do.call, list(nlines)), 
            stringsAsFactors=FALSE)
    names(df) <- colnames

    # Read the lines from the files
    i <- 0
    for (fn in names(lines)) {
      laf <- laf_open_csv(fn, column_types=colclasses, column_names=colnames, skip=1)
      n   <- length(lines[[fn]])
      if (n == 0L) next   # no matching lines in this file
      df[seq_len(n) + i, ] <- laf[lines[[fn]], ]
      i   <- i + n
    }
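
    For comparison, when memory is not the limiting factor, a simpler approach is to read each file completely, filter it, and bind the results. The following is only a minimal base R sketch under assumptions: it reuses the column name v3 and the condition from the code above, the helper read_filtered and the two demo files are made up for illustration, and it is not one of the other answers in this thread.

    ```r
    # Hypothetical helper (not from the thread): read one file in full,
    # then keep only the rows where v3 lies strictly between 2 and 7.
    read_filtered <- function(fn) {
      d <- read.csv(fn, header = TRUE, stringsAsFactors = FALSE)
      d[!is.na(d$v3) & d$v3 > 2 & d$v3 < 7, , drop = FALSE]
    }

    # Two throwaway demo files in a temporary directory (illustrative data only)
    tmp <- file.path(tempdir(), "csv_filter_demo")
    dir.create(tmp, showWarnings = FALSE)
    write.csv(data.frame(v1 = c("a", "b"), v2 = c("x", "y"), v3 = c(1, 5)),
              file.path(tmp, "f1.csv"), row.names = FALSE)
    write.csv(data.frame(v1 = c("c", "d"), v2 = c("z", "w"), v3 = c(6, 9)),
              file.path(tmp, "f2.csv"), row.names = FALSE)

    # Filter each file as it is read, then stack the filtered pieces
    fileNames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
    result <- do.call(rbind, lapply(fileNames, read_filtered))
    rownames(result) <- NULL
    ```

    Each file is held in memory in full only while it is being filtered, so peak usage is roughly one whole file plus the rows kept so far. The LaF version above avoids even that by reading a single column first.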
    
