Counting the number of rows of a series of csv files

Posted by 本小妞迷上赌 on 2019-12-09 18:16:40

Question


I'm working through an R tutorial and suspect that I have to use one of these functions, but I'm not sure which. (Yes, I researched them, but until I become more fluent in R terminology they are quite confusing.)

In my working directory there is a folder "specdata". Specdata contains hundreds of CSV files named 001.csv - 300.csv.

The function I am working on must count the total number of rows across a given set of CSV files. So if the argument to the function is 1:10 and each of those files has ten rows, it should return 100.

Here's what I have so far:

complete <- function(directory,id = 1:332) {
    setpath <- paste("/Users/gcameron/Desktop",directory,sep="/")
    setwd(setpath)
    csvfile <- sprintf("%03d.csv", id)
    file <- read.csv(csvfile)
    nrow(file)
}

This works when the id argument is a single number, say 17. But if I pass, say, 10:50 as the argument, I receive an error:

Error in file(file, "rt") : invalid 'description' argument

What should I do to count the total number of rows across all the files named by the id argument?


Answer 1:


read.csv expects to read just one file, so you need to loop over the files. An idiomatic R way of doing so is to use sapply:

nrows <- sapply( csvfile, function(f) nrow(read.csv(f)) )
sum(nrows)

For example, here is a rewrite of your complete function:

complete <- function(directory,id = 1:332) {
    csvfiles <- sprintf("/Users/gcameron/Desktop/%s/%03d.csv", directory, id)
    nrows <- sapply( csvfiles, function(f) nrow(read.csv(f)) )
    sum(nrows)
}
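
To try this approach without the real specdata folder, here is a minimal self-contained sketch. It assumes a throwaway folder under tempdir() holding three small CSV files; the names demo_dir and count_rows and the column x are made up for illustration, and vapply is swapped in for sapply only so that the result type is checked.

```r
# Hypothetical demo data: three CSV files with 2, 3 and 4 rows in a temp folder.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
    write.csv(data.frame(x = seq_len(i + 1)),
              file.path(demo_dir, sprintf("%03d.csv", i)),
              row.names = FALSE)
}

# count_rows is an illustrative name, not part of the original answer.
count_rows <- function(directory, id) {
    # file.path and sprintf are both vectorised: one path per id.
    csvfiles <- file.path(directory, sprintf("%03d.csv", id))
    # vapply returns one integer per file and errors if it gets anything else.
    nrows <- vapply(csvfiles, function(f) nrow(read.csv(f)), integer(1))
    sum(nrows)
}

total <- count_rows(demo_dir, 1:3)
print(total)
```

Taking the directory as a full path, rather than pasting it onto a fixed Desktop location, is a design choice of this sketch and keeps the function independent of the working directory.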



Answer 2:


Homework problems are usually tagged as such (I don't know whether that is required), and this clearly is homework.

Your function as written expects id to be a single value, that is, a vector of length one (despite the default value being a longer vector of integers).
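
The mismatch can be seen without touching any files. In this small sketch (the ids 10:12 are just an example), sprintf is vectorised and so produces one file name per id, whereas read.csv opens a single file and needs exactly one name:

```r
# sprintf formats every element of id, so csvfile has three entries here.
id <- 10:12
csvfile <- sprintf("%03d.csv", id)
print(csvfile)

# read.csv(csvfile) would be handed all three names at once,
# which is what triggers the "invalid 'description' argument" error.
n_names <- length(csvfile)
print(n_names)
```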

Change it to either use one of the *apply functions (more concise and common), or even an explicit loop. For each element in the id vector, you must call a function that opens that file and counts the observations.
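
As a rough sketch of the explicit-loop option, assuming a throwaway folder under tempdir() with two five-row CSV files (loop_dir, count_rows_loop and the column x are hypothetical names used only for this illustration):

```r
# Hypothetical demo data: two CSV files with five rows each.
loop_dir <- file.path(tempdir(), "specdata_loop_demo")
dir.create(loop_dir, showWarnings = FALSE)
for (i in 1:2) {
    write.csv(data.frame(x = seq_len(5)),
              file.path(loop_dir, sprintf("%03d.csv", i)),
              row.names = FALSE)
}

# Open each file in turn and add its row count to a running total.
count_rows_loop <- function(directory, id) {
    total <- 0L
    for (i in id) {
        f <- file.path(directory, sprintf("%03d.csv", i))
        total <- total + nrow(read.csv(f))
    }
    total
}

loop_total <- count_rows_loop(loop_dir, 1:2)
print(loop_total)
```

The loop reads one file per iteration, so read.csv only ever receives a single file name.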

This Stack Overflow post has a good explanation of the differences between the *apply functions.



Source: https://stackoverflow.com/questions/14358629/counting-the-number-of-rows-of-a-series-of-csv-files
