Question
I have a directory with Excel files:
sites=list.files(pattern='[.]xls')
> sites
[1] "test1.xls" "test2.xls" "test3.xls"
This works:
a=read.xlsx(sites[1],14)
So I would expect that this would work too:
df=data.frame()
for (i in sites){
x=read.xlsx(sites[i],14)
x=x[560:831,12:14]
df=rbind.fill(df,x)
}
However, that gives:
Error in loadWorkbook(file) : Cannot find NA
What is going wrong here? Also, is there a way to vectorise this - the files are large and loading is slow; I can't use read.xlsx2 since the data are not in the right [tabular] format.
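Answer 1:
In for (i in sites), i takes each element of sites (a file name), not its index. sites[i] therefore looks up an unnamed vector by name and returns NA, which is what loadWorkbook is complaining about. Try for (i in 1:length(sites)) instead, or keep the loop as it is and call x=read.xlsx(i,14).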
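A small sketch of the difference, using only base R so it runs without the Excel files. The file names are the ones from the question; the variable names bad and good are just for illustration.

```r
# File names as returned by list.files() in the question
sites <- c("test1.xls", "test2.xls", "test3.xls")

# Looping over the vector itself: i is a file name, not a position
for (i in sites) print(i)

# Indexing an unnamed vector by a name finds nothing and yields NA,
# which is what read.xlsx() was handed in the original loop
bad <- sites[sites[1]]
is.na(bad)

# Looping over positions: sites[i] is now a valid file name
good <- character(0)
for (i in seq_along(sites)) good <- c(good, sites[i])
identical(good, sites)
```

seq_along(sites) behaves like 1:length(sites) here, but it also does the right thing (zero iterations) if the directory happens to contain no matching files.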
Answer 2:
You can try using ldply from the plyr package.
I'm defining a function first because you want to take just a part of each file. If you were taking all of it, you could just use read.xlsx in the ldply call.
library(xlsx)
library(plyr)
sites=list.files(pattern='[.]xls')
fun <- function(x) {
  # Read sheet 14 of one file and return only the block of interest
  df <- read.xlsx(x, sheetIndex=14)
  df[560:831,12:14]
}
Then use fun in ldply:
df.big <- ldply(sites, fun)
This should give you a single data frame with the selected block from every file combined.
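If you would rather not depend on plyr, the same pattern can be sketched in base R with lapply and do.call(rbind, ...). In the sketch below, combine_blocks and fake_reader are hypothetical names, and the reader is passed in as an argument so the example runs without the xlsx package; in practice you would pass something like function(f) read.xlsx(f, sheetIndex=14). Note that rbind needs every block to have the same columns, whereas rbind.fill pads missing ones. This is also not vectorised in the sense of being faster: each file is still read one at a time, so loading time will probably still dominate.

```r
# Hypothetical helper: apply `reader` to each file, keep a fixed block
# of rows and columns, and stack the results into one data frame.
combine_blocks <- function(files, reader, rows, cols) {
  blocks <- lapply(files, function(f) {
    block <- reader(f)[rows, cols]
    block$file <- f  # remember which file each row came from
    block
  })
  do.call(rbind, blocks)
}

# Stand-in reader that returns fake data instead of opening a workbook
fake_reader <- function(f) data.frame(a = 1:5, b = 6:10, c = 11:15)

combined <- combine_blocks(c("test1.xls", "test2.xls"), fake_reader,
                           rows = 2:4, cols = 2:3)
dim(combined)
```

Adding the file column is optional, but it makes it easier to trace a suspicious row back to its workbook.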
Source: https://stackoverflow.com/questions/15130048/loop-over-directory-to-get-excel-content