Question
I have a directory with Excel files:
sites=list.files(pattern='[.]xls')
> sites
[1] "test1.xls" "test2.xls" "test3.xls"
This works:
a=read.xlsx(sites[1],14)
So I would expect that this would work too:
df=data.frame()
for (i in sites){
x=read.xlsx(sites[i],14)
x=x[560:831,12:14]
df=rbind.fill(df,x)
}
However, that gives:
Error in loadWorkbook(file) : Cannot find NA
What is going wrong here? Also, is there a way to vectorise this - the files are large and loading is slow; I can't use read.xlsx2 since the data are not in the right [tabular] format.
Answer 1:
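Your i iterates over the elements of sites (the file names), not over their indices. Because sites has no names, sites[i] with a string i returns NA, which is why loadWorkbook reports "Cannot find NA". Try for(i in 1:length(sites)) instead (seq_along(sites) is safer if the directory may be empty). Or keep the loop as it is and use x=read.xlsx(i,14).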
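The difference between the two loop styles can be seen without any Excel files. This is a minimal sketch in base R; the variable names by_name and by_index are made up for illustration, and sites is typed in by hand rather than read from a directory.

```r
sites <- c("test1.xls", "test2.xls", "test3.xls")

# for (i in sites) binds i to each file name, not to a position.
by_name <- character(0)
for (i in sites) {
  # sites has no names attribute, so indexing by a string finds nothing.
  by_name <- c(by_name, sites[i])
}
print(by_name)   # every entry is NA

# Looping over positions makes sites[k] a valid lookup.
by_index <- character(0)
for (k in seq_along(sites)) {
  by_index <- c(by_index, sites[k])
}
print(by_index)  # the three file names
```

In the first loop the value that would be passed to read.xlsx is NA, which matches the error message in the question.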
Answer 2:
You can try using ldply from the plyr package.
I'm defining a function first because you want to take just a part of each file. If you were taking all of it, you could just use read.xlsx in the ldply call.
library(xlsx)
library(plyr)
sites=list.files(pattern='[.]xls')
fun <- function(x) {
  # Read sheet 14 of one workbook, then keep rows 560-831 and columns 12-14
  df <- read.xlsx(x, sheetIndex=14)
  df[560:831, 12:14]
}
Then use fun in ldply:
df.big <- ldply(sites, fun)
This should give you a single data frame with the selected part of every file combined.
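The same read, slice, combine pattern can also be written in base R with lapply and do.call(rbind, ...). The sketch below is only an illustration: fake_read is a hypothetical stand-in for read.xlsx so that the example runs without any Excel files, read_slice is a made-up name, and the row and column ranges are shrunk to fit the toy data.

```r
# Hypothetical stand-in for read.xlsx(): returns a small data frame per "file".
fake_read <- function(path) {
  data.frame(file = path, a = 1:5, b = 6:10, stringsAsFactors = FALSE)
}

# Read one file and keep only the part of interest (rows 2-3, columns 1-2 here).
read_slice <- function(path) {
  df <- fake_read(path)
  df[2:3, 1:2]
}

sites <- c("test1.xls", "test2.xls", "test3.xls")

# lapply() builds a list of slices; do.call(rbind, ...) stacks them once at the end.
combined <- do.call(rbind, lapply(sites, read_slice))
rownames(combined) <- NULL
print(combined)
```

Stacking once at the end avoids growing a data frame inside the loop. Note that rbind needs the same column names in every slice; rbind.fill from plyr, as used in the question, tolerates differing columns.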
Source: https://stackoverflow.com/questions/15130048/loop-over-directory-to-get-excel-content