Loop over directory to get Excel content

Question

I have a directory with Excel files:

sites=list.files(pattern='[.]xls')
> sites
[1] "test1.xls" "test2.xls" "test3.xls"

This works:

a=read.xlsx(sites[1],14)

So I would expect that this would work too:

df=data.frame()
  for (i in sites){
  x=read.xlsx(sites[i],14)
  x=x[560:831,12:14]
  df=rbind.fill(df,x)
  }

However, that gives:

Error in loadWorkbook(file) : Cannot find NA

What is going wrong here? Also, is there a way to vectorise this - the files are large and loading is slow; I can't use read.xlsx2 since the data are not in the right [tabular] format.

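Answer 1:

In for (i in sites), i takes the value of each element of sites (the file names), not its position. sites[i] then looks up a name in an unnamed vector, which returns NA, hence "Cannot find NA". Either loop over the positions with for (i in seq_along(sites)) and keep sites[i], or keep the loop as it is and pass the name directly: x=read.xlsx(i,14).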
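
The difference between the two loop styles can be seen without any Excel files. The following is a minimal sketch that only uses the three file names from the question; the variable names by_name and by_index are made up for illustration.

```r
sites <- c("test1.xls", "test2.xls", "test3.xls")  # same names as in the question

# Looping over the elements: i is a file name, so sites[i] is a lookup by name.
by_name <- character(0)
for (i in sites) {
  by_name <- c(by_name, sites[i])
}
print(by_name)   # every entry is NA, because sites has no names

# Looping over the positions: i is 1, 2, 3, so sites[i] is the file name.
by_index <- character(0)
for (i in seq_along(sites)) {
  by_index <- c(by_index, sites[i])
}
print(by_index)  # the original file names
```

seq_along(sites) is used rather than 1:length(sites) because it also behaves sensibly when the directory contains no matching files and sites is empty.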

Answer 2:

You can try using ldply from the plyr package.

I'm defining a function first because you want to take just a part of each file. If you were taking all of it, you could just use read.xlsx in the ldply call.

library(xlsx)
library(plyr)
sites=list.files(pattern='[.]xls')

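# Read sheet 14 of one file and keep only the block of interest
fun <- function(x) {
  df <- read.xlsx(x, sheetIndex=14)
  df[560:831,12:14]  # last expression is the (visible) return value
}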

Then use fun in ldply:

df.big <- ldply(sites, fun)

This should give you a single data frame with the selected block from every file combined.
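
If plyr is not available, the same read-subset-combine pattern can probably be written in base R with lapply and do.call(rbind, ...). The sketch below is an assumption-laden illustration, not part of the original answer: fake_reader is a hypothetical stand-in for read.xlsx(file, sheetIndex=14) so that the code runs without Excel files, and the row and column ranges are shrunk to fit the toy data.

```r
# Hypothetical stand-in for read.xlsx(file, sheetIndex = 14); it returns a
# small data frame so the pattern can be run without any Excel files.
fake_reader <- function(file) {
  data.frame(site = file, a = 1:5, b = 6:10, stringsAsFactors = FALSE)
}

# Read one file and keep only a block of rows and columns.
# With real data this would be reader = function(f) read.xlsx(f, sheetIndex = 14),
# rows = 560:831 and cols = 12:14.
read_block <- function(file, reader = fake_reader, rows = 2:4, cols = 2:3) {
  sheet <- reader(file)
  sheet[rows, cols]
}

sites <- c("test1.xls", "test2.xls", "test3.xls")

# Collect one data frame per file, then bind them once at the end
# instead of growing df inside a loop.
pieces <- lapply(sites, read_block)
df.big <- do.call(rbind, pieces)
print(dim(df.big))
```

Note that neither ldply nor lapply makes the individual read.xlsx calls any faster; what they avoid is the repeated copying caused by calling rbind.fill on a growing data frame in every iteration. do.call(rbind, ...) requires all pieces to have the same columns, whereas rbind.fill from plyr fills missing columns with NA.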

Source: https://stackoverflow.com/questions/15130048/loop-over-directory-to-get-excel-content

Tags

loops

XLSX
