I have a list of XML files that all share the same structure. Some of them contain structural errors, so they can't be read, and there are too many files for me to check manually. I know that I need to apply the try or tryCatch functions; I tried to understand them, but I don't see how to use them properly in my case. To keep the example simple, I just want to convert them all to CSV.
library(XML)
k <- 1
initial.files <- list.files("/My/Initial/Folder")
for (i in initial.files) {
  data <- dataTable(xmlToDataFrame(xmlParse(i)))
  write.csv(data, file = paste0("data", k, ".csv"))
  k <- k + 1
}
The error I usually get looks like:
Start tag expected, '<' not found
Error in xmlToDataFrame(xmlParse(i)) :
error in evaluation the argument 'doc' in selecting a method for function 'xmlToDataFrame': Error 1: Start tag expected, '<' not found
To handle my problem, I think I have to rewrite the 5th line of my code (I know this attempt is wrong):
data<- if(try(dataTable(xmlToDataFrame(xmlParse(i)))!= "try-error")
else{ I haven't looked closely at this part because I didn't get that far... }...
I would like it to read the files and give me a list of the paths of the files that could not be read.
The structure of the XML files looks like:
<ROWSET>
<ROW>
<line1>asdf</line1>
<line2>ghjk</line2>
</ROW>
</ROWSET>
Here is an example of tryCatch. You can replace the read.table with your functions, of course, and it should still work.
This first one will catch any errors and just return the file path for the ones with errors (I created two test files: one which can be read by read.table and the other will complain).
f <- function(path = "~/desktop/test", ...) {
  lf <- list.files(path = path, ...)
  l <- lapply(lf, function(x) {
    tryCatch(read.table(x, header = TRUE),
             error = function(e) x)
  })
  setNames(l, basename(lf))
}
f(full.names = TRUE)
# $cool_test.txt
#   cool test file
# 1    1    2    3
#
# $notcool_test.txt
# [1] "/Users/rawr/desktop/test/notcool_test.txt"
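Applied to the XML files from the question, the same pattern might look like the sketch below. The helper name read_all and its reader argument are made up for illustration; the default reader assumes the XML package is installed and that xmlToDataFrame(xmlParse(path)) is the reading step you want. Instead of returning the path on failure, it returns the error object, which makes failures easy to separate from real results.

```r
# Hypothetical helper (not from the XML package): read every path with
# `reader`, keep the successful results and collect the paths that failed.
read_all <- function(paths,
                     reader = function(p) XML::xmlToDataFrame(XML::xmlParse(p))) {
  results <- lapply(paths, function(p) {
    # On failure, return the error object itself so it is easy to detect
    tryCatch(reader(p), error = function(e) e)
  })
  names(results) <- paths
  failed <- vapply(results, inherits, logical(1), what = "error")
  list(data = results[!failed], failed = paths[failed])
}

# Small demo on two throwaway files; runs only if the XML package is installed
if (requireNamespace("XML", quietly = TRUE)) {
  demo_dir <- tempfile("xml_demo")
  dir.create(demo_dir)
  writeLines("<ROWSET><ROW><line1>asdf</line1><line2>ghjk</line2></ROW></ROWSET>",
             file.path(demo_dir, "good.xml"))
  writeLines("this is not xml", file.path(demo_dir, "bad.xml"))

  res <- read_all(list.files(demo_dir, full.names = TRUE))
  print(res$failed)   # paths that could not be read

  # Write each successfully read file to data1.csv, data2.csv, ...
  for (k in seq_along(res$data)) {
    write.csv(res$data[[k]],
              file = file.path(demo_dir, paste0("data", k, ".csv")),
              row.names = FALSE)
  }
}
```

Note the full.names = TRUE in list.files: without it you only get bare file names, which the reader can't find unless the working directory happens to be that folder.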
tryCatch is much more powerful and can save you a lot of time. You can grep the errors and/or warnings for specific text if you want them to be handled differently. Here, for example, I wanted a message if the file I was trying to read doesn't exist. And I want the file path of the ones that exist but cannot be read for some reason.
f2 <- function(path = "~/desktop/test", ..., lf) {
  lf <- if (!missing(lf)) lf else list.files(path = path, ...)
  l <- lapply(lf, function(x) {
    tryCatch(read.table(x, header = TRUE),
             warning = function(w) if (grepl('No such file', w)) {
               sprintf('%s does not exist', x)
             } else sprintf('Some other warning for %s', x),
             error = function(e) if (grepl('Error in scan', e)) {
               message(sprintf('Check format of %s', x))
               x
             } else message(sprintf('Some other error for %s', x)))
  })
  setNames(l, basename(lf))
}
I added a new argument so I can pass a list of file paths instead to show how it handles files that don't exist:
lf <- c("/Users/rawr/desktop/test/cool_test.txt",
        "/Users/rawr/desktop/test/notcool_test.txt",
        "/Users/rawr/desktop/test/file_does_not_exist.txt")
(out <- f2(lf = lf))
# Check format of /Users/rawr/desktop/test/notcool_test.txt
# $cool_test.txt
#   cool test file
# 1    1    2    3
#
# $notcool_test.txt
# [1] "/Users/rawr/desktop/test/notcool_test.txt"
#
# $file_does_not_exist.txt
# [1] "/Users/rawr/desktop/test/file_does_not_exist.txt does not exist"
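The handlers above match on the printed form of the condition. If you only care about the message text, a slightly more explicit variant is to pull it out with conditionMessage() first. This is just a sketch in base R; the classify helper is a made-up name for illustration, not part of any package.

```r
# Hypothetical helper: evaluate an expression and report how it ended.
# `expr` is evaluated lazily, so it actually runs inside tryCatch().
classify <- function(expr) {
  tryCatch(expr,
           warning = function(w) paste("warning:", conditionMessage(w)),
           error   = function(e) paste("error:", conditionMessage(e)))
}

classify(as.integer("x"))  # a string starting with "warning:"
classify(stop("boom"))     # "error: boom"
classify(1 + 1)            # 2, no condition was raised
```

You can then run grepl() on the returned string, or on conditionMessage(w) inside the handler, to branch on specific messages.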
So now you have a list that can contain data frames, file paths, or other messages. You can filter out the data frames and write them in many ways; here are two:
lapply(Filter(is.data.frame, out), function(x) {
  # do stuff with x here
  x
})
for (ii in out)
  if (is.data.frame(ii)) write.csv(ii) else print('not a data frame')
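To actually get one CSV per readable file plus a list of the ones that failed, something along these lines should work. It is a sketch under assumptions: out here is a small stand-in for the list returned by f2, and out_dir and the output file names are choices made for the example.

```r
# Stand-in for the list returned by f2(): data frames mixed with strings
out <- list(
  cool_test.txt    = data.frame(cool = 1, test = 2, file = 3),
  notcool_test.txt = "/Users/rawr/desktop/test/notcool_test.txt"
)

# Assumed output location; a temporary directory keeps the example harmless
out_dir <- tempfile("csv_out")
dir.create(out_dir)

# Write every data frame to <original name>.csv
ok <- Filter(is.data.frame, out)
for (nm in names(ok)) {
  csv_path <- file.path(out_dir, paste0(tools::file_path_sans_ext(nm), ".csv"))
  write.csv(ok[[nm]], csv_path, row.names = FALSE)
}

# Names of the entries that are not data frames, i.e. the files to inspect
not_read <- names(Filter(Negate(is.data.frame), out))
print(not_read)
```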
First, your program couples the reading of the XML with the writing. That is bad if you want to debug: proceed step by step.
You can use a robustify function to detect when something went wrong:
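library(XML)

# Wrap f so that it returns NA instead of stopping when f fails
robustify <- function(f, silent = TRUE)
{
  is.error <- function(x) inherits(x, "try-error")
  function(...)
  {
    x <- try(f(...), silent = silent)
    if (is.error(x))
      return(NA)
    x
  }
}
robustParsing <- robustify(xmlParse)
# full.names = TRUE so that xmlParse receives complete paths
lst <- lapply(list.files("/My/Initial/Folder", full.names = TRUE), robustParsing)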
An NA entry in lst indicates that the corresponding file failed to parse.
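To turn that into a list of file names, you can compare each element against NA. The sketch below uses made-up file names and a stand-in list in place of real parsed documents; identical(x, NA) is used rather than is.na() because is.na() is meant for vectors and may warn on parsed document objects.

```r
# Hypothetical file names and a stand-in for lapply(files, robustParsing)
files <- c("a.xml", "b.xml", "c.xml")
lst <- list("parsed a", NA, "parsed c")

# TRUE where the wrapped parser gave up and returned NA
failed <- vapply(lst, function(x) identical(x, NA), logical(1))

files[failed]   # "b.xml"
```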
Source: https://stackoverflow.com/questions/31115045/how-to-handle-errors-while-reading-xml-files-r