How to handle errors while reading XML files in R

别来无恙, submitted 2019-12-06 12:04:52

Here is an example of tryCatch. You can replace read.table with your own function, of course, and it should still work.

This first version catches any error and simply returns the file path of the file that caused it. (I created two test files: one that read.table can read and one that it will complain about.)

f <- function(path = "~/desktop/test", ...) {
  lf <- list.files(path = path, ...)
  l <- lapply(lf, function(x) {
    tryCatch(read.table(x, header = TRUE),
             error = function(e) x)
  })
  setNames(l, basename(lf))
}

f(full.names = TRUE)

# $cool_test.txt
#   cool test file
# 1    1    2    3
# 
# $notcool_test.txt
# [1] "/Users/rawr/desktop/test/notcool_test.txt"

tryCatch can do much more than this, and it can save you a lot of time.

You can grep the errors and/or warnings for specific text if you want them to be handled differently. Here, for example, I want a message if the file I am trying to read doesn't exist, and I want the file path of the files that exist but cannot be read for some reason.

f2 <- function(path = "~/desktop/test", ..., lf) {
  # use the supplied vector of paths if given, otherwise list the directory
  lf <- if (!missing(lf)) lf else list.files(path = path, ...)
  l <- lapply(lf, function(x) {
    tryCatch(read.table(x, header = TRUE),
             # a warning stops the read; return a description of it instead
             warning = function(w) if (grepl('No such file', w)) {
               sprintf('%s does not exist', x)
             } else sprintf('Some other warning for %s', x),
             # bad file format: print a message and return the file path
             error = function(e) if (grepl('Error in scan', e)) {
               message(sprintf('Check format of %s', x))
               x
             } else message(sprintf('Some other error for %s', x)))  # returns NULL
  })
  setNames(l, basename(lf))
}

I added a new argument so I can pass a vector of file paths directly, which shows how the function handles files that don't exist:

lf <- c("/Users/rawr/desktop/test/cool_test.txt",
        "/Users/rawr/desktop/test/notcool_test.txt",
        "/Users/rawr/desktop/test/file_does_not_exist.txt")

(out <- f2(lf = lf))

# Check format of /Users/rawr/desktop/test/notcool_test.txt
# $cool_test.txt
#   cool test file
# 1    1    2    3
# 
# $notcool_test.txt
# [1] "/Users/rawr/desktop/test/notcool_test.txt"
# 
# $file_does_not_exist.txt
# [1] "/Users/rawr/desktop/test/file_does_not_exist.txt does not exist"
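
Matching on message text can be fragile, because the wording of warnings and errors depends on the platform and locale. As an alternative sketch (not the approach used above), you could check file.exists() before reading and report the error text with conditionMessage(). The function name safe_read and the temporary files are hypothetical.

```r
# Hypothetical helper: returns a data frame, a "does not exist" string,
# or the file path when the file exists but cannot be read
safe_read <- function(x) {
  if (!file.exists(x)) {
    return(sprintf("%s does not exist", x))
  }
  tryCatch(
    read.table(x, header = TRUE),
    error = function(e) {
      # conditionMessage() extracts just the message text of the error
      message(sprintf("Check format of %s: %s", x, conditionMessage(e)))
      x
    }
  )
}

# A path that does not exist and a file with a short last row
missing_file <- file.path(tempdir(), "file_does_not_exist.txt")
bad_file <- tempfile(fileext = ".txt")
writeLines(c("a b c", "1 2 3", "4 5"), bad_file)

out_missing <- safe_read(missing_file)
out_bad <- safe_read(bad_file)
```

This keeps the same three kinds of results as f2 (data frame, file path, message string) without depending on the exact wording of the warning.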

So now you have a list that can contain data frames, file paths, or other messages. You can filter out the data frames and write them in many ways. Here are two:

lapply(Filter(is.data.frame, out), function(x) {
  # do stuff with x here
  x
})

for (ii in out)
  if (is.data.frame(ii)) write.csv(ii) else print('not a data frame')
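
If you also want to keep track of what failed, one possible sketch is to split the list by type and write each data frame to its own CSV file. The list out_demo below is a made-up stand-in for out, and writing to tempdir() is an assumption; substitute your own output folder.

```r
# Hypothetical stand-in for `out`: one data frame and one failed path
out_demo <- list(
  good.txt = data.frame(cool = 1, test = 2, file = 3),
  bad.txt  = "/path/to/bad.txt"
)

# TRUE for elements that were read successfully
is_df <- vapply(out_demo, is.data.frame, logical(1))
ok     <- out_demo[is_df]
failed <- out_demo[!is_df]

# Write each data frame to <name>.csv in the output directory
out_dir <- tempdir()
for (nm in names(ok)) {
  csv_name <- paste0(tools::file_path_sans_ext(nm), ".csv")
  write.csv(ok[[nm]], file.path(out_dir, csv_name), row.names = FALSE)
}

names(failed)  # the files that still need attention
```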

First, your program couples the reading of the XML with the writing of the output. That is bad if you want to debug: proceed step by step instead.

You can use a robustify function to check when something went wrong:

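library(XML)

# wrap f so that it returns NA instead of stopping when f throws an error
robustify <- function(f, silent = TRUE)
{
    is.error <- function(x) inherits(x, "try-error")
    function(...)
    {
        x <- try(f(...), silent = silent)
        if (is.error(x))
            return(NA)
        x
    }
}

robustParsing <- robustify(xmlParse)

# full.names = TRUE so the paths resolve regardless of the working directory
files <- list.files("/My/Initial/Folder", full.names = TRUE)
lst <- lapply(files, robustParsing)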

The NA entries in lst show which files failed to parse; they appear in the same order as the files returned by list.files.
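
To list the failed files by name rather than by position, a sketch along these lines may help. So that it runs without the XML package, it uses a hypothetical stand-in parser, toy_parse, in place of xmlParse; the directory and file names are also made up.

```r
# Same idea as robustify above, written compactly
robustify <- function(f, silent = TRUE) {
  function(...) {
    x <- try(f(...), silent = silent)
    if (inherits(x, "try-error")) NA else x
  }
}

# Hypothetical stand-in for xmlParse: fails unless the file starts with "<"
toy_parse <- function(path) {
  txt <- readLines(path, warn = FALSE)
  if (!startsWith(txt[1], "<")) stop("not XML: ", path)
  txt
}

# Two temporary files, one of which cannot be parsed
xml_dir <- file.path(tempdir(), "robustify_demo")
dir.create(xml_dir, showWarnings = FALSE)
writeLines("<root><a>1</a></root>", file.path(xml_dir, "good.xml"))
writeLines("this is not xml", file.path(xml_dir, "bad.xml"))

files <- list.files(xml_dir, full.names = TRUE)
lst <- setNames(lapply(files, robustify(toy_parse)), basename(files))

# A failed parse is exactly a single logical NA
failed <- vapply(lst, identical, logical(1), NA)
names(lst)[failed]
```

Naming the list with basename(files) is the design choice that makes the failures easy to report; with the real xmlParse you would pass robustify(xmlParse) instead of robustify(toy_parse).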
