Is it possible to parse text data from PDF files in R? There does not appear to be a relevant package for such extraction, but has anyone attempted or seen this done in R?
A solution that stays within R, using the tm package, could be:
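```r
library(tm)

file <- "namefile.pdf"
# "-layout" is a pdftotext option, so use the "xpdf" engine
# (requires the pdftotext/pdfinfo executables on your PATH)
Rpdf <- readPDF(engine = "xpdf", control = list(text = "-layout"))
corpus <- VCorpus(URISource(file), readerControl = list(reader = Rpdf))
# Text of the first (and only) document as a character vector
corpus.array <- content(content(corpus)[[1]])
```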
Then `corpus.array` holds the PDF text as a character vector, one element per line.
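Depending on the engine, tm may return the text one element per line (pdftotext) or one element per page (for example the pdftools engine). The following is a minimal sketch of a small helper that normalizes either shape into non-empty lines. `pdf_lines` is a hypothetical name, not part of tm, and the sample `pages` vector is made-up input.

```r
# Hypothetical helper (not part of tm): turn extracted text into one
# element per line, whether the engine returned lines or whole pages.
pdf_lines <- function(txt) {
  # Split every element on line breaks (handles \n and \r\n)
  lines <- unlist(strsplit(txt, "\r?\n"), use.names = FALSE)
  # Drop trailing padding that "-layout" tends to leave behind
  lines <- trimws(lines, which = "right")
  # Keep only non-empty lines
  lines[nzchar(lines)]
}

# Made-up input mimicking two extracted pages
pages <- c("Title\n  Intro text  \n", "Page two\n\nEnd")
pdf_lines(pages)

# With the corpus built above:
# corpus.lines <- pdf_lines(content(content(corpus)[[1]]))
```

Leading whitespace is deliberately kept, since with `-layout` it carries column alignment.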