I am trying to extract text from PDF files using Perl. I have been using pdftotext.exe
from the command line (i.e., using Perl's system function) for extra
James Healy is correct. I tried CAM::PDF and PDF::API2 (I've had some success reading text with the former), but downloading pdftotext worked great for a number of my implementations.
If you are on Windows, download the precompiled Xpdf binaries here: http://www.foolabs.com/xpdf/download.html
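Then, if you need to run it from within Perl, use system with a single-quoted path (inside double quotes Perl would interpret \U, \x and \b as escape sequences and mangle the path), e.g.: system('C:\Utilities\xpdfbin-win-3.04\bin64\pdftotext.exe', $saveName);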
where $saveName is the full path to your PDF file.
This should leave you with a text file (same name as the PDF, with a .txt extension) that you can open and parse in Perl.
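Putting those steps together, here is a minimal sketch of how the call and the follow-up read might look. The helper names text_path_for and pdf_to_text are my own, the binary path is just the one from my machine, and I pass the output file name explicitly rather than relying on the default, so adjust all of that to your setup:

```perl
use strict;
use warnings;

# Assumed install location; single quotes keep the backslashes literal.
my $pdftotext = 'C:\Utilities\xpdfbin-win-3.04\bin64\pdftotext.exe';

# Hypothetical helper: work out the .txt path that sits next to the PDF.
sub text_path_for {
    my ($pdf) = @_;
    my $txt = $pdf;
    # Swap a trailing .pdf for .txt; if there is none, just append .txt
    # so the output can never overwrite the input file.
    $txt =~ s/\.pdf\z/.txt/i or $txt .= '.txt';
    return $txt;
}

# Hypothetical helper: convert one PDF and return its text as a string.
sub pdf_to_text {
    my ($pdf) = @_;
    my $txt = text_path_for($pdf);

    # List form of system: arguments are passed separately rather than
    # being pasted into a single command string.
    system($pdftotext, $pdf, $txt) == 0
        or die "pdftotext failed on $pdf (\$? = $?)\n";

    open my $fh, '<', $txt or die "Cannot open $txt: $!\n";
    local $/;              # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}
```

You would then call something like my $text = pdf_to_text($saveName); and parse $text however you need. Checking the return value of system matters here, because otherwise a failed conversion just looks like a missing or stale text file later on.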