How can I extract text from a PDF file in Perl?

花落未央 2020-12-03 05:08

I am trying to extract text from PDF files using Perl. I have been using pdftotext.exe from the command line (i.e., via Perl's system function) for extra

8 answers
  •  小蘑菇
    2020-12-03 05:50

    James Healy is correct. After trying CAM::PDF and PDF::API2 (I had some success reading text with the former), I found that downloading pdftotext worked great for a number of my implementations.

    If you are on Windows, download the precompiled xpdf binary from here: http://www.foolabs.com/xpdf/download.html

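    Then, if you need to run it from within Perl, use system, e.g.: system('C:\Utilities\xpdfbin-win-3.04\bin64\pdftotext.exe', $saveName);

    where $saveName is the full path to your PDF file. Keep the executable path in single quotes: inside a double-quoted string, Perl treats sequences such as \U and \x as escapes and mangles the path. Passing $saveName as a separate argument also keeps a path containing spaces in one piece.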

    This should leave you with a text file (by default named like the PDF, with a .txt extension) that you can open and parse in Perl.
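    As a rough sketch of the steps above, assuming pdftotext (from xpdf, or Poppler's version with the same "pdftotext [options] PDF-file [text-file]" interface) is installed, the helper below runs it on one PDF and reads the resulting text file back into a string. The name pdf_to_text and its argument order are illustrative only, not part of any library: the first argument is the PDF path, and the rest is the command to run (the executable plus any options such as -layout). The output file name is passed explicitly so the code does not rely on pdftotext's default naming.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name): runs pdftotext on one PDF and
# returns the extracted text as a string.
#   $pdf - path to the PDF file
#   @cmd - the pdftotext executable followed by any options, e.g.
#          ('C:\Utilities\xpdfbin-win-3.04\bin64\pdftotext.exe', '-layout')
sub pdf_to_text {
    my ($pdf, @cmd) = @_;
    die "Usage: pdf_to_text(\$pdf, \$pdftotext, \@options)\n" unless @cmd;

    # Name the output file explicitly: report.pdf -> report.txt
    (my $txt = $pdf) =~ s/\.pdf\z//i;
    $txt .= '.txt';

    # List form of system: each element is passed as its own argument,
    # so paths containing spaces stay in one piece.
    if (system(@cmd, $pdf, $txt) != 0) {
        die $? == -1
            ? "Could not run $cmd[0]: $!\n"
            : "pdftotext failed on $pdf (exit code " . ($? >> 8) . ")\n";
    }

    # Slurp the whole text file into one string (raw bytes, no decoding).
    open(my $fh, '<', $txt) or die "Cannot open $txt: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    return defined $text ? $text : '';
}
```

    For example, my $text = pdf_to_text($saveName, 'C:\Utilities\xpdfbin-win-3.04\bin64\pdftotext.exe', '-layout'); gives you the extracted text directly, ready for splitting into lines or matching with regular expressions. The text comes back as raw bytes; pdftotext's -enc option (e.g. '-enc', 'UTF-8') controls the output encoding if you need to decode it.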
