How can I extract text from a PDF file in Perl?

花落未央 2020-12-03 05:08

I am trying to extract text from PDF files using Perl. I have been using pdftotext.exe from the command line (i.e., via Perl's system function) for extra

8 Answers
  •  Happy的楠姐
    2020-12-03 05:35

    I'm not a Perl user but I imagine you'll struggle to find a better free text extractor than pdftotext.

    pdftotext usually handles non-ASCII characters fine. Is it possible that it is extracting them correctly, but the application you are using to view the text file isn't using the correct encoding? If pdftotext on Windows is the same as the one on my Linux system, then it defaults to exporting UTF-8.
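
One way to rule out an encoding mismatch is to request UTF-8 explicitly with pdftotext's -enc option and then read the result through a UTF-8 decoding layer in Perl. The following is a minimal sketch, not a definitive implementation: it assumes pdftotext is on the PATH, and the helper names pdftotext_args, read_utf8 and extract_text, as well as the file name input.pdf, are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build the argument list for pdftotext (hypothetical helper).
# "-enc UTF-8" asks for UTF-8 output instead of relying on the build's default.
sub pdftotext_args {
    my ($pdf, $txt) = @_;
    return ('pdftotext', '-enc', 'UTF-8', $pdf, $txt);
}

# Read a whole file through a UTF-8 decoding layer and return a character string.
sub read_utf8 {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the entire file at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Run pdftotext into a temporary file, then return the decoded text.
# Assumes a pdftotext executable can be found on the PATH.
sub extract_text {
    my ($pdf) = @_;
    my ($tmp_fh, $tmp_path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
    close($tmp_fh);
    # List form of system() bypasses the shell, so file names need no quoting.
    system(pdftotext_args($pdf, $tmp_path)) == 0
        or die "pdftotext failed with status $?";
    return read_utf8($tmp_path);
}

# Usage (input.pdf is a placeholder):
# binmode(STDOUT, ':encoding(UTF-8)');
# print extract_text('input.pdf');
```

If the text still looks wrong after this, the remaining suspect is the viewer or console encoding rather than the extraction step.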
