What is the best perl module to extract text from a pdf? [closed]

Submitted by ↘锁芯ラ on 2019-12-06 02:51:00

Question


What is the best way to extract text from a PDF?


Answer 1:


The CAM::PDF module is pretty useful for extracting text while keeping some information about where in the document it came from. It installs a getpdftext.pl script (for example at /usr/local/bin/getpdftext.pl) which demonstrates simple extraction. However, CAM::PDF is a strict parser: it can only read PDFs that are completely valid.
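
As a rough illustration of the CAM::PDF approach, here is a minimal sketch that collects the text of each page. The helper name pdf_pages_text and the command-line handling are made up for this example; new, numPages and getPageText are CAM::PDF methods, and the module has to be installed from CPAN first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper (not part of CAM::PDF): returns one string per page.
sub pdf_pages_text {
    my ($path) = @_;

    # new() returns undef on failure and leaves a message in $CAM::PDF::errstr.
    my $pdf = CAM::PDF->new($path)
        or die "Could not read $path: " . ($CAM::PDF::errstr // 'unknown error') . "\n";

    my @pages;
    for my $page_number (1 .. $pdf->numPages()) {
        # Page numbers start at 1; fall back to '' if no text comes back.
        my $text = $pdf->getPageText($page_number);
        push @pages, defined $text ? $text : '';
    }
    return @pages;
}

# Run only when executed directly, e.g.: perl extract.pl foo.pdf
unless (caller) {
    my $file = shift @ARGV or die "usage: $0 file.pdf\n";
    my @pages = pdf_pages_text($file);
    for my $i (0 .. $#pages) {
        print "--- page ", $i + 1, " ---\n", $pages[$i], "\n";
    }
}
```

Keeping the text per page is what preserves a little of the "where it came from" information; join the list if you only need one string.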

If you are dealing with ill-formed PDFs, you may need a more lenient parser, such as the pdftotext command-line tool (shipped with Xpdf and Poppler). Run on foo.pdf, it dumps the text to foo.txt, which you can then read into Perl.
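
A sketch of the pdftotext route from Perl might look like the following. It assumes pdftotext is on your PATH and a Unix-like system; the helper names pdftotext_command and pdf_to_text are invented for this example. Instead of writing foo.txt and reading it back, it passes "-" as the output file so that pdftotext writes to standard output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds the argument list for pdftotext.
# A "-" in place of the output file name sends the text to stdout.
sub pdftotext_command {
    my ($pdf_path) = @_;
    return ('pdftotext', $pdf_path, '-');
}

# Hypothetical helper: runs pdftotext and returns its output as one string.
sub pdf_to_text {
    my ($pdf_path) = @_;
    my @cmd = pdftotext_command($pdf_path);

    # List-form pipe open avoids the shell, so odd file names are safe.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    my $text = do { local $/; <$fh> };    # slurp everything
    close($fh) or die "$cmd[0] failed (wait status $?)\n";

    return defined $text ? $text : '';
}

# Run only when executed directly, e.g.: perl pdf2txt.pl foo.pdf
unless (caller) {
    my $file = shift @ARGV or die "usage: $0 file.pdf\n";
    print pdf_to_text($file);
}
```

The output encoding depends on how pdftotext is configured, so you may need to decode the returned string before further processing.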



Source: https://stackoverflow.com/questions/4730651/what-is-the-best-perl-module-to-extract-text-from-a-pdf
