Ruby: Reading PDF files

孤独总比滥情好 2020-12-02 06:05

I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX).

Until now I've found the rather old and simple PDF-toolkit (a pd

6 answers

误落风尘 (original poster)

2020-12-02 06:49

Here are some options:

http://en.wikipedia.org/wiki/List_of_PDF_software

From that link, and from searching SourceForge, there are a couple of command-line utilities that might do what you want, like this one: http://pdftohtml.sourceforge.net/

Depending on your requirements and what the PDFs look like, you could look at using the Google Docs API (uploading the PDF and then downloading it as text), or you could try something like gocr. I've had a lot of luck parsing image text with gocr in the past, and you'd just have to bounce out to the shell to do it, like gocr -i whatever.pdf (I think it works with PDFs).

The downside to all of these is that they're not pure-Ruby implementations, but lots of the good (and free) OCR projects seem to be done that way.
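For what it's worth, "bouncing out to the shell" from Ruby could look roughly like the sketch below. It is only an illustration: extract_with_tool is a hypothetical helper name, and the pdftohtml flags in the example call (-stdout, -i, -noframes) are assumptions you should check against pdftohtml -h on your own machine. The command is passed as an array so the file name never goes through shell interpolation.

```ruby
require "open3"

# Hypothetical helper (not part of any library): runs an external
# command-line extractor with the PDF path appended as the last argument
# and returns whatever the tool wrote to stdout.
def extract_with_tool(command, pdf_path)
  # Passing separate arguments bypasses the shell, so odd file names are safe.
  stdout, stderr, status = Open3.capture3(*command, pdf_path)
  raise "#{command.first} failed: #{stderr.strip}" unless status.success?

  stdout
end

# Example call (flags are assumed, verify them locally):
#   html = extract_with_tool(%w[pdftohtml -stdout -i -noframes], "whatever.pdf")
```

The same helper should work for any other tool that prints its result to stdout, which keeps the non-Ruby dependency isolated in one place.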
