Scanned Image/PDF to Searchable Image/PDF

拜拜、爱过 提交于 2019-12-23 02:32:23

问题


Can anyone suggest me how to convert a scanned image into a searchable image or a scanned pdf to a searchable pdf ?
I have been stuck in this situation since quite a while now.
i have tried pdfocr application in ubuntu but no success.


回答1:


Tesseract version 3.03 supports creation of searchable PDF from image. For PDF, you can use GhostScript to convert it to image before sending it to Tesseract.

https://github.com/tesseract-ocr/tesseract




回答2:


Currently, there is no right way of doing this on Ubuntu. All OCR engines output plain text and there is no way to add that text as a hidden layer on PDF over the image text.

Option 1: Use gscan2pdf which will make you a searchable PDF, but the OCRed text is placed in the top-left corner of the page, is invisible and much too small.

Option 2: Use PDF X-Change Viewer which has an option to OCR and works correctly by adding a text layer over the scanned image which is in concordance with it. You'll have to run it in wine, because it is a Windows application.



来源:https://stackoverflow.com/questions/24848808/scanned-image-pdf-to-searchable-image-pdf

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!