How to transfer OCR text from one PDF to another PDF?

落花浮王杯 提交于 2020-04-30 09:07:46

问题


I have two versions of one same scanned PDF. One of them has an OCR layer. How can I transfer the layer to the other one? I already install Ghostscript, but I don't know what to do next.

How to Use Ghostscript


回答1:


There's no such thing as an 'OCR layer' in PDF.

Most likely what you have is a PDF file which has a scanned image and the text extracted from that image using OCR which has been drawn as 'invisible' text (text rendering mode 3).

In general you can't copy and paste text between PDF files, so its very hard to do what you are asking. I don't know of any tools which will help you here, I can say for certain that Ghostscript absolutely will not help you at all.

Most likely you will also need to copy the Font (or CIDFont) from the PDF file as well, and if it has a ToUnicode CMap you'll definitely also want that or search won't work (and there's little point in this sort of OCR otherwise).

Since you have a PDF file which includes the OCR'ed text, why not simply use that PDF ? I can't see any reason why you would want to 'transfer' it to another PDF file.



来源:https://stackoverflow.com/questions/61061606/how-to-transfer-ocr-text-from-one-pdf-to-another-pdf

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!