PDF text extraction from given coordinates


情话喂你 2020-11-27 10:43

I would like to extract text from a region of a PDF page, specified by coordinates, using Ghostscript.

Can anyone help me out?

3 answers

执笔经年 (original poster)

2020-11-27 11:20

I'm not sure Ghostscript can accept coordinates, but you can convert the PDF to an image and send it to an OCR engine, either as a subimage cropped to the given coordinates or as the whole image along with the coordinates. Some OCR APIs accept a rectangle parameter that restricts recognition to that region.

Look at VietOCR for a working example; it uses Tesseract as its OCR engine and Ghostscript as its PDF-to-image converter.
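
For illustration, here is a minimal Python sketch of the render-then-crop-then-OCR approach described above. It is not how VietOCR does it; it is only one way to wire the same pieces together. It assumes the Ghostscript executable is on the PATH as `gs` (on Windows it is usually `gswin64c`), that the Pillow and pytesseract packages and the Tesseract binary are installed, and that the rectangle is given in PDF points (1/72 inch) with the origin at the bottom-left of an unrotated page. The function names `gs_render_command`, `pdf_rect_to_pixel_box`, and `ocr_region` are made up for this example.

```python
import subprocess


def gs_render_command(pdf_path, png_path, page=1, dpi=300):
    """Build the Ghostscript command that renders one page to a PNG file."""
    return [
        "gs",                      # assumption: executable name; gswin64c on Windows
        "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m",         # 24-bit colour PNG output device
        f"-r{dpi}",                # render resolution in dots per inch
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-sOutputFile={png_path}",
        pdf_path,
    ]


def pdf_rect_to_pixel_box(rect, dpi, image_height_px):
    """Convert (x0, y0, x1, y1) in PDF points to a (left, top, right, bottom) pixel box.

    PDF coordinates grow upwards from the bottom-left corner, while image
    coordinates grow downwards from the top-left, so the y axis is flipped.
    Assumes the page origin is at (0, 0) and the page is not rotated.
    """
    x0, y0, x1, y1 = rect
    scale = dpi / 72.0  # pixels per PDF point
    left, right = sorted((x0 * scale, x1 * scale))
    top, bottom = sorted((image_height_px - y0 * scale,
                          image_height_px - y1 * scale))
    return tuple(round(v) for v in (left, top, right, bottom))


def ocr_region(pdf_path, rect, page=1, dpi=300, png_path="page.png"):
    """Render a page with Ghostscript, crop it to rect, and OCR the crop."""
    # Imported here so the two helpers above work without these packages.
    from PIL import Image
    import pytesseract

    subprocess.run(gs_render_command(pdf_path, png_path, page, dpi), check=True)
    with Image.open(png_path) as img:
        box = pdf_rect_to_pixel_box(rect, dpi, img.height)
        return pytesseract.image_to_string(img.crop(box))
```

A call such as `ocr_region("input.pdf", (72, 600, 300, 700))` would then return the recognized text for that rectangle on page 1 (the file name and coordinates here are placeholders). Rendering at a higher resolution generally helps OCR accuracy at the cost of a larger intermediate image.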
