Tess4j - Pdf to Tiff to tesseract - “Warning: Invalid resolution 0 dpi. Using 70 instead.”

Posted by 别等时光非礼了梦想 on 2020-02-06 07:24:07

Question


I am using tess4j (net.sourceforge.tess4j:tess4j:4.4.0) and trying to run OCR on PDF files. As I understand it, I first have to convert the PDF to TIFF or PNG (is either one preferred?), which I did like this:

tesseract.doOCR(PdfUtilities.convertPdf2Tiff(inputPdfFile)); 

and I get the following warning:

Warning: Invalid resolution 0 dpi. Using 70 instead.

Questions:

  • Does it have any influence on my scan results? (If not, fine, I can just switch off the warning.)
  • Is there a way to set the DPI by hand, or should convertPdf2Tiff handle this for me?

Answer 1:


If the image metadata contains no resolution information, Tesseract estimates the resolution itself so that font size information can be calculated in the results.
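To illustrate why the assumed resolution matters for font size, here is a minimal sketch of the pixel-to-point conversion. It assumes the usual convention of 72 points per inch; the class name DpiFontSize, the helper pixelsToPoints and the 50 px line height are made up for this example and are not part of tess4j or Tesseract.

```java
public class DpiFontSize {
    // Assumption: the typographic convention of 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Hypothetical helper: converts a text height in pixels to points
    // for an image with the given resolution in dots per inch.
    static double pixelsToPoints(int heightPx, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return heightPx * POINTS_PER_INCH / dpi;
    }

    public static void main(String[] args) {
        int heightPx = 50; // made-up line height, for illustration only
        // Compare the fallback from the warning (70) with a typical scan resolution (300).
        for (int dpi : new int[] {70, 300}) {
            System.out.printf("%d px at %d dpi = %.1f pt%n",
                    heightPx, dpi, pixelsToPoints(heightPx, dpi));
        }
    }
}
```

Under these assumptions, the same 50 px line of text comes out at roughly 51 pt with the 70 dpi fallback but 12 pt at 300 dpi, so the reported font sizes depend heavily on which resolution is used.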

You can try the following APIs to set the input image resolution:

instance.setTessVariable("user_defined_dpi", "300");

or

TessBaseAPISetSourceResolution(TessBaseAPI handle, int ppi);

You can suppress the console output with:

instance.setTessVariable("debug_file", "/dev/null");



Source: https://stackoverflow.com/questions/58286373/tess4j-pdf-to-tiff-to-tesseract-warning-invalid-resolution-0-dpi-using-70
