Best scanner settings for scanning documents (TIFF and PDF) [closed]

风流意气都作罢 submitted on 2019-12-01 13:02:57

Question


What are the best scanner settings for scanning documents (black text on white paper) so that OCR conversion gives the best results, and what are the standard settings and specifications for the PDF and TIFF formats?


Answer 1:


For OCR, the best scanning settings are:

  • 300 dpi resolution for regular text, 400 dpi for particularly small fonts (fine print)
  • Black & white for text, greyscale for small fonts, color for pictures
  • TIFF format. Use Group 4 compression for black & white (very small file size). If color is needed, use uncompressed TIFF (very large file size).

Some OCR engines have their own preferred settings, which may help slightly, but the gains are usually minor.
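
To put the file-size remarks above into numbers, here is a rough sketch that computes the raw (uncompressed) pixel data for one page at each setting. The A4 page size and the function names are my own assumptions for illustration; file headers and any compression are ignored.

```python
# Uncompressed size of one scanned page at a given resolution and bit depth.
# A4 paper (210 x 297 mm) is an assumption; substitute your own page size.
MM_PER_INCH = 25.4


def page_pixels(dpi, width_mm=210.0, height_mm=297.0):
    """Return (width, height) in pixels for a page scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def uncompressed_bytes(dpi, bits_per_pixel):
    """Raw pixel data size in bytes, ignoring headers and row padding."""
    width, height = page_pixels(dpi)
    return width * height * bits_per_pixel // 8


# 1 bit = black & white, 8 bits = greyscale, 24 bits = RGB color.
for label, bits in [("black & white", 1), ("greyscale", 8), ("color", 24)]:
    for dpi in (300, 400):
        size_mb = uncompressed_bytes(dpi, bits) / 1e6
        print(f"{dpi} dpi  {label:14s} {size_mb:6.1f} MB")
```

At 300 dpi this gives about 1.1 MB for black & white, 8.7 MB for greyscale and 26.1 MB for color, all before any compression is applied.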




Answer 2:


For OCR purposes, I would scan a document at 300 DPI, in B/W or grayscale, and save it as uncompressed TIFF or PNG.




Answer 3:


While 300 DPI is optimal for "perfect" inputs, if you are working with imperfect inputs (e.g. from a typewriter or dot-matrix printer), the high resolution can actually throw Tesseract off. In cases like this, it is better to use a lower resolution, which hides the imperfections. For example, with a dot-matrix printer I get significantly better results at 150 DPI than at 300 DPI.
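
A toy illustration of why this can help, assuming the lower resolution is obtained by averaging each 2x2 block of pixels (a plain box filter; the function name and sample data are made up for this sketch, and this is not how Tesseract itself processes images). Separate dots and gaps at the higher resolution merge into one even stroke at half the resolution:

```python
def halve_resolution(pixels):
    """Downsample a greyscale image (list of rows, values 0-255) by 2x
    using a 2x2 box average, e.g. 300 DPI -> 150 DPI."""
    height = len(pixels) - len(pixels) % 2      # drop an odd trailing row
    width = len(pixels[0]) - len(pixels[0]) % 2  # drop an odd trailing column
    return [
        [
            (pixels[y][x] + pixels[y][x + 1]
             + pixels[y + 1][x] + pixels[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# A dot-matrix style stroke: dark dots (0) separated by white gaps (255).
stroke = [
    [0, 255, 0, 255, 0, 255, 0, 255],
    [255, 0, 255, 0, 255, 0, 255, 0],
]
print(halve_resolution(stroke))  # [[127, 127, 127, 127]]
```

In practice you would simply set the scanner to the lower resolution or resample with an image library; the point is only that the individual dots no longer show up as separate shapes.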




Answer 4:


If you want a general answer, 300 DPI is good. The best OCR results usually come from B/W images, and if your image quality is low, you might improve it by applying image processing first.

Also, if you save the scanned image and then feed it to the OCR engine, do NOT use lossy compression such as JPEG. Note that there is a lossless JPEG compression, but it is not commonly supported.
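
To make "lossless" concrete, here is a small sketch using Python's standard zlib module, which implements DEFLATE, the algorithm used inside PNG. The pixel data is synthetic and only for illustration. The compressed data is much smaller, yet decompressing it returns every pixel exactly, which is what an OCR engine needs:

```python
import zlib

# One row of a synthetic bilevel scan: long white runs with one black run.
# (Made-up data, stored as one byte per pixel for readability.)
row = bytes([255] * 200 + [0] * 12 + [255] * 88)
page = row * 100  # 100 identical rows, 30000 pixels in total

compressed = zlib.compress(page, 9)
restored = zlib.decompress(compressed)

print(len(page), "->", len(compressed), "bytes")
print("pixel-exact round trip:", restored == page)
```

A lossy codec such as JPEG makes no such guarantee: the decoded pixels only approximate the original, and the errors tend to show up around sharp edges like text.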



Source: https://stackoverflow.com/questions/18620977/best-setting-for-scanners-for-scanning-documentstiff-and-pdf
