Separation of background/foreground layers in a scanned document

Posted by 南楼画角 on 2020-01-05 09:32:35

Question


I need to automatically remove the mildly colored background of a scanned document image for OCR.

ScanTailor is an open source C++ GUI-based app that does background separation among other things, but I cannot figure out how to run only the last step which actually removes the background.

Ideally, I could find the code that does this and either:

  1. Port that part to C#
  2. Modify the C++ to respond to command line execution, only performing that step on a given image

Can you help me understand how I can do either?
Or do you know of other libraries that can do this? (Any language/platform is acceptable.)


Answer 1:


You are referring to thresholding, despeckling, and noise removal techniques, which are necessary in OCR applications.
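As a rough illustration of what thresholding means here, the following is a minimal sketch of global thresholding using Otsu's method, written in plain Python. Otsu's method is just one common way to pick a threshold and is not necessarily what ScanTailor or the libraries mentioned in this answer use. The function names `otsu_threshold` and `binarize` are made up for this example, and the input is assumed to be a grayscale image stored as a list of rows of integers in the range 0..255.

```python
def otsu_threshold(gray):
    """Return the threshold (0..255) that maximises between-class variance."""
    # Build the intensity histogram.
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break     # no pixels left for the light class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, threshold=None):
    """Map pixels at or below the threshold to 0 (ink) and the rest to 255 (paper)."""
    t = otsu_threshold(gray) if threshold is None else threshold
    return [[0 if v <= t else 255 for v in row] for row in gray]


if __name__ == "__main__":
    page = [[30, 40, 220],
            [210, 35, 225]]
    print(binarize(page))
```

A single global threshold like this tends to struggle with uneven lighting or patterned backgrounds, which is where adaptive (local) thresholding comes in.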

The quality of the results depends very much on many different factors:

  • Print quality of the original
  • Scan quality
  • Image resolution
  • Background colours and patterns used
  • Noise and other marks

You may find the IEvolution.NET library at http://www.hi-components.com/nievolution.asp useful. It has many image processing functions to play with.

There are many commercial engines available. There is no one perfect function to solve image processing problems. You must adapt the functions and parameters to match your images. See also http://www.recogniform.com/thresholding.htm

  • Best threshold for converting grayscale to black and white
  • Adaptive threshold binarization: post-processing for removing ghost objects
  • Adaptive threshold Binarization's bad effects
  • fast threshold and bit packing algorithm ( possible improvements ? )

A Google search will show up lots of results.




Answer 2:


Maybe the algorithm is, approximately:

  • Decide what the background color is
  • Scan the bitmap for pixels whose color is (and/or is sufficiently similar to) the background color
  • Convert these pixels to white or transparent
  • Possibly (especially if the page contains images and not just text) ignore isolated pixels that match the background color but are not adjacent to other background pixels
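The steps above can be sketched in plain Python as follows. This is only one possible reading of the list, not ScanTailor's actual code. The helper names (`estimate_background`, `is_similar`, `remove_background`) and the parameters `step`, `tolerance`, and `min_neighbours` are assumptions made for this example. The image is assumed to be a list of rows of `(r, g, b)` tuples, and the background is guessed as the most common coarsely quantized color.

```python
from collections import Counter

WHITE = (255, 255, 255)


def estimate_background(pixels, step=32):
    """Guess the background as the most common coarsely quantized color."""
    counts = Counter(
        tuple(c // step for c in px) for row in pixels for px in row
    )
    bucket = counts.most_common(1)[0][0]
    # Use the centre of the winning bucket as the representative color.
    return tuple(min(255, c * step + step // 2) for c in bucket)


def is_similar(px, bg, tolerance):
    """True if every channel of px is within `tolerance` of bg."""
    return all(abs(a - b) <= tolerance for a, b in zip(px, bg))


def remove_background(pixels, tolerance=40, min_neighbours=1, background=None):
    """Whiten background-like pixels, skipping isolated ones."""
    bg = estimate_background(pixels) if background is None else background
    h, w = len(pixels), len(pixels[0])
    # Mark every pixel whose color is close enough to the background.
    mask = [[is_similar(pixels[y][x], bg, tolerance) for x in range(w)]
            for y in range(h)]
    out = [list(row) for row in pixels]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Count background-like pixels among the (up to 8) neighbours.
            neighbours = sum(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours >= min_neighbours:
                out[y][x] = WHITE
    return out


if __name__ == "__main__":
    paper, ink = (230, 220, 200), (20, 20, 20)
    page = [[paper] * 5 for _ in range(5)]
    page[2][2] = ink
    cleaned = remove_background(page)
    print(cleaned[0][0], cleaned[2][2])
```

The `tolerance` value controls how aggressively near-background colors are whitened, so it would need tuning per scan. Passing `background` explicitly skips the guess when the paper color is already known.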

If it's a high-resolution low-color-depth (e.g. black-and-white) image, then you need to apply this algorithm to groups of pixels.



Source: https://stackoverflow.com/questions/4327172/separation-of-background-foreground-layers-in-a-scanned-document
