Software to Improve OCR Results Based on Output from Multiple OCR Software Packages

旧巷老猫 submitted on 2019-12-19 11:38:20

Question


Is there an already-existing piece of commercial or academic software that can

  • overlay results from multiple OCR packages (Abbyy FineReader, Adobe Acrobat Professional, ReadIris, etc.)
  • provide fully automated improvements based on accumulated knowledge from multiple sources
  • allow for use of additional external tools set up at runtime (dictionaries, batch web / local corpus look-ups, etc.)

?

Note: I already have in-house solutions to visualize results from single sources, so if no such software is available, I would not mind developing my own : ) Inquiries for cooperation would then also be most welcome!


Answer 1:


The idea of voting between several OCR engines is not new. The problem is that it does not really work. It probably would work if the engines were simple classifiers that were orthogonal by nature: you could then combine their votes and improve the results. But they are all very complicated pieces of software that use a quite similar set of well-known approaches with small variations, probably combined in different ways, and some implementations are better and some are worse.
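To make the voting idea concrete, here is a minimal Python sketch of per-character majority voting. It is only an illustration, not how any of the products mentioned work: the function name `vote_characters` is made up for this example, and it assumes the engine outputs have already been aligned to the same length, which real OCR output rarely is.

```python
from collections import Counter


def vote_characters(outputs):
    """Majority vote per character position.

    `outputs` is a list of strings, one per OCR engine, assumed to be
    already aligned to the same length (a strong simplification).
    """
    if len({len(text) for text in outputs}) != 1:
        raise ValueError("outputs must be aligned to the same length")

    voted = []
    for chars in zip(*outputs):
        # most_common(1) picks the character with the highest count;
        # on a tie, the character seen first (earliest engine) wins.
        winner, _count = Counter(chars).most_common(1)[0]
        voted.append(winner)
    return "".join(voted)
```

With three engines that make different mistakes, `vote_characters(["he1lo", "hello", "hcllo"])` returns `"hello"`. With two engines that share the same mistake, `vote_characters(["c1ean", "c1ean", "clean"])` returns `"c1ean"`: the shared error outvotes the correct reading, which is the weakness described above when the engines are not orthogonal.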

Experience shows that when you combine several OCR technologies, the best decision rule is to rely on the results of the most accurate one and just ignore the others. From my experience (I work for ABBYY), ABBYY OCR is definitely the most accurate of the ones you mentioned.

As far as I know, the only reason to use voting is when 100% accuracy is a requirement and you want to cross-check "suspicious" characters and send them to manual verification. With this approach you increase the number of characters to verify, but reduce the chance of missing a wrong character.
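The cross-check could look roughly like the Python sketch below. It keeps the text of the most accurate engine and uses the other engines only to mark positions for a human to review. The function name `flag_suspicious` is hypothetical, and as before the outputs are assumed to be aligned to the same length.

```python
def flag_suspicious(primary, others):
    """Return the positions in `primary` that need manual verification.

    `primary` is the output of the engine trusted most; `others` is a
    list of outputs from the remaining engines, assumed to be aligned
    to the same length as `primary`. A position is flagged as soon as
    any other engine disagrees with the primary one.
    """
    for text in others:
        if len(text) != len(primary):
            raise ValueError("outputs must be aligned to the same length")

    return [
        index
        for index, char in enumerate(primary)
        if any(text[index] != char for text in others)
    ]
```

For example, `flag_suspicious("he1lo", ["hello", "hello"])` returns `[2]`, so only the third character goes to manual verification. Adding more engines can only make the flagged list longer, which matches the trade-off above: more characters to check, fewer wrong characters missed.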




Answer 2:


There are two options that I have worked with previously and would recommend.

  1. PrimeOCR. http://www.primerecognition.com/

It is a commercial offering that uses multiple OCR engines and voting to determine the best result. It handles machine print only. Last time I used it, it had 6 engines. Contact Alex Dahl.

I have used it in a major project scanning 20,000+ pages per day.

  2. RecoStar from OpenText.

RecoStar uses voting and handles both handprint and machine print.



Source: https://stackoverflow.com/questions/3271174/software-to-improve-ocr-results-based-on-output-from-multiple-ocr-software-packa
