ocr

OCR: weighted Levenshtein distance

ⅰ亾dé卋堺 submitted on 2019-12-21 02:33:09
Question: I'm trying to create an optical character recognition system with a dictionary. In fact, I don't have an implemented dictionary yet =) I've heard that there are simple metrics based on Levenshtein distance which take into account different distances between different symbols. E.g. 'N' and 'H' are very close to each other, and d("THEATRE", "TNEATRE") should be less than d("THEATRE", "TOEATRE"), which is impossible using basic Levenshtein distance. Could you help me locate such a metric, please? Answer 1:
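A minimal Python sketch of the kind of metric the question describes, not a reference implementation: a Levenshtein distance whose substitution cost depends on the pair of characters. The names `weighted_levenshtein` and `ocr_sub_cost`, and the `SIMILAR` table with its cost values, are assumptions made up for illustration; realistic costs would have to be measured from the confusions of an actual OCR engine.

```python
def weighted_levenshtein(a, b, sub_cost, indel_cost=1.0):
    """Edit distance where the substitution cost depends on the character pair."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca with cb
            ))
        prev = curr
    return prev[-1]


# Hypothetical confusion table: visually similar pairs get a cheap substitution.
SIMILAR = {
    frozenset("NH"): 0.3,
    frozenset("O0"): 0.2,
    frozenset("I1"): 0.2,
}


def ocr_sub_cost(x, y):
    """Return 0 for identical characters, a reduced cost for look-alikes, else 1."""
    if x == y:
        return 0.0
    return SIMILAR.get(frozenset((x, y)), 1.0)


near = weighted_levenshtein("THEATRE", "TNEATRE", ocr_sub_cost)
far = weighted_levenshtein("THEATRE", "TOEATRE", ocr_sub_cost)
print(near < far)  # True: the 'H'/'N' confusion is cheaper than 'H'/'O'
```

With every substitution costing 1.0, the function reduces to the basic Levenshtein distance, under which both misreadings of "THEATRE" are equally far away.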

Windows 7 OCR API

孤街浪徒 submitted on 2019-12-20 17:47:40
Question: I have been reviewing replacements for the Office 2007 MODI OCR (OneNote 2010's solution gives lower-quality results than 2007 :-( ). I notice that Windows 7 contains an OCR library once you install the optional TIFF IFilter. The OCR component gets installed to %programfiles%\Common Files\microsoft shared\OCR\7.0\xocr3.psp.dll, but I don't see any API for it. Does anyone know how this can be interfaced, preferably in C#? ANSWER: Found the solution: once the optional TIFF IFilter Windows 7 feature is

How to OCR engraved text?

僤鯓⒐⒋嵵緔 submitted on 2019-12-20 12:41:03
Question: I have this image. How can I OCR it? I know this is very challenging, but I would really appreciate any help. Answer 1: If you have the time to develop the detection yourself, I would do it roughly like this: Get 1000 images or so and either transcribe them yourself or let the people on Amazon Mechanical Turk do it for you; it will cost virtually nothing. Now you have something to tune your algorithm on and measure how well you are doing. Like Ryan wrote, play with standard image filters, contrast, color,
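One way to "measure how well you are doing" against such a labeled set is a character error rate. The Python sketch below is only an illustration of that idea: the function names and the sample strings are made up, and the answer itself does not say which metric its author had in mind.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """pairs: iterable of (ground_truth, ocr_output) strings."""
    pairs = list(pairs)
    total_edits = sum(edit_distance(truth, ocr) for truth, ocr in pairs)
    total_chars = sum(len(truth) for truth, _ in pairs)
    return total_edits / total_chars if total_chars else 0.0


# Hypothetical labeled samples: hand transcription vs. what the OCR returned.
samples = [("HELLO", "HELL0"), ("WORLD", "WORLD")]
print(character_error_rate(samples))  # 0.1 (one wrong character out of ten)
```

Recomputing this number after each change to the filters or thresholds shows whether the change helped on the whole set rather than on a single image.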

How do I train tesseract 4 with image data instead of a font file?

被刻印的时光 ゝ submitted on 2019-12-20 10:46:20
Question: I'm trying to train Tesseract 4 with images instead of fonts. The docs explain only the approach with fonts, not with images. I know how it works with earlier versions of Tesseract, but I don't understand how to use the box/tiff files to train the LSTM in Tesseract 4. I looked into tesstrain.sh, which is used to generate LSTM training data, but couldn't find anything helpful. Any ideas? Source: https://stackoverflow.com/questions/43352918/how-do-i-train-tesseract-4-with-image-data

Tesseract training for a new font

女生的网名这么多〃 submitted on 2019-12-20 09:56:19
Question: I'm still new to Tesseract OCR, and after using it in my script I noticed it had a relatively high error rate for the images I was trying to extract text from. I came across Tesseract training, which supposedly can decrease the error rate for a specific font. I also came across a website (http://ocr7.com/), a tool powered by Anyline that does all the training for a font you specify. So I received a .traineddata file and I am not quite sure what to do with it. Could anybody

“Adding” new fonts to Tesseract eng.traineddata

蹲街弑〆低调 submitted on 2019-12-20 09:06:19
Question: As far as I know, Tesseract 3.x comes with 6 English fonts (correct me if I'm wrong). I need to train Tesseract for 5 more fonts. I need only capital letters and digits (no special characters or symbols). I followed various processes, for example Adding New Fonts to Tesseract 3 OCR Engine, and also used tools that automate the process, like Serak Tesseract Trainer for Tesseract 3.02. For generating box files I used QT Box Editor. After using the above tools I get an eng.traineddata file. All

“Adding” new fonts to Tesseract eng.traineddata

懵懂的女人 submitted on 2019-12-20 09:06:11
Question: As far as I know, Tesseract 3.x comes with 6 English fonts (correct me if I'm wrong). I need to train Tesseract for 5 more fonts. I need only capital letters and digits (no special characters or symbols). I followed various processes, for example Adding New Fonts to Tesseract 3 OCR Engine, and also used tools that automate the process, like Serak Tesseract Trainer for Tesseract 3.02. For generating box files I used QT Box Editor. After using the above tools I get an eng.traineddata file. All

What OCR options exist beyond Tesseract? [closed]

China☆狼群 submitted on 2019-12-20 08:35:57
Question: I've used Tesseract a bit and its results leave much to be desired. I'm currently detecting very small images (35x15, without border,

Character recognition (OCR algorithm) [closed]

自作多情 submitted on 2019-12-20 08:17:07
Question: I am working on a project in which I have to develop an OCR algorithm (I have to read the text from an image and then convert it to a different language). So my first task is to get the text from the image. Steps to complete the first task: Loading any image format (bmp, jpg, png) from a given source.

Tess-two OCR not working

大兔子大兔子 submitted on 2019-12-20 04:48:25
Question: I'm trying to get text from an image using tess-two on Android, but it's giving me a really bad result. 01-16 12:00:25.339: I/Tesseract(native)(29038): Initialized Tesseract API with language=spa and about 30 seconds later it shows this as the result string: {ga ., r¿ y“: A r M í :3 ' ‘Ev’.-:.. -: A 7 » w- ?" _ Á.» ¿"A ¿rw-V r mjÏfn 'n’n . Y ' "\'ZA".‘.¡ A‘ :‘ïvAv- « ‘ :"Éf‘Ï'" -Ï«l :‘,.v:...»- . ' RFI' .. ’ g)" 3;:- 1-;4', = * ¿,arifgggk mw; .1. , ' "53» "J 't‘ ‘ ¿Las ;.‘».L',-‘» ' ' 'N‘“ "“=: - '.