tesseract

Tesseract and tiff format - spp not in set {1,3}

瘦欲@ posted on 2019-12-20 15:57:44
Question: While trying to run this command:

    tesseract bond111.tif bond111 batch.nochop makebox

I get the following error:

    Error in pixReadFromTiffStream: spp not in set {1,3}
    Error in pixReadStreamTiff: pix not read
    Error in pixReadTiff: pix not read

Assuming that "spp not in set" is the main error here, what does it mean? At first it had trouble because the bpp was higher than 24, so I reduced it using GIMP, but that did not resolve the issue.

Answer 1: It probably means your TIFF image has an alpha channel and
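One way to check the diagnosis in the answer is to read the file's SamplesPerPixel tag (TIFF tag 277), which is what "spp" stands for in the error message. The following is a minimal sketch using only the Python standard library. The function name `samples_per_pixel` is hypothetical, and the sketch assumes a classic (non-BigTIFF) file and inspects only its first image directory. A value of 4 would be consistent with RGB plus an alpha channel.

```python
import struct

SAMPLES_PER_PIXEL = 277  # TIFF tag number for SamplesPerPixel


def samples_per_pixel(data: bytes) -> int:
    """Return SamplesPerPixel from the first IFD of a classic TIFF.

    Hypothetical helper for illustration; BigTIFF is not handled.
    """
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")

    # Header: 2-byte magic number (42) and 4-byte offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")

    # The IFD starts with a 2-byte entry count, followed by 12-byte entries.
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        # tag, field type, value count, then the SHORT value stored
        # left-justified in the 4-byte value field.
        tag, _type, _n, value = struct.unpack(
            endian + "HHIH", data[start:start + 10]
        )
        if tag == SAMPLES_PER_PIXEL:
            return value
    return 1  # the TIFF default when the tag is absent


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(samples_per_pixel(f.read()))
```

Run against the file from the question (for example `python check_spp.py bond111.tif`, where the script name is an assumption), it prints the number of samples per pixel. If the result is 4, re-exporting the image without the alpha channel is the obvious thing to try.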

How to extract relevant information from receipt

牧云@^-^@ posted on 2019-12-20 14:21:56
Question: I am trying to extract information from a range of different receipts using a combination of OpenCV, Tesseract and Keras. The end result of the project is that I should be able to take a picture of a receipt using a phone and, from that picture, get the store name, payment type (card or cash), amount paid and change tendered. So far I have done a few different preprocessing steps on a series of different sample receipts using OpenCV, such as removing background, denoising and converting to a

How do I train tesseract 4 with image data instead of a font file?

被刻印的时光 ゝ posted on 2019-12-20 10:46:20
Question: I'm trying to train Tesseract 4 with images instead of fonts. The docs explain only the approach with fonts, not with images. I know how it works with a prior version of Tesseract, but I don't understand how to use the box/tiff files to train with LSTM in Tesseract 4. I looked into tesstrain.sh, which is used to generate LSTM training data, but couldn't find anything helpful. Any ideas?

Source: https://stackoverflow.com/questions/43352918/how-do-i-train-tesseract-4-with-image-data

Tesseract training for a new font

女生的网名这么多〃 posted on 2019-12-20 09:56:19
Question: I'm still new to Tesseract OCR and, after using it in my script, noticed it had a relatively high error rate for the images I was trying to extract text from. I came across Tesseract training, which supposedly can decrease the error rate for a specific font. I also came across a website (http://ocr7.com/), a tool powered by Anyline that does all the training for a font you specify. So I received a .traineddata file and I am not quite sure what to do with it. Could anybody

Where can I find the list of available property names for the first parameter of tesseract->SetVariable?

南笙酒味 posted on 2019-12-20 09:43:38
Question: After a lot of googling I have been able to find only a few of them, such as the examples below for Tesseract's SetVariable(1st param, 2nd param):

    tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "0");
    tesseract->SetVariable("language_model_penalty_non_dict_word", "0");
    tesseract->SetVariable("tessedit_char_blacklist", "xyz");
    tesseract->SetVariable("classify_bln_numeric_mode", "1");

I would like to
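One place such a list can come from is the tesseract command-line tool, whose `--print-parameters` option prints the parameters known to the installed build. The sketch below runs that command and parses its output. The helper names `parse_parameters` and `list_tesseract_parameters` are hypothetical, and the tab-separated `name<TAB>value<TAB>description` layout is an assumption about the output format that may differ between versions.

```python
import subprocess


def parse_parameters(text):
    """Map parameter name -> (current value, description).

    Assumes one parameter per line in the form name<TAB>value<TAB>description;
    lines without a tab (such as a header line) are skipped.
    """
    params = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # header or blank line
        name = parts[0].strip()
        value = parts[1]
        description = parts[2] if len(parts) > 2 else ""
        if name:
            params[name] = (value, description)
    return params


def list_tesseract_parameters(executable="tesseract"):
    """Run `tesseract --print-parameters` and parse what it prints."""
    result = subprocess.run(
        [executable, "--print-parameters"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_parameters(result.stdout)


if __name__ == "__main__":
    for name, (value, description) in sorted(list_tesseract_parameters().items()):
        print(f"{name} = {value!r}  # {description}")
```

Each name printed this way is a candidate for the first argument of SetVariable. Which parameters actually affect recognition depends on the Tesseract version and engine mode in use.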

“Adding” new fonts to Tesseract eng.traineddata

蹲街弑〆低调 posted on 2019-12-20 09:06:19
Question: As far as I know, Tesseract 3.x comes with 6 English fonts (correct me if I'm wrong). I need to train Tesseract for 5 more types of fonts. I need only capital letters and digits (no special characters or symbols). I followed various processes, for example Adding New Fonts to Tesseract 3 OCR Engine, and also used tools to automate the process, like Serak Tesseract Trainer for Tesseract 3.02. For generating box files I used QT Box Editor. After using the above tools I get an eng.traineddata file. All

What OCR options exist beyond Tesseract? [closed]

China☆狼群 posted on 2019-12-20 08:35:57
Question: Closed 7 years ago. I've used Tesseract a bit and its results leave much to be desired. I'm currently detecting very small images (35x15, without border,

Removing extra pixels/lines from license plate

时光毁灭记忆、已成空白 posted on 2019-12-20 05:14:23
Question: I am using a HOG feature detector based on SVM classification. I can successfully extract the license plate, but the extracted number plate has some unnecessary pixels/lines apart from the license number. My image processing pipeline is as follows:

1. Applying the HOG detector on the grayscale image
2. Cropping the detected region
3. Resizing the cropped image
4. Applying an adaptive threshold to highlight the plate numbers and filter the background, using the following OpenCV code

    cvAdaptiveThreshold(cropped_plate, thresholded

Tess-two OCR not working

大兔子大兔子 posted on 2019-12-20 04:48:25
Question: I'm trying to get text from an image using tess-two on Android, but it's giving me a really bad result:

    01-16 12:00:25.339: I/Tesseract(native)(29038): Initialized Tesseract API with language=spa

and about 30 seconds later it shows this as the result string:

    {ga ., r¿ y“: A r M í :3 ' ‘Ev’.-:.. -: A 7 » w- ?" _ Á.» ¿"A ¿rw-V r mjÏfn 'n’n . Y ' "\'ZA".‘.¡ A‘ :‘ïvAv- « ‘ :"Éf‘Ï'" -Ï«l :‘,.v:...»- . ' RFI' .. ’ g)" 3;:- 1-;4', = * ¿,arifgggk mw; .1. , ' "53» "J 't‘ ‘ ¿Las ;.‘».L',-‘» ' ' 'N‘“ "“=: - '.