tesseract

Tesseract traineddata not working in Swift 3.0 project using version 4.0

Anonymous (unverified), submitted 2019-12-03 02:47:02
Question: I'm attempting to use Tesseract-OCR-iOS in a new Swift 3.0 project. I'm using Xcode Version 8.1 (8B62). CocoaPods is version 1.1.1. When I attempt to use tesseract.recognize(), my app crashes and I get the following output in the console: actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53 I found this post, which sounds like I'm using the wrong version of traineddata. I downloaded tessdata from the tesseract-ocr/tessdata repo, so I'm baffled as to why I'd have a mismatch on the

Tesseract training for a new font

时光毁灭记忆、已成空白, submitted 2019-12-03 02:46:09
I'm still new to Tesseract OCR, and after using it in my script I noticed it had a relatively high error rate on the images I was trying to extract text from. I came across Tesseract training, which supposedly can decrease the error rate for a specific font. I also came across a website (http://ocr7.com/), a tool powered by Anyline that does all the training for a font you specify. So I received a .traineddata file and I am not quite sure what to do with it. Could anybody explain what I have to do with this file for it to work? Or should I just learn how to do Tesseract

Tesseract - ambiguity in space and tab

Anonymous (unverified), submitted 2019-12-03 02:38:01
Question: I have a TIFF file that contains some text separated by tabs (4 spaces). But when I extract text from this TIFF image, I always get a single space between two columns. A sample example: TIFF IMAGE: col-a col-b col-c desired output: col-a col-b col-c but I am getting the following: col-a col-b col-c I tried this with multiple images of the same format, but the result is always the same. How do I fix this issue? Can I train Tesseract to understand this? Answer 1: Tesseract compresses consecutive spaces into one. You would need to modify baseapi
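Separately from the baseapi change mentioned in the answer, some Tesseract releases expose a `preserve_interword_spaces` config variable that can be set from the command line with `-c`. The following is only a sketch of that alternative, not the answer's method: it assumes the installed build supports the variable and that `tesseract` is on the PATH, and the helper names `build_tesseract_command` and `ocr_text` are made up for illustration. Whether the column gaps in a particular image are recovered is not guaranteed.

```python
import subprocess


def build_tesseract_command(image_path, keep_spaces=True):
    """Build a tesseract CLI invocation that writes recognized text to stdout."""
    cmd = ["tesseract", image_path, "stdout"]
    if keep_spaces:
        # Assumption: the installed Tesseract build supports this variable.
        cmd += ["-c", "preserve_interword_spaces=1"]
    return cmd


def ocr_text(image_path):
    """Run tesseract on the image and return its text output.

    Requires the tesseract executable to be on the PATH.
    """
    result = subprocess.run(
        build_tesseract_command(image_path),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

Calling `ocr_text("table.tif")` (a hypothetical file name) would then return the text with whatever inter-word spacing Tesseract chose to keep.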

File tesseract.exe does not exist

Anonymous (unverified), submitted 2019-12-03 02:36:02
Question: I have installed the pytesseract library using pip install pytesseract When I tried to use the image_to_text method, it gave me a FileNotFoundError: [WinError 2] The system can not find the file specified I googled it and found that I should change something in the pytesseract.py file: the line tesseract_cmd = 'tesseract' should become tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract' I searched and haven't found any tesseract.exe files in my Python folder. I then reinstalled the library, but the file still wasn't
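The change described in the question, pointing `tesseract_cmd` at the real executable, can also be made at runtime instead of by editing pytesseract.py. Below is a minimal sketch, assuming Tesseract itself has been installed separately (pip install pytesseract installs only the Python wrapper, which would explain why no tesseract.exe turns up in the Python folder). The helper name `find_tesseract` and the example install path are hypothetical.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a path to the tesseract executable, or None if none is found.

    Looks on the PATH first, then checks the explicit candidate paths
    supplied by the caller (hypothetical install locations).
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Usage (requires pytesseract plus a separately installed Tesseract):
#   import pytesseract
#   cmd = find_tesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"])
#   if cmd is not None:
#       pytesseract.pytesseract.tesseract_cmd = cmd
```

If `find_tesseract` returns None, the executable is missing from both the PATH and the listed locations, and no setting in the wrapper will help until Tesseract is installed.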

UnicodeDecodeError with Tesseract OCR in Python

Anonymous (unverified), submitted 2019-12-03 02:30:02
Question: I am trying to extract text from an image file using Tesseract OCR in Python, but I am facing an error that I can't figure out how to deal with. My environment is fine, as I tested some sample images with the OCR in Python. Here is the code: from PIL import Image import pytesseract strs = pytesseract.image_to_string(Image.open('binarized_image.png')) print (strs) The following is the error I get from the Eclipse console: strs = pytesseract.image_to_string(Image.open('binarized_body.png')) File "C:\Python35x64\lib\site-packages\pytesseract

Tesseract 3 (OCR) - .NET Wrapper

Anonymous (unverified), submitted 2019-12-03 02:13:02
Question: http://code.google.com/p/tesseractdotnet/ I am having a problem getting Tesseract to work in my Visual Studio 2010 projects. I have tried console and WinForms projects and both have the same outcome. I have come across a DLL by someone else who claims to have it working in VS2010: http://code.google.com/p/tesseractdotnet/issues/detail?id=1 I am adding a reference to the DLL, which can be found in the attachment to post 64 on the website above. Every time I build my project I get an AccessViolationException saying that an attempt was made to

API to read text from Image file using OCR

Anonymous (unverified), submitted 2019-12-03 02:05:01
Question: I am looking for example code or the name of an API for OCR (optical character recognition) in Java that I can use to extract all the text present in an image file, without comparing it against any reference image, which is what I am doing with the code below. public class OCRTest { static String STR = ""; public static void main(String[] args) { OCR l = new OCR(0.70f); l.loadFontsDirectory(OCRTest.class, new File("fonts")); l.loadFont(OCRTest.class, new File("fonts", "font_1")); ImageBinaryGrey i = new ImageBinaryGrey(Capture.load(OCRTest.class, "full.png")); STR = l

Using tesseract to recognize license plates

Anonymous (unverified), submitted 2019-12-03 01:58:03
Question: I'm developing an app which can recognize license plates (ANPR). The first step is to extract the license plates from the image. I am using OpenCV to detect the plates based on width/height ratio and this works pretty well: But as you can see, the OCR results are pretty bad. I am using Tesseract in my Objective-C (iOS) environment. These are my init variables when starting the engine: // init the tesseract engine. tesseract = new tesseract::TessBaseAPI(); int initRet=tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding],

Alternative to Tesseract OCR Training?

℡╲_俬逩灬., submitted 2019-12-03 01:51:07
For the past 3 months I've been trying to train Tesseract to identify a collection of images I have. Due to a real lack of proper documentation and a very high level of complexity, I'm starting to give up on Tesseract as a solution. I'm looking for an alternative that would be relatively pain-free to train; I'm not looking to reinvent the wheel here. If there isn't anything free, I guess paid solutions would have to do (nothing above $200). Tomato Based on your comment, all you need is to scan a relatively small amount of documents with almost 100% accuracy and your budget is about

Including *.so libraries Android Studio tess-two (tesseract)

Anonymous (unverified), submitted 2019-12-03 01:46:01
Question: I have been trying to include the Tesseract libraries in my Android project today. From what I have found, I did the following: 1) downloaded tess-two from Google git, 2) built it with the NDK, 3) put the *.so files (armeabi/v7, x86, mips) into /app/main/jniLibs/, 4) packed the *.so files into a .jar file, put the archive into app/libs/ and wrote a dependency {} entry for it in the build.gradle file. I'm using Android Studio, and when I write TessBaseAPI and hit Alt+Enter -> "add dependency to tess-two module" it automatically writes the import line: import com.googlecode.tesseract.android.TessBaseAPI;