tesseract

Multiple subprocesses take a lot of time to complete

纵饮孤独 posted on 2020-01-06 08:16:00
Question: I have a single process that is run using the subprocess module's Popen: result = subprocess.Popen(['tesseract','mypic.png','myop']) st = time() while result.poll() is None: sleep(0.001) en = time() print('Took :'+str(en-st)) This results in: Took :0.44703030586242676 Here, a tesseract call is made to process an image mypic.png (attached) and write the OCR result to myop.txt. Now I want this to happen in multiple processes, following this comment (or see this directly), so the code is
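The excerpt above is cut off before the poster's multi-process code, so the following is not that code. It is only a minimal sketch of one way to start several commands at once with Popen and time them together, using wait() instead of a polling loop. The helper name run_concurrently and the placeholder commands are assumptions made for illustration; with Tesseract installed, each command could instead be a list such as ['tesseract', 'mypic.png', 'myop'], ideally with a distinct output base name per process.

```python
import subprocess
import sys
import time


def run_concurrently(commands):
    """Start every command at once, then wait for all of them to exit.

    Hypothetical helper for illustration; not taken from the original post.
    """
    start = time.perf_counter()
    # Popen returns immediately, so all child processes run in parallel.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until the child exits, so no sleep/poll loop is needed.
    return_codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    return return_codes, elapsed


if __name__ == "__main__":
    # Placeholder commands so the sketch runs without Tesseract installed.
    demo = [[sys.executable, "-c", "pass"] for _ in range(3)]
    codes, took = run_concurrently(demo)
    print("Return codes:", codes)
    print("Took:", took)
```

The measured time covers the whole batch, so comparing it with the single-process figure shows how much overhead the parallel runs add on a given machine.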

How to process the image for Tesseract in Java?

和自甴很熟 posted on 2020-01-06 08:07:50
Question: I am trying to read characters from the image below using Tesseract: And here is my code for reading the image. Tesseract tesseract = new Tesseract(); try { String text = tesseract.doOCR(new File(path)); // path of your image file System.out.println(text); } catch (TesseractException e) { e.printStackTrace(); } I failed to get accurate text from the image. So how can I preprocess the image before reading it? Answer 1: Tesseract is not suitable for captcha breaking. Source: https://stackoverflow.com

OCR for android application tess4j

て烟熏妆下的殇ゞ posted on 2020-01-06 06:53:22
Question: Basically, I am designing an application that will capture an image from the Android device's default camera and display that image in an image view. That works fine. capt_but.setOnClickListener(new View.OnClickListener() { //@Override // TODO Auto-generated method stub public void onClick(View v) { Intent cameraIntent = new Intent(android.provider.MediaStore.ACTION_IMAGE_CAPTURE); startActivityForResult(cameraIntent, CAMERA_REQUEST); } }); } protected void onActivityResult(int

Training tesseract - shapeclustering issue

荒凉一梦 posted on 2020-01-05 08:17:16
Question: I'm trying to train Tesseract (adding a new, digit-only font) as per the instructions found here: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 What I've done: Created a PDF with sample text, converted it to tif, and ran tesseract num.dot.exp0.tif num.dot.exp0 batch.nochop makebox digits . Then edited the generated box file, correcting wrong detections. Ran tesseract in training mode: tesseract num.dot.exp0.tif num.dot.exp0 nobatch box.train and extracted the unicharset with

How can I identify the color of the letters in these images?

倖福魔咒の posted on 2020-01-04 02:41:14
Question: I am using this article to solve captchas. It works by removing the background from the image using AForge, and then applying Tesseract OCR to the resulting cleaned image. The problem is that it currently relies on the letters being black, and since each captcha has a different text color, I need to either pass the color to the image cleaner or change the color of the letters to black. To do either one, I need to know what the existing color of the letters is. How might I go about identifying
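One simple way to approach the question above, offered only as a sketch and not as the answer from the original thread, is to count how often each color occurs and treat the second most frequent one as the text color. This rests on two assumptions that may not hold for real captchas: the most frequent color is the background, and the letters are drawn in a single flat color. The function name guess_text_color is hypothetical, and the pixel list is assumed to have been read beforehand with whatever image library is in use.

```python
from collections import Counter


def guess_text_color(pixels):
    """Return the second most common RGB value, or None if only one color exists.

    Assumption (not from the original post): the most common color is the
    background and the next most common one belongs to the letters.
    """
    ranked = Counter(pixels).most_common(2)
    if len(ranked) < 2:
        return None
    return ranked[1][0]


if __name__ == "__main__":
    # Synthetic pixels: mostly white background, some red text, one stray dot.
    sample = [(255, 255, 255)] * 10 + [(200, 30, 30)] * 4 + [(0, 0, 0)]
    print(guess_text_color(sample))
```

On noisy or anti-aliased images the colors would likely need to be quantized into a few buckets first, since exact RGB values rarely repeat often enough for a plain count to be reliable.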

Tess4J Mac: NoClassDefFoundError

爷,独闯天下 posted on 2020-01-03 20:59:26
Question: I'm trying to use Tess4J in my project. It doesn't include .dylib files for Mac, so I've built my own Tesseract and am using the .dylib from the Tesseract build. I'm able to load the native library with no issue, and I believe I have the Tess4J library linked properly, since I can import it with no issue. However, when I try to create a new instance of Tesseract using: Tesseract t = new Tesseract(); I'm getting the following error: Exception in thread "main" java.lang.NoClassDefFoundError: com

Link tesseract libs with QtCreator

给你一囗甜甜゛ posted on 2020-01-03 17:08:20
Question: I'm trying to run a C++ program based on the Tesseract API, using QtCreator as the IDE on Ubuntu, in order to perform page layout analysis: int main(void) { int left, top, right, bottom; tesseract::TessBaseAPI tessApi; tessApi.InitForAnalysePage(); cv::Mat img = cv::imread("document.png"); tessApi.SetImage(reinterpret_cast<const uchar*>(img.data), img.size().width, img.size().height, img.channels(), img.step1()); tesseract::PageIterator *iter = tessApi.AnalyseLayout(); while (iter-

Can't Compile Tesseract API example for Windows using Tesseract 3.0.2.02 archive

二次信任 posted on 2020-01-03 15:59:01
Question: I'm looking at using Tesseract to do some work with PDF files, and so I want to use the library rather than an external executable. I started by downloading the full Tesseract source and looking at building that. Sadly, the standard sources don't have any means to build on a non-Linux platform, in my case Windows. There are methods for doing so, and I looked at those. Firstly the VS2008 build doesn't. I'm aware that it needs Leptonica, but I figured I'd tackle that afterwards and just tried to

Strange Error When Using Tesseract in VB.net

耗尽温柔 posted on 2020-01-03 05:04:30
Question: I have the following code: Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click Dim Bitmap As New Bitmap("image.png") Dim ocr As tessnet2.Tesseract = New tessnet2.Tesseract() ocr.SetVariable("tessedit_char_whitelit", "0123456789") ocr.Init("c:\", "fra", False) Dim result As List(Of tessnet2.Word) = ocr.DoOCR(Bitmap, Rectangle.Empty) For Each word As tessnet2.Word In result RichTextBox1.Text &= word.Text & "(" & word.Confidence & ") " Next

How to reduce size of tessdata used for TessBaseAPI in android?

不打扰是莪最后的温柔 posted on 2020-01-03 02:48:32
Question: I have an Android application where I am using Tesseract OCR, i.e. the TessBaseAPI. This requires tessdata, which is a 21 MB file. My final release APK comes to approximately 19 MB, which I find quite large. Is there any way to reduce the size of tessdata, or of my app, or anything else that will help me reduce the final APK size? Answer 1: You can use the 3.01 version of the .traineddata files, which are much smaller and are still compatible with newer versions of Tesseract. Source: https:/