tesseract

Why does Tesseract 3.0 not recognize text inside boxes/rectangles/squares? [duplicate]

不打扰是莪最后的温柔 submitted on 2019-12-12 02:09:13
Question: This question already has answers here: why tesseract fails for this image? (2 answers). Closed 5 years ago. I have tried Tesseract on this image and on some scanned images with text inside rectangles, but it fails every time, returning "empty image" as output. Please suggest how I can solve this problem, since I'm working on form processing.

Answer 1: What you could do is extract a section of the image you are getting, where the section is the inner part of the border. You
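The answer's idea, cropping away the box border so Tesseract only sees the text inside it, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the answerer's code: the image is already binarized into a list of rows (0 = ink, 255 = background), it contains one closed box that spans most of the crop, and a row or column counts as a border line when at least `min_fill` of its pixels are ink (the 0.6 default is a guess to tune). The helper names are hypothetical. With Pillow, the bounds could be reordered and passed to `Image.crop((left, upper, right, lower))`.

```python
def find_inner_bounds(pixels, ink=0, min_fill=0.6):
    """Return (top, left, bottom, right) of the area strictly inside a box
    border, as half-open bounds, or None if no border is found on all sides."""
    height, width = len(pixels), len(pixels[0])

    def is_line(values):
        # A border line is a row/column that is mostly ink.
        return sum(1 for v in values if v == ink) >= min_fill * len(values)

    rows = {y for y in range(height) if is_line(pixels[y])}
    cols = {x for x in range(width) if is_line([row[x] for row in pixels])}
    if len(rows) < 2 or len(cols) < 2:
        return None

    # Step inward past borders that may be several pixels thick.
    top, bottom = min(rows), max(rows)
    while top + 1 in rows:
        top += 1
    while bottom - 1 in rows:
        bottom -= 1
    left, right = min(cols), max(cols)
    while left + 1 in cols:
        left += 1
    while right - 1 in cols:
        right -= 1
    if top + 1 >= bottom or left + 1 >= right:
        return None
    return top + 1, left + 1, bottom, right


def crop_inside_border(pixels, **kwargs):
    """Crop to the box interior; return the input unchanged if no box is found."""
    bounds = find_inner_bounds(pixels, **kwargs)
    if bounds is None:
        return pixels
    top, left, bottom, right = bounds
    return [row[left:right] for row in pixels[top:bottom]]
```

The cropped interior is what you would then hand to Tesseract instead of the whole boxed region.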

OCR Tesseract - Tess4J behaving weirdly

风流意气都作罢 submitted on 2019-12-12 01:55:42
Question: I am trying to extract text from an image. The issue is that I am using the code below to process the image and print the extracted text:

    public class Test {
        public static void extractText(String filename)
        // public static void main(String[] args)
        {
            System.setProperty("jna.library.path", "32".equals(System.getProperty("sun.arch.data.model")) ? "lib/win32-x86" : "lib/win32-x86-64");
            File imageFile = new File("img_perspective.png");
            Tesseract instance = Tesseract.getInstance(); // JNA

Intercepting console output which originated from Tess4J

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-11 23:44:31
Question: I am trying to intercept the red "Empty page!!" message that gets printed to my screen when using Tess4J. I wrote a short interceptor class that overrides print and println, and replaced stdout and stderr with it to check for this string:

    private static class Interceptor extends PrintStream {
        public Interceptor(OutputStream out) { super(out, true); }
        @Override
        public void print(String s) { if ( !s.contains("Empty page!!") ) super.print(s); }
        @Override
        public void println(String s) { if ( !s.contains(
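A likely reason a `PrintStream` interceptor misses this message is that it is printed by Tesseract's native library, which writes straight to the process's standard-error file descriptor and never passes through Java's `System.err`. Replacing the stream object then has no effect; the redirect has to happen at the file-descriptor level. The sketch below shows that technique in Python (not Tess4J) using only `os.dup`/`os.dup2`: it points descriptor 2 at a temporary file, then replays every captured line except the unwanted one. The helper name `filtered_native_stderr` is hypothetical. Tesseract also has a `debug_file` variable for its diagnostic output; setting it (for example through Tess4J's `setTessVariable`) may be a simpler route, but check that your version routes this particular message through it.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def filtered_native_stderr(unwanted="Empty page!!"):
    """Capture everything written to file descriptor 2 (including output from
    native libraries), drop lines containing `unwanted`, replay the rest.

    The yielded list is filled with the kept lines when the block exits.
    """
    kept = []
    sys.stderr.flush()
    saved_fd = os.dup(2)  # remember the real stderr
    with tempfile.TemporaryFile(mode="w+b") as capture:
        os.dup2(capture.fileno(), 2)  # fd 2 now points at the temp file
        try:
            yield kept
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
            capture.seek(0)
            text = capture.read().decode("utf-8", errors="replace")
            for line in text.splitlines(keepends=True):
                if unwanted not in line:
                    kept.append(line)
                    os.write(2, line.encode("utf-8"))
```

Because the filtering happens after the block exits, kept messages are delayed rather than interleaved with other output; that trade-off keeps the sketch free of threads.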

Merge trained data files - Tesseract

那年仲夏 submitted on 2019-12-11 19:40:12
Question: I'm using two traineddata files in Tesseract in order to recognize two languages. Because the accuracy wasn't good enough, I trained Tesseract and produced a new traineddata file, which I want to merge with one of the two language files I use. So my question is: how can I merge the new traineddata file with one of the files found here: https://code.google.com/p/tesseract-ocr/downloads/list. Any help?

Answer 1: You can unpack the existing .traineddata and merge the
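Unpacking and repacking a `.traineddata` file is done with Tesseract's `combine_tessdata` tool: `-u` unpacks every component into files sharing an output prefix, and `-o` overwrites selected components inside an existing file. The sketch below only assembles those command lines (with a dry-run default) so the steps are explicit; the file names and helper names are hypothetical, and which components to replace depends on what you trained, which the truncated answer does not say. If the goal is simply to use both models at once, passing both to `-l` (for example `eng+mylang`) avoids merging entirely.

```python
import shlex
import subprocess


def unpack_cmd(traineddata, out_prefix):
    # combine_tessdata -u <file> <prefix> writes <prefix><component> files
    return ["combine_tessdata", "-u", traineddata, out_prefix]


def overwrite_cmd(traineddata, component_files):
    # combine_tessdata -o <file> <component files...> replaces those components
    return ["combine_tessdata", "-o", traineddata, *component_files]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    run(unpack_cmd("tessdata/eng.traineddata", "work/eng."))
    run(overwrite_cmd("tessdata/eng.traineddata",
                      ["work/eng.config", "work/eng.unicharambigs"]))
```

Back up the original `.traineddata` before running `-o`, since it modifies the file in place.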

Finding answers to “which of these” questions

雨燕双飞 submitted on 2019-12-11 19:35:41
Question: I am writing a Python program for a quiz answer bot (for educational purposes only) using Tesseract OCR and the google-search-Api. The program seems very accurate on direct questions ("who did what", "what is this") but has problems with questions that include the answers as part of themselves ("which of these").

    import pytesseract
    from PIL import Image
    from googleapiclient.discovery import build
    import json
    import unicodedata
    import time
    import os
    #removing non
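One common heuristic for "which of these" questions (an assumption, not something the question's code is shown doing) is to run the search once and count how often each answer option appears in the returned snippets, picking the most frequent; for negated questions ("which of these is NOT...") pick the least frequent instead. A minimal sketch, with hypothetical function and argument names, assuming the OCR step has already split the text into a question and a list of options:

```python
import re


def rank_options(snippets, options, negated=False):
    """Rank answer options by how often each one occurs in the search snippets.

    Matching is case-insensitive and respects word boundaries, so "Paris" is
    not counted inside "Parisian". Returns (ranked_options, scores).
    """
    text = " ".join(snippets).lower()
    scores = {}
    for option in options:
        # Lookarounds instead of \b so options ending in symbols (e.g. "C++") still match.
        pattern = r"(?<!\w)" + re.escape(option.lower()) + r"(?!\w)"
        scores[option] = len(re.findall(pattern, text))
    # Most frequent first, or least frequent first for negated questions.
    ranked = sorted(options, key=lambda o: scores[o], reverse=not negated)
    return ranked, scores
```

Raw counts favor options that are common words in general; searching the question together with each option separately is a variation worth comparing.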

Cannot find a way to make tessnet2 work

流过昼夜 submitted on 2019-12-11 18:33:23
Question: I have created a console application and added a reference to tessnet2_32.

    Ocr ocr = new Ocr();
    using (Bitmap bmp = new Bitmap(filename))
    {
        tessnet2.Tesseract tessocr = new tessnet2.Tesseract();
        tessocr.Init(@"C:\temp\tessdata", "eng", false);
        ...

I also tried changing "C:\temp\tessdata" to:

    C:\work\ConsoleApplication3\ConsoleApplication3
    C:\work\ConsoleApplication3\ConsoleApplication3\tessdata
    C:\work\ConsoleApplication3\ConsoleApplication3\bin\debug
    C:\work\ConsoleApplication3

Reading text from image using Tesseract and OpenCV (Java)

独自空忆成欢 submitted on 2019-12-11 17:25:59
Question: I'm trying to make a program that can read the information off a nutrition label, but Tesseract has a lot of trouble reading anything at all. I've tried a number of different image processing techniques using OpenCV, but not much seems to help. Here are some of my better-looking attempts (which happen to be the simplest):

[Image: Tango bottle label, unedited]
[Image: Tango bottle label, edited]
Output: 200k], Saturates, 09

[Image: Irn Bru bottle label, unedited]
[Image: Irn Bru bottle label, edited]
Output This
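One preprocessing step that is often tried before OCR is global binarization with Otsu's method, which picks the gray level that best separates dark text from a light background. In OpenCV this is `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; the dependency-free sketch below spells out what that computes, on a flattened list of 0 to 255 gray values. On curved, glossy bottle labels a single global threshold may not be enough, in which case OpenCV's `cv2.adaptiveThreshold` is the usual next thing to try. Whether either helps on these particular labels is untested here.

```python
def otsu_threshold(gray):
    """Return the threshold t (0..255) maximizing between-class variance.

    Pixels <= t form one class (dark), pixels > t the other (light).
    `gray` is any iterable of ints in 0..255, e.g. a flattened grayscale image.
    """
    hist = [0] * 256
    for value in gray:
        hist[value] += 1
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (weighted_total - sum_dark) / w_light
        # Unnormalized between-class variance; the argmax is the same.
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray, t):
    """Map dark pixels (<= t) to 0 and light pixels to 255."""
    return [0 if v <= t else 255 for v in gray]
```

Saving the binarized image and looking at it is a quick way to tell whether OCR errors come from the thresholding or from Tesseract itself.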

Pytesseract doesn't recognize a very clear image

余生颓废 submitted on 2019-12-11 17:23:00
Question: I have applied pytesseract to three similar images of the digit "2". Only for the last one does pytesseract recognize the digit correctly. The three images have different dimensions, and if I resize the images appropriately, pytesseract recognizes them correctly. But I don't understand how a powerful OCR engine like Tesseract fails on such a simple, clear image.

[Image 1: recognition fails]
[Image 2: recognition also fails]
[Image 3: recognition succeeds]

I'm using Python 3.7 with Anaconda,
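The observation that resizing fixes it points at text size. Tesseract's documentation on improving output quality recommends rescaling images (it works best at around 300 DPI) and notes that text with no surrounding border can cause problems. The sketch below works on a binarized list of rows: it scales a glyph so its height reaches a target, then pads it with white. The 32 px target and 10 px border are assumptions to tune, not documented values, and the helper names are hypothetical. With Pillow the same steps would be `img.resize(...)` and `ImageOps.expand(img, border=10, fill=255)`. For a lone digit, `--psm 10` (treat the image as a single character) is also worth trying in the pytesseract `config` string.

```python
def resize_nearest(pixels, factor):
    """Nearest-neighbour resize of a list-of-rows image by `factor`."""
    h, w = len(pixels), len(pixels[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [pixels[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
         for x in range(new_w)]
        for y in range(new_h)
    ]


def pad(pixels, border=10, fill=255):
    """Surround the image with `border` pixels of background on every side."""
    width = len(pixels[0]) + 2 * border
    blank_rows = [[fill] * width for _ in range(border)]
    body = [[fill] * border + list(row) + [fill] * border for row in pixels]
    return blank_rows + body + [[fill] * width for _ in range(border)]


def prepare_glyph(pixels, glyph_height_px, target_height_px=32, border=10):
    """Scale so the glyph is about `target_height_px` tall, then pad it.

    glyph_height_px is the measured height of the character in the input;
    the defaults are assumptions to tune, not values from Tesseract's docs.
    """
    factor = target_height_px / glyph_height_px
    return pad(resize_nearest(pixels, factor), border=border)
```

Normalizing all three images this way before OCR would make their results comparable, which helps separate a size problem from a segmentation problem.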

Tesseract not using path variable

流过昼夜 submitted on 2019-12-11 16:06:29
Question: Why does my Tesseract instance require me to set the datapath explicitly instead of reading the environment variable? Let me clarify: running the code

    ITesseract tesseract = new Tesseract();
    String result = tesseract.doOCR(myImage);

throws an error:

    Error opening data file ./tessdata/eng.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

I have already set my environment variable, i.e. doing echo $TESSDATA
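The error shows Tesseract looking in `./tessdata`, which suggests the path came from the instance's own default rather than from `TESSDATA_PREFIX` (how your Tess4J version chooses that default is worth checking). Two other things can bite: a variable exported in one shell is not visible to a JVM started from an IDE that was launched elsewhere, and Tesseract 3.x expects `TESSDATA_PREFIX` to be the parent of `tessdata` (as the error message says), while in 4.x it normally points at the `tessdata` folder itself. A workaround is to resolve the directory yourself and pass it to Tess4J's `setDatapath` (depending on version, that may need to be the folder or its parent). The Python sketch below shows such a lookup; the function name and the lookup order (explicit path, then the variable, then `./tessdata`) are assumptions.

```python
import os


def resolve_tessdata_dir(lang="eng", explicit=None, env=None):
    """Return the first candidate directory that contains <lang>.traineddata.

    Candidates, in order: an explicit path, TESSDATA_PREFIX itself (4.x style),
    TESSDATA_PREFIX/tessdata (3.x style), then ./tessdata.
    """
    env = os.environ if env is None else env
    candidates = []
    if explicit:
        candidates.append(explicit)
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        candidates += [prefix, os.path.join(prefix, "tessdata")]
    candidates.append(os.path.join(".", "tessdata"))

    for directory in candidates:
        if os.path.isfile(os.path.join(directory, lang + ".traineddata")):
            return directory
    raise FileNotFoundError(
        f"{lang}.traineddata not found in any of: {', '.join(candidates)}"
    )
```

Raising with the full candidate list makes it obvious which locations were checked, which is exactly the information the original error message hides.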

Cannot find Tesseract 4.0 tessdata for numbers only

▼魔方 西西 submitted on 2019-12-11 15:58:18
Question: As described in this post: pytesseract using tesseract 4.0 numbers only not working, it's possible to detect numbers with the eng.traineddata file, but if I want to detect only numbers, this isn't possible with that file. Even if you define tessedit_char_whitelist=0123456789, it doesn't recognize anything. I searched GitHub and elsewhere for a digit.traineddata for Tesseract 4.0 but didn't find one. Does anyone know which one I could use? Is it possible to use one from Tesseract 3.x (but
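The 4.0 LSTM engine is widely reported to ignore `tessedit_char_whitelist`; support for it returned in 4.1. On 4.0 the usual options are to use the legacy engine (`--oem 0`, which needs a `.traineddata` that still includes the legacy model, such as those in the `tessdata` repository rather than `tessdata_fast`), to upgrade, or to filter the output afterwards. Tesseract also ships a `digits` config file that sets a digit whitelist. The sketch builds a pytesseract `config` string for those cases plus a post-filter fallback; the helper names and the `--psm 7` (single text line) default are assumptions.

```python
import re


def digits_config(psm=7, legacy=False):
    """Build a pytesseract `config` string for digit-only OCR.

    legacy=True adds --oem 0 (legacy engine), which only works with a
    .traineddata file that still contains the legacy model.
    """
    parts = ["--oem 0"] if legacy else []
    parts.append(f"--psm {psm}")
    parts.append("-c tessedit_char_whitelist=0123456789")
    return " ".join(parts)


def keep_digits(text):
    """Fallback when the whitelist is ignored: keep only runs of digits."""
    return re.findall(r"\d+", text)


# Usage (requires pytesseract, Pillow and a Tesseract install; "label.png" is hypothetical):
#   text = pytesseract.image_to_string(Image.open("label.png"), config=digits_config())
#   numbers = keep_digits(text)
```

The post-filter only removes non-digits; it cannot fix a digit that was misread as a letter, so it is a fallback rather than a substitute for a working whitelist.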