tesseract

[python] has no attribute 'TessBaseAPI'

烂漫一生 submitted on 2019-12-07 04:09:58
Question: I got an error when I ran the code below:
import tesseract
api = tesseract.TessBaseAPI()
The error is:
AttributeError: 'module' object has no attribute 'TessBaseAPI'
I have already installed tesseract via pip. The Python version is 2.7, Windows 32-bit.
Answer 1: I think you are trying to use the Python wrapper of tesseract (python-tesseract). Make sure you are using the right version. You can check this one: python-tesseract-0.7.6.win32-py2.7.exe
Answer 2: Make sure that you don't need to import a subclass
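One quick way to see which `tesseract` module Python actually imported is to inspect it. This is a minimal diagnostic sketch, assuming the problem is either that the pip-installed `tesseract` package is not the python-tesseract wrapper or that a local file named `tesseract.py` is shadowing it; `describe_module` is a hypothetical helper, not part of either package.

```python
import importlib


def describe_module(name, attr):
    """Report whether module `name` imports, where it was loaded from,
    and whether it defines attribute `attr` (e.g. 'TessBaseAPI')."""
    try:
        mod = importlib.import_module(name)
    except ImportError as exc:
        return {"found": False, "path": None, "has_attr": False, "error": str(exc)}
    return {
        "found": True,
        # Built-in modules have no __file__. A stray tesseract.py in the
        # working directory would show up here instead of site-packages.
        "path": getattr(mod, "__file__", None),
        "has_attr": hasattr(mod, attr),
        "error": None,
    }


if __name__ == "__main__":
    print(describe_module("tesseract", "TessBaseAPI"))
```

If "path" points somewhere unexpected, or "has_attr" is False, the installed module is not the wrapper that provides TessBaseAPI.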

Remove receipt image border using ImageMagick

折月煮酒 submitted on 2019-12-07 04:07:17
Question: I'm using ImageMagick to pre-process receipt images before extracting the text with the Tesseract OCR engine. I need to remove the background around the receipts. I've gone through masking to remove the border here, but I'm unable to create the mask for the receipts. So far, I've tried to remove shadows from the receipt images:
Initial image (example receipt)
convert input.png -colorspace gray \
  \( +clone -blur 0x2 \) +swap -compose divide -composite \
  -linear-stretch 5%x0% photocopy.png
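One way to drop the dark border around a receipt is to crop to the rows and columns that are mostly bright paper. This is a minimal pure-Python sketch, assuming the image is already grayscale (for example after the -colorspace gray step in the command above) and that the receipt is clearly brighter than the background; `receipt_bbox`, `crop`, and the default `thresh`/`min_frac` values are illustrative choices, not ImageMagick options.

```python
def receipt_bbox(gray, thresh=180, min_frac=0.3):
    """Return (top, bottom, left, right), inclusive, bounding the rows and
    columns that contain at least `min_frac` bright pixels (>= thresh),
    or None if no such region exists."""
    h, w = len(gray), len(gray[0])
    rows = [y for y in range(h)
            if sum(1 for p in gray[y] if p >= thresh) >= min_frac * w]
    cols = [x for x in range(w)
            if sum(1 for y in range(h) if gray[y][x] >= thresh) >= min_frac * h]
    if not rows or not cols:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


def crop(gray, box):
    """Crop a 2-D list of pixel values to an inclusive (top, bottom, left, right) box."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


if __name__ == "__main__":
    # 6x8 test image: dark background (50) with a bright 4x4 "receipt" (220).
    img = [[50] * 8 for _ in range(6)]
    for y in range(1, 5):
        for x in range(2, 6):
            img[y][x] = 220
    box = receipt_bbox(img)
    print(box)             # (1, 4, 2, 5)
    print(crop(img, box))  # four rows of [220, 220, 220, 220]
```

Because the crop is an axis-aligned rectangle, a tilted receipt will keep some background at its corners; deskewing first would reduce that.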

Tesseract error: Illegal min or max specification

这一生的挚爱 submitted on 2019-12-07 04:05:32
Question: While trying to run the sample code from http://tess4j.sourceforge.net/codesample.html, I got this error:
Error: Illegal min or max specification!
signal_termination_handler:Error:Signal_termination_handler called:Code 5002
I found a possible solution, e.g. at https://code.google.com/p/tesseract-ocr/issues/detail?id=228, where people say that setting the locale is enough to get rid of the error. My problem is that I'm writing Java, not C++, and I can't find anywhere how to set the locale in my code the way they did it, like
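The workaround described in the linked issue is to make sure the C library's numeric locale is "C" (decimal point ".") before Tesseract reads its configuration. As an illustration only, in Python rather than the asker's Java, a context manager can force LC_NUMERIC around an OCR call; Python's locale.setlocale changes the process-wide C locale, so native code loaded in the same process sees it. `c_numeric_locale` is a hypothetical helper name. From Java, the equivalent would require a native setlocale call (for example through JNA), which is not shown here.

```python
import contextlib
import locale


@contextlib.contextmanager
def c_numeric_locale():
    """Temporarily set LC_NUMERIC to "C" so '.' is the decimal point,
    then restore the previous setting."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query without changing
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


if __name__ == "__main__":
    with c_numeric_locale():
        # An in-process native OCR call would go here.
        print(locale.localeconv()["decimal_point"])  # .
```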

image_to_string doesn't work on Mac

折月煮酒 submitted on 2019-12-07 03:34:13
Question: I'm trying to follow this pytesser example (link) on Mac OS X Mavericks:
>>> from pytesser import *
>>> im = Image.open('phototest.tif')
>>> text = image_to_string(im)
But the last line gives this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pytesser.py", line 31, in image_to_string
    call_tesseract(scratch_image_name, scratch_text_name_root)
  File "pytesser.py", line 21, in call_tesseract
    proc = subprocess.Popen(args)
  File "/Library/Frameworks
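The traceback ends inside subprocess.Popen, which usually means the executable pytesser tries to launch could not be found on this machine. A minimal Python 3 sketch of a replacement for that step, assuming Tesseract is installed separately and invoked in its command-line form `tesseract <image> <output_base>` (which writes `<output_base>.txt`); `run_tesseract` is a hypothetical name, not pytesser's API.

```python
import shutil
import subprocess


def run_tesseract(image_path, output_base, exe="tesseract"):
    """Run the Tesseract CLI and return the text it writes to output_base + '.txt'.
    Raise a clear error if the executable cannot be located."""
    exe_path = shutil.which(exe)  # also accepts a full path such as /usr/local/bin/tesseract
    if exe_path is None:
        raise RuntimeError(
            "Tesseract executable %r not found; install Tesseract "
            "or pass its full path as `exe`." % exe)
    subprocess.run([exe_path, image_path, output_base], check=True)
    with open(output_base + ".txt") as fh:
        return fh.read()
```

Checking for the executable up front turns an opaque OSError from deep inside Popen into a message that says what is actually missing.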

Adding New Fonts to Tesseract 3

陌路散爱 submitted on 2019-12-07 03:34:12
Question: I'm trying to add new fonts to Tesseract OCR. I'm following this tutorial but I'm having some problems. Here's what I've done so far:
1. Create the training document:
convert eng.myfont.exp0.pdf eng.myfont.exp0.tif
2. Train Tesseract:
tesseract eng.myfont.exp0.tif eng.myfont.exp0 batch.nochop makebox
This created my eng.myfont.exp0.box file. I opened the file with moshpytt and made sure the characters were detected correctly.
3. Feed the box file back into Tesseract:
tesseract eng.myfont.exp0.tif eng.myfont.exp0.box
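Before feeding the box file back in, it can help to sanity-check it programmatically. In Tesseract 3.x box files each line reads `<char> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. The sketch below parses that format (treating 5-field lines from older versions as page 0) and flags boxes with non-positive width or height; `parse_box_file` and `bad_boxes` are hypothetical helpers.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> List[Box]:
    """Parse Tesseract 3.x box-file text; lines with only 5 fields get page 0."""
    boxes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) not in (5, 6):
            raise ValueError("line %d: expected 5 or 6 fields, got %d" % (lineno, len(parts)))
        char, *nums = parts
        values = [int(n) for n in nums]
        if len(values) == 4:
            values.append(0)
        boxes.append(Box(char, *values))
    return boxes


def bad_boxes(boxes: List[Box]) -> List[Box]:
    """Return boxes whose width or height is not positive (likely editing mistakes)."""
    return [b for b in boxes if b.right <= b.left or b.top <= b.bottom]
```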

Tesseract-OCR text recognition

纵然是瞬间 submitted on 2019-12-07 00:05:15
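When I browsed http://code.google.com/p/tesseract-ocr and downloaded a few files, I was completely lost and didn't know where to start. Online I found people who had implemented it on Linux, e.g. "Recognizing image CAPTCHAs with open-source programs (ImageMagick + tesseract-ocr)", but I rarely saw articles about doing it on Windows. Below I will explain, step by step, how to use tesseract-ocr to recognize images that contain Chinese text.
1. Download tesseract-ocr (note: Chinese recognition is only supported from version 3.0 onward):
tesseract-ocr-setup-3.00.exe
chi_sim.traineddata.gz
2. Install tesseract-ocr: extract the download, double-click tesseract-ocr-setup-3.00.exe, and follow the prompts. I installed it in D:/Program Files/Tesseract-OCR. In that directory you will find tesseract.exe, which is the process our program will invoke later.
3. Install a custom language pack: in D:/Program Files/Tesseract-OCR, find the /tessdata directory, which holds the language packs. Decompress chi_sim.traineddata.gz and copy the resulting chi_sim.traineddata file into that directory.
4. Write the test code: before writing the code, download two jar packages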
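The post's own test code is Java and depends on two jar files that are not named in this excerpt. As an illustration in Python instead, this sketch checks that the language pack is where step 3 put it and builds the tesseract.exe command line for Simplified Chinese. The install path is the author's; the helper names are hypothetical, and the sketch assumes the 3.0x command-line form `tesseract <image> <output_base> -l chi_sim`, which writes `<output_base>.txt`.

```python
from pathlib import Path

# Install directory used in the post; adjust for your machine.
TESSERACT_HOME = Path("D:/Program Files/Tesseract-OCR")


def has_language_pack(home: Path, lang: str = "chi_sim") -> bool:
    """True if <home>/tessdata/<lang>.traineddata exists (step 3)."""
    return (home / "tessdata" / (lang + ".traineddata")).is_file()


def tesseract_command(home: Path, image: str, output_base: str, lang: str = "chi_sim"):
    """Build the argument list for tesseract.exe; run it with subprocess.run(argv)."""
    return [str(home / "tesseract.exe"), image, output_base, "-l", lang]
```

The install path contains a space ("Program Files"), which is why the sketch builds an argument list rather than a single command string.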

Tesseract 3.0 with Tess4J crashing the application on a Linux server

元气小坏坏 submitted on 2019-12-06 22:47:27
I am using Tess4J 3.0.0 with Tesseract 3.04 in my Java application. In the application I've created an OCR service that implements Runnable. The application is deployed on CentOS 6, and the code below is in the service:
Tesseract1 instance = new Tesseract1();
result = instance.doOCR(new File("pathtodocument/abc.pdf"));
When a user makes a request, the document upload service starts a thread running the OCR service, which processes the text extracted from the PDF. When I test the code with a single request it works perfectly. The problem: when I send more than one request at a time, the whole application crashes. Below is the error in catalina.out:
# #
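A common workaround when a native OCR engine crashes under concurrent requests is to funnel all OCR work through one lock (or a single worker thread). The sketch below shows that pattern in Python rather than Java; it assumes the crash comes from concurrent native calls, which the truncated log does not confirm, and `ocr_func` stands in for whatever performs the OCR (in the asker's case, the Tess4J doOCR call). In Java the same idea would be a synchronized block or a single-thread ExecutorService.

```python
import threading


class SerializedOcr:
    """Wrap an OCR callable so that at most one call runs at a time."""

    def __init__(self, ocr_func):
        self._ocr_func = ocr_func
        self._lock = threading.Lock()

    def __call__(self, path):
        with self._lock:  # concurrent requests wait here instead of running in parallel
            return self._ocr_func(path)
```

Usage: create one shared instance, e.g. ocr = SerializedOcr(my_ocr_function) with a hypothetical my_ocr_function, and have every request thread call ocr(path). The trade-off is throughput: documents are processed one at a time, so a pool of separate processes is an alternative if isolation from native crashes matters more.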

Python AI: image recognition

纵然是瞬间 submitted on 2019-12-06 16:15:51
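1. Installing the libraries
First we need to install the PIL and pytesseract libraries.
PIL (Python Imaging Library) is the standard image-processing library for Python and is very powerful.
pytesseract: a Python wrapper around the Tesseract OCR engine, used here for text recognition.
I'm using Python 3.6. PIL doesn't support Python 3, so I installed its fork Pillow instead:
pip install pytesseract
pip install pillow
If you're on Python 2, run the following instead:
pip install pytesseract
pip install PIL
(Pillow also supports Python 2.7, and pip install PIL usually fails because PIL is not distributed on PyPI, so pip install pillow is the safer choice there too.)
If we run the code above at this point, we get the following error. The message is clear: No such file or directory: "tesseract". This is because we haven't installed the tesseract-ocr engine yet.
2. The tesseract-ocr engine
Optical character recognition (OCR) is the process of scanning text material and then analyzing the resulting image files to extract the text and layout information. OCR is a highly specialized technology, used mostly by people in the printing industry, and it can quickly convert paper documents into electronic form. For Chinese OCR, the leading domestic products come from Tsinghua Wentong (清华文通), Hanvon (汉王), and Shangshu (尚书); each has its own strengths, and none is cheap. OCR developed earlier abroad: large companies such as IBM, Microsoft, and HP, even without releasing standalone OCR products, have long had the core technology in their R&D teams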
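Even with the engine installed, pytesseract still has to find the tesseract executable; if it is not on PATH you get the "No such file or directory" error mentioned above. A minimal sketch, assuming Pillow and pytesseract are installed, that sets the real pytesseract.pytesseract.tesseract_cmd attribute before calling pytesseract.image_to_string; `resolve_tesseract_cmd`, `ocr_image`, and any candidate paths you pass are hypothetical. For Chinese text you would pass lang="chi_sim", provided that language pack is installed.

```python
import os
import shutil


def resolve_tesseract_cmd(candidates=()):
    """Return the first candidate path that exists, else whatever
    `tesseract` resolves to on PATH, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("tesseract")


def ocr_image(image_path, lang="eng", candidates=()):
    # Imported lazily so resolve_tesseract_cmd works without these packages.
    import pytesseract
    from PIL import Image

    cmd = resolve_tesseract_cmd(candidates)
    if cmd is None:
        raise RuntimeError("tesseract-ocr engine not found; install it first")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

Usage, with an example Windows install location: ocr_image("test.png", candidates=[r"C:\Program Files\Tesseract-OCR\tesseract.exe"]).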

Recognizing numbers in an image in Java

前提是你 submitted on 2019-12-06 16:00:34
Question: I want to recognize the numbers in the following image. I am currently using the Tess4J library in an Eclipse Java project, but it only recognizes characters on a plain-colored background. For this image it could not even detect that there are characters (numbers) in it. Please help me find a way to accomplish this task. Here is my current code:
import net.sourceforge.tess4j.*;
import java.io.File;
public class Main {
    public static void main(String[] args) {
        File imageFile = new File("image.png");
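Tesseract generally does best on dark text over a light, uniform background, so a common first step for digits on a colored background is to convert to grayscale and binarize before OCR. Below is a pure-Python sketch of Otsu's global threshold (a technique swapped in here, not taken from the question) on a flat list of 0-255 gray values; if the digits come out white on black, use invert=True. In Tess4J you could additionally restrict recognition to digits with setTessVariable("tessedit_char_whitelist", "0123456789"), assuming your Tesseract version honors the whitelist.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance
    when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    sum_bg = 0.0
    weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, invert=False):
    """Map pixels above the Otsu threshold to 255 and the rest to 0 (or the reverse)."""
    t = otsu_threshold(pixels)
    hi, lo = (0, 255) if invert else (255, 0)
    return [hi if p > t else lo for p in pixels]
```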

How to prepare an image for recognition by Tesseract OCR

放肆的年华 submitted on 2019-12-06 15:49:35
I use Tesseract OCR to extract meter readings. To recognize them correctly, Tesseract needs a white background and black digits. I tried to threshold the image:
src := cvLoadImage(filename, CV_LOAD_IMAGE_GRAYSCALE);
dst := cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 1);
cvThreshold(src, dst, 50, 250, CV_THRESH_BINARY);
but I didn't get the right result. What should I do? I use Delphi 6 with Delphi-OpenCV https://github.com/Laex/Delphi-OpenCV
Answer: You can process this image as follows:
for jy := 0 to bm.Height - 1 do
  for ix := 0 to bm.Width - 1 do
  begin
    cor := bm.Canvas.Pixels[ix, jy];
    R := GetRValue(cor);
    G := GetGValue(cor);
    B
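A single fixed threshold of 50 is sensitive to lighting, which is often uneven in meter photos (note also that a maxval of 250 makes the "white" pixels 250 rather than 255). One alternative, swapped in here and not taken from the question or the answer, is a local mean threshold: each pixel is compared with the mean of its block x block neighborhood minus a constant c, similar in spirit to OpenCV's adaptive mean thresholding (cvAdaptiveThreshold in the C API; whether your Delphi-OpenCV version exposes it is an assumption). This pure-Python sketch works on a 2-D list of gray values and uses an integral image so each window sum costs O(1); block and c are illustrative defaults. Pixels noticeably darker than their surroundings become 0 (black) and everything else 255 (white), matching the black-digits-on-white layout Tesseract expects; invert the input first if the meter has light digits on a dark face.

```python
def adaptive_mean_threshold(gray, block=15, c=10):
    """Return a binary image: 255 where a pixel is brighter than its local
    mean minus c, else 0. Windows are clipped at the image border."""
    h, w = len(gray), len(gray[0])
    r = block // 2
    # Integral image with an extra zero row and column: ii[y][x] is the sum
    # of gray[0..y-1][0..x-1].
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h - 1, y + r)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w - 1, x + r)
            area = (y1 - y0 + 1) * (x1 - x0 + 1)
            s = ii[y1 + 1][x1 + 1] - ii[y0][x1 + 1] - ii[y1 + 1][x0] + ii[y0][x0]
            row.append(255 if gray[y][x] > s / area - c else 0)
        out.append(row)
    return out
```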