tesseract

How can I detect boxes in an image and pull them out as individual files?

一世执手 submitted on 2019-12-21 21:39:23
Question: I need a programmatic way of taking a scanned image (let's assume PNG or any other convenient image format) and breaking it up into many smaller images. The scanned image is a grid, and the boxes of the grid will always be the same size and in the same relative location. Because the image is scanned, however, they are not necessarily in the same absolute location. Each box contains a character; ideally I'd like to save each character as its own image file, without any of the box border. I prefer PHP and
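A minimal sketch of the cropping step, written in Python rather than PHP because Python is the language used elsewhere on this page. It assumes the scan is already loaded as a grayscale grid (a list of pixel rows, 0 = black), that it is not rotated, that the cell size in pixels is known, and that the grid's outer border is the first substantial dark content from the top-left. `find_grid_origin`, `cell_boxes` and their parameters are hypothetical names. Because the cells keep the same relative layout, only the grid's top-left corner has to be found per scan; every cell box is then an offset from it, shrunk by `border` pixels to cut off the box lines.

```python
def find_grid_origin(pixels, dark=128, min_count=1):
    """Return (x, y) of the first column and first row holding >= min_count dark pixels.

    pixels: list of rows of grayscale values (0 = black, 255 = white).
    Raising min_count makes the search ignore isolated specks of scanner noise.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    top = next((y for y in range(height)
                if sum(v < dark for v in pixels[y]) >= min_count), None)
    left = next((x for x in range(width)
                 if sum(pixels[y][x] < dark for y in range(height)) >= min_count), None)
    if top is None or left is None:
        raise ValueError("no grid border found")
    return left, top


def cell_boxes(origin, cell_w, cell_h, rows, cols, border):
    """Return (left, top, right, bottom) crop boxes for every cell, row by row.

    Each box is shrunk by `border` pixels on all sides so the grid lines are cut off.
    """
    ox, oy = origin
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = ox + c * cell_w + border
            top = oy + r * cell_h + border
            right = ox + (c + 1) * cell_w - border
            bottom = oy + (r + 1) * cell_h - border
            boxes.append((left, top, right, bottom))
    return boxes
```

The boxes follow Pillow's `(left, upper, right, lower)` convention, so with Pillow each one could be passed to `Image.crop` and saved, e.g. `img.crop(box).save(f"cell_{i}.png")`. A skewed scan would also need deskewing first, which this sketch does not attempt.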

Get font size in Python with Tesseract and pyocr

三世轮回 submitted on 2019-12-21 21:18:27
Question: Is it possible to get the font size from an image using pyocr or Tesseract? Below is my code:
    import io
    import pyocr
    import pyocr.builders
    from PIL import Image
    tools = pyocr.get_available_tools()
    tool = tools[0]
    txt = tool.image_to_string(
        Image.open(io.BytesIO(req_image)),
        lang=lang,
        builder=pyocr.builders.TextBuilder()
    )
Here I get the text from the image using the function image_to_string. My question is whether I can also get the font size (as a number) of my text.
Answer 1: Using tesserocr, you can get a ResultIterator after calling Recognize on your image, for which you can
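As an alternative to the ResultIterator route in the answer, a rough estimate can be made from word bounding boxes. The sketch below parses Tesseract's TSV output (the string `pytesseract.image_to_data` returns by default, with columns `level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf`, `text`) and converts the median word height of each line to points using points = pixels * 72 / DPI. It assumes you know the scan's DPI, and the function name is hypothetical. The result is only an approximation: a box's height depends on which ascenders and descenders a word contains, so it is not the typographic font size.

```python
import csv
import io
from statistics import median


def estimate_line_point_sizes(tsv_text, dpi):
    """Approximate the font size (in points) of each text line in Tesseract TSV output.

    Returns a dict keyed by (page_num, block_num, par_num, line_num).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    heights = {}
    for row in reader:
        # Level 5 rows are words; skip other levels and empty detections.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        key = tuple(int(row[col]) for col in ("page_num", "block_num", "par_num", "line_num"))
        heights.setdefault(key, []).append(int(row["height"]))
    # Median word height per line, converted from pixels to points (1 pt = 1/72 inch).
    return {key: median(hs) * 72.0 / dpi for key, hs in heights.items()}


# Usage (requires pytesseract, Pillow and the tesseract binary; 300 DPI is an assumption):
# from PIL import Image
# import pytesseract
# print(estimate_line_point_sizes(pytesseract.image_to_data(Image.open("scan.png")), dpi=300))
```

Taking the median per line rather than trusting a single word keeps one tall word (such as "gy") from skewing the estimate.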

Using Tesseract OCR in VC++

随声附和 submitted on 2019-12-21 20:58:35
Question: In my project I have to read numbers from an image (.jpg or .tiff). After a lot of googling, I learned about the open-source OCR engine Tesseract OCR. I am a beginner with Tesseract OCR; I have read all of the Tesseract documentation, including how to use it in Visual Studio. Basically, I am facing some problems using Tesseract... I followed these steps: 1) Downloaded and installed tesseract-ocr-setup-3.02.02.exe from http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-setup-3.02.02

Cannot use ChoiceIterator in Tesseract

只愿长相守 submitted on 2019-12-21 20:55:45
Question: First of all, I want to confirm that I understand the choice iterator correctly. For example, if I have the word "scope" in an image, the choice iterator should give me something like "s" and maybe, after Next(), "5". For the third letter, "o", it might give me "0", after Next() "O", and after Next() "o". Do I understand this correctly? Here is all my related code:
    api.SetImage((uchar*)img->imageData, img->width, img->height, img->depth/8, img->widthStep);
    api.SetRectangle(0, 0, img->width, img->height);
    int left, top, right, bottom;
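To illustrate the idea in the question: assuming, as described there, that a ChoiceIterator enumerates the alternative readings of one symbol at a time (in the C++ API it is constructed from a result iterator positioned at the RIL_SYMBOL level and exposes GetUTF8Text() and Confidence()), the per-symbol lists can be combined into candidate readings of the whole word. The sketch below uses hypothetical, hand-written choices and confidences for "scope" rather than real Tesseract output, and ranks candidates by the product of the confidences scaled to 0-1; that scoring rule is an assumption, not something Tesseract does itself.

```python
from itertools import product


def rank_word_candidates(symbol_choices, limit=5):
    """Combine per-symbol alternatives into ranked whole-word candidates.

    symbol_choices: one list per symbol position, each a list of
    (text, confidence) pairs with confidence on a 0-100 scale.
    Returns up to `limit` (word, score) pairs, best first.
    """
    candidates = []
    for combo in product(*symbol_choices):
        word = "".join(text for text, _ in combo)
        score = 1.0
        for _, conf in combo:
            score *= conf / 100.0  # assumed scoring rule: product of scaled confidences
        candidates.append((word, score))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:limit]


# Hypothetical choices for the word "scope" (not real Tesseract output).
choices = [
    [("s", 90), ("5", 40)],
    [("c", 95)],
    [("o", 80), ("0", 60), ("O", 50)],
    [("p", 92)],
    [("e", 97)],
]
for word, score in rank_word_candidates(choices, limit=3):
    print(word, round(score, 3))
```

Enumerating every combination grows multiplicatively with word length, so for long words a beam search over symbol positions would be the usual refinement.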

How to use trained data with pytesseract?

时间秒杀一切 submitted on 2019-12-21 19:44:16
Question: Using the tool at http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata. Right now I'm using this simple script:
    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract as tes
    results = tes.image_to_string(Image.open('./test.jpg'), boxes=True)
    with open('parsing.text', 'a') as out:
        out.write(results)
    print(results)
How do I use my traineddata file so that I can read the new font with the Python script?
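One way to do this is to tell Tesseract where the file lives and to use the file's base name as the language code: `myfont.traineddata` is selected with `lang='myfont'`, and the `--tessdata-dir` option points Tesseract at the folder containing it. The helper below is a hypothetical sketch that derives both values from the file path; the path in the usage comment is made up. Alternatively, the file can be copied into Tesseract's own tessdata directory, in which case only `lang` is needed.

```python
from pathlib import Path


def tesseract_args_for(traineddata_path):
    """Return (lang, config) for running pytesseract with a custom .traineddata file."""
    path = Path(traineddata_path)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {path.name!r}")
    lang = path.stem  # "myfont.traineddata" -> "myfont"
    # Quote the directory so paths containing spaces survive the command line.
    config = f'--tessdata-dir "{path.parent}"'
    return lang, config


# Usage (hypothetical path; requires pytesseract, Pillow and the tesseract binary):
# from PIL import Image
# import pytesseract
# lang, config = tesseract_args_for("/home/me/fonts/myfont.traineddata")
# print(pytesseract.image_to_string(Image.open("./test.jpg"), lang=lang, config=config))
```

Keeping the custom file in its own directory avoids touching the system tessdata folder, at the cost of that directory also needing any other languages you want to combine with it (e.g. `lang='eng+myfont'`).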

How to cross-compile the Tesseract OCR engine for iPhone?

左心房为你撑大大i submitted on 2019-12-21 06:16:39
Question: For the past week I have been struggling to figure out how to compile the Tesseract OCR engine for iPhone. I have gone through some links, but I couldn't find the proper way. Can anyone help me with a step-by-step procedure? Thanks in advance.
Answer 1: That probably won't be enough. I know nothing about the Tesseract OCR library, but you will need the include directives, plus you must specify the directory where the Tesseract header files are installed via a compiler switch (usually -I) and (possibly) link with

Does Tesseract ignore non-text areas in a scanned document?

纵饮孤独 submitted on 2019-12-21 05:57:10
Question: I'm using Tesseract, but I don't know whether it ignores non-text areas and targets text only. Do I have to remove non-text areas as a preprocessing step to get better output?
Answer 1: Tesseract has a pretty good algorithm for detecting text, but it will occasionally produce false-positive matches. Ideally, you would preprocess the image before submitting it to Tesseract. Some time ago I worked on a similar task, so I suggest you take a look at the following material: OpenCV C++/Obj-C: Detecting a
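One common preprocessing step of this kind is connected-component filtering: after the scan is binarized, blobs whose bounding boxes are far larger than any character (rules, borders, photos) are erased before OCR. The sketch below is a plain-Python version, assuming an already-binarized image given as a list of 0/1 rows (1 = ink) and hypothetical size limits that you would tune to your documents' character size. In practice, OpenCV's `cv2.connectedComponentsWithStats` computes the same components and their bounding boxes much faster.

```python
from collections import deque


def remove_large_components(binary, max_width, max_height):
    """Return a copy of `binary` with oversized ink blobs erased.

    binary: list of rows of 0/1 values, 1 = dark (ink) pixel.
    Components whose bounding box is wider than max_width or taller than
    max_height are treated as non-text and set to 0.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill over 8-connected neighbours.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and binary[ny][nx] == 1:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            if max(xs) - min(xs) + 1 > max_width or max(ys) - min(ys) + 1 > max_height:
                for x, y in pixels:
                    out[y][x] = 0
    return out
```

The size limits are a trade-off: set them too low and large headline text is erased along with the graphics, too high and long rules survive, so a per-document estimate of typical character height is a reasonable starting point.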

Tesseract .NET: process an image from a memory object

允我心安 submitted on 2019-12-21 05:35:15
Question: From what I understand (I could be wrong), Pix.LoadFromFile is the only way to get a Pix for processing. Is there any other way, such as from a bitmap?
Answer 1: I am not a Tesseract expert, but you can use the following:
    using System.Drawing;
    using Tesseract;
    Bitmap bmp = (Bitmap)Bitmap.FromFile(MyImgFilePath);
    Pix img = PixConverter.ToPix(bmp);
You can take a look at the source code of PixConverter at: https://github.com/charlesw/tesseract/blob/master/src/Tesseract/PixConverter.cs
Source: https://stackoverflow.com/questions/26162169

Android OCR using tess-two, a fork of Tesseract

馋奶兔 submitted on 2019-12-21 02:54:20
Question: I am using OCR as a module in a project I am working on. After digging deep for a week, I thought I should run a test application in Eclipse just to see how accurately it works. I found tess-two, a fork of Tesseract, to support my OCR. I downloaded tess-two from https://github.com/rmtheis/tess-two/downloads and, with everything set, imported tess-two into Eclipse. I set up Eclipse for handling and building projects involving native code. I built tess-two successfully after solving 1