python-tesseract

Tesseract 3.x multiprocessing weird behaviour

浪尽此生 submitted on 2019-12-04 12:28:56
Question: I'm not sure whether my infrastructure is causing this odd behaviour or tesseract-ocr itself. Whenever I use image_to_string in a single-process environment, tesseract-ocr works fine. But when I spawn multiple workers with gunicorn and all of them do some OCR work, tesseract-ocr starts reading very poorly (not in terms of performance, but in terms of accuracy). Even after the load is over, tesseract never recovers its earlier accuracy. I need to restart all the workers
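
Since restarting the workers restores accuracy, one workaround (not a root-cause fix) is to let gunicorn recycle each worker automatically after a fixed number of requests. A minimal `gunicorn.conf.py` sketch, assuming the OCR runs inside the gunicorn workers themselves; the numbers are placeholders to tune against your own load:

```python
# gunicorn.conf.py (sketch; values are placeholders, not recommendations)
import multiprocessing

# One worker process per CPU core; each worker runs its own OCR calls.
workers = multiprocessing.cpu_count()

# Restart a worker after it has handled this many requests, discarding any
# state that has degraded inside a long-lived OCR process.
max_requests = 200

# Add a random offset of up to 50 requests per worker so they don't all
# restart at the same moment.
max_requests_jitter = 50
```

Start the server with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` stands for your own WSGI module and callable.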

How to use trained data with pytesseract?

北城余情 submitted on 2019-12-04 10:27:21
Using the tool at http://trainyourtesseract.com/ I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata. Right now I'm using this simple script:

    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract as tes

    results = tes.image_to_string(Image.open('./test.jpg'), boxes=True)
    with open('parsing.text', 'a') as file:
        file.write(results)
    print(results)

How do I use my traineddata file so that I can read the new font from the Python script? Thanks! Edit #1: so I understand that *.traineddata can be used with Tesseract as a command-line
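
Tesseract looks a language up as `<lang>.traineddata` inside its tessdata directory, and pytesseract passes extra command-line options through its `lang` and `config` arguments (the pytesseract README shows `--tessdata-dir` used this way). A minimal sketch, assuming the downloaded file is saved as `myfont.traineddata` in a folder `./tessdata` (both names hypothetical) and that your Tesseract build accepts `--tessdata-dir`; the simpler alternative is to copy the file into Tesseract's own tessdata folder and pass only `lang`:

```python
import os


def traineddata_config(lang, tessdata_dir):
    """Check that <tessdata_dir>/<lang>.traineddata exists and return a
    config string pointing Tesseract at that directory."""
    tessdata_dir = os.path.abspath(tessdata_dir)
    model = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(model):
        raise FileNotFoundError(model)
    # Quoted as in the pytesseract README so paths with spaces survive
    return '--tessdata-dir "{}"'.format(tessdata_dir)


def ocr_with_custom_model(image_path, lang="myfont", tessdata_dir="./tessdata"):
    # Imported here so traineddata_config works without these packages.
    import pytesseract
    from PIL import Image

    config = traineddata_config(lang, tessdata_dir)
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=config)
```

With `--tessdata-dir`, Tesseract looks only in that folder, so if you combine languages (for example `lang="eng+myfont"`) the `eng.traineddata` file has to be there as well.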

Empty string with Tesseract

╄→尐↘猪︶ㄣ submitted on 2019-12-04 05:20:02
Question: I'm trying to read various cropped images from a big file. I can read most of them, but some return an empty string when I read them with tesseract. The code is just this line:

    pytesseract.image_to_string(cv2.imread("img.png"), lang="eng")

Is there anything I can try so that these kinds of images can be read? Thanks in advance. Edit:

Answer 1: Thresholding the image before passing it to pytesseract increases the accuracy.

    import cv2
    import numpy as np
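
The answer's code is cut off. For reference, OpenCV can choose the threshold automatically with Otsu's method via `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)` on a grayscale image. The sketch below spells out what Otsu computes in plain Python (the cut that maximises the between-class variance of the gray-level histogram) so the step is not a black box; the function names are illustrative, and `preprocess` assumes OpenCV is installed:

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximises the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    weighted_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below t
    sum_bg = 0  # sum of their gray levels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (weighted_sum - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, t):
    # Same rule as cv2.THRESH_BINARY: above t -> 255, otherwise 0
    return [255 if p > t else 0 for p in pixels]


def preprocess(path):
    import cv2  # OpenCV does both steps above in a single call
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The result can go straight to `pytesseract.image_to_string(preprocess("img.png"), lang="eng")`. If some crops have light text on a dark background, inverting the binary image with `cv2.bitwise_not` may also help, since Tesseract generally does best with dark text on a light background.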

How to save dpi info in py-opencv?

北战南征 submitted on 2019-12-02 07:53:08

    import cv2

    def clear(img):
        back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
        img = cv2.bitwise_xor(img, back)
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        return img

    def threshold(img):
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        # cv2.imread loads colour images in BGR channel order
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
        return img

    def fomatImage(img):
        img = threshold(img)
        img = clear(img)
        return img

    img = fomatImage(cv2.imread("1566135246468.png", cv2.IMREAD_COLOR))
    cv2.imwrite("aa.png", img)

This is my code. But when I tried to identify it with

UnicodeDecodeError with Tesseract OCR in Python

只愿长相守 submitted on 2019-12-01 16:54:09
Question: I'm trying to extract text from an image file using Tesseract OCR in Python, but I'm facing an error that I can't figure out how to deal with. My environment itself seems fine: I tested some sample images with the OCR in Python. Here is the code:

    from PIL import Image
    import pytesseract

    strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
    print(strs)

The following is the error I get in the Eclipse console:

    strs = pytesseract.image_to_string(Image.open('binarized_body.png'))
    File "C:
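
The traceback is cut off, so the exact cause isn't visible. One frequent source of a `UnicodeDecodeError` in this setup is reading Tesseract's text output, which is UTF-8, with the platform's default codec (for example cp1252 on many Windows installs); upgrading pytesseract may be worth trying. To take the wrapper out of the equation, here is a minimal sketch that calls the `tesseract` executable directly and decodes its output explicitly, assuming `tesseract` is on your PATH and your version accepts `stdout` as the output base:

```python
import subprocess


def decode_tesseract_output(raw):
    """Decode Tesseract's UTF-8 text output explicitly instead of relying on
    the platform default codec; undecodable bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def ocr_via_cli(image_path, lang="eng"):
    # "stdout" as the output base makes tesseract print the recognised text
    # instead of writing it to a .txt file.
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return decode_tesseract_output(result.stdout)
```

`errors="replace"` is a deliberate choice: one bad byte then shows up as a replacement character instead of aborting the whole page. Use `errors="strict"` if you would rather fail loudly.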

Error while trying to install tesserocr

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-01 12:32:41
Question: I keep getting the same error when I try to install tesserocr:

    (env) vagrant@vagrant:~$ pip install tesserocr
    Collecting tesserocr
      Using cached tesserocr-2.1.3.tar.gz
    Building wheels for collected packages: tesserocr
      Running setup.py bdist_wheel for tesserocr ... error
      Complete output from command /home/vagrant/src/env/bin/python2 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-4K2D6A/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close(
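
The log stops before the actual compiler error, but a common reason for tesserocr's wheel build failing is that the Tesseract and Leptonica development files (or `pkg-config` itself) are missing; tesserocr's README lists `tesseract-ocr`, `libtesseract-dev`, `libleptonica-dev` and `pkg-config` as the Debian/Ubuntu packages to install. A quick pre-flight check, assuming it is run with Python 3 and that the libraries register with pkg-config under the names `tesseract` and `lept`:

```python
import shutil
import subprocess

# pkg-config module names of the C libraries tesserocr compiles against
REQUIRED_PKGS = ("tesseract", "lept")


def missing_build_deps():
    """Return the build prerequisites that pkg-config cannot find."""
    if shutil.which("pkg-config") is None:
        return ["pkg-config"]
    missing = []
    for pkg in REQUIRED_PKGS:
        # `pkg-config --exists NAME` exits with status 0 only if NAME.pc is found
        status = subprocess.run(
            ["pkg-config", "--exists", pkg],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        ).returncode
        if status != 0:
            missing.append(pkg)
    return missing


if __name__ == "__main__":
    gaps = missing_build_deps()
    if gaps:
        print("missing: " + ", ".join(gaps))
    else:
        print("headers found; look further up the build log for the real error")
```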

PyTesseract call running very slowly when used with multiprocessing

不想你离开。 submitted on 2019-12-01 11:50:04
I have a function that takes a list of images and returns a list of results after applying OCR to each image. Another function controls the input to this function using multiprocessing. With a single list (i.e. no multiprocessing), each image in the list took about 1 s, but when I increased the number of lists processed in parallel to 4, each image took an astounding 13 s. To understand where the problem really lies, I tried to create a minimal working example. Here I have two functions, eat25 and eat100, which open an image name and feed it to the
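
The cause can't be pinned down from the truncated post, but two points are relevant. First, pytesseract runs the `tesseract` executable as a subprocess, so threads are enough to run several recognitions in parallel; a process pool adds pickling and start-up cost without adding parallelism. Second, Tesseract 4.x builds with OpenMP can each start several threads, so N parallel tesseract processes may oversubscribe the CPU; setting `OMP_THREAD_LIMIT=1` is commonly recommended when running several instances at once. A minimal sketch combining both, with `ocr_file` and `ocr_many` as hypothetical names:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Must be set before any tesseract process starts, because child processes
# inherit os.environ. setdefault keeps a value you already exported yourself.
os.environ.setdefault("OMP_THREAD_LIMIT", "1")


def ocr_file(path):
    # Imported lazily so ocr_many can be exercised without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def ocr_many(paths, ocr=ocr_file, workers=None):
    """OCR every path concurrently and return the texts in input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread mostly waits on its own tesseract subprocess, so one
        # thread per core keeps the cores busy without multiprocessing.
        return list(pool.map(ocr, paths))
```

Comparing per-image times with and without `OMP_THREAD_LIMIT=1` should show quickly whether thread oversubscription explains the jump from about 1 s to 13 s.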

How to reduce wand memory usage?

夙愿已清 submitted on 2019-12-01 06:58:54
Question: I am using wand and pytesseract to get the text of PDFs uploaded to a Django website, like so:

    image_pdf = Image(blob=read_pdf_file, resolution=300)
    image_png = image_pdf.convert('png')

    req_image = []
    final_text = []

    for img in image_png.sequence:
        img_page = Image(image=img)
        req_image.append(img_page.make_blob('png'))

    for img in req_image:
        txt = pytesseract.image_to_string(PI.open(io.BytesIO(img)).convert('RGB'))
        final_text.append(txt)

    return " ".join(final_text)

I have it running in celery in
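
Two things in the snippet keep memory high: `convert('png')` returns a second full copy of every page, and all PNG blobs are collected in `req_image` before any OCR starts, while none of the wand images is closed explicitly. A sketch that handles one page at a time and closes each wand image through its context manager, assuming Wand with ImageMagick (plus Ghostscript for PDF input); the function names are hypothetical, and `Image(blob=...)` still decodes the whole PDF up front:

```python
import io


def iter_page_pngs(pdf_bytes, resolution=300):
    """Yield one PNG blob per PDF page, with at most one page copy open at a time."""
    from wand.image import Image  # needs ImageMagick (+ Ghostscript for PDFs)

    with Image(blob=pdf_bytes, resolution=resolution) as pdf:
        for page in pdf.sequence:
            with Image(image=page) as single:  # freed as soon as the blob exists
                single.format = "png"
                blob = single.make_blob()
            yield blob


def ocr_png(png_blob):
    import pytesseract
    from PIL import Image as PILImage

    with PILImage.open(io.BytesIO(png_blob)) as img:
        return pytesseract.image_to_string(img.convert("RGB"))


def pdf_to_text(page_blobs, ocr=ocr_png):
    """OCR pages one by one instead of accumulating every PNG blob first."""
    return " ".join(ocr(blob) for blob in page_blobs)
```

Usage: `text = pdf_to_text(iter_page_pngs(read_pdf_file))`. If memory still creeps up across tasks, Celery's `worker_max_tasks_per_child` setting recycles worker processes periodically.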

How to create a traineddata file for Tesseract 4.1.0

空扰寡人 submitted on 2019-12-01 00:52:53
I want to recognise the characters on number plates. How do I train tesseract-ocr for number plates on Ubuntu 16.04? I'm not familiar with training, so please help me create a 'traineddata' file for recognising number plates. I have 1000 number-plate images. Any help would be appreciated. So far I have tried the following commands:

    tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
    tesseract eng.arial.plate3655.png eng.arial.plate3655 batch.nochop makebox

But it gives an error. Tesseract Open Source OCR Engine
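
The `makebox` commands above belong to the Tesseract 3 box-file training workflow. For 4.x LSTM training, the tesstrain project from the tesseract-ocr organisation expects single-line images, each paired with a `.gt.txt` file holding its transcription, in `data/<MODEL_NAME>-ground-truth/`, and is then run with `make training MODEL_NAME=<name>`. A sketch that lays out the plate images that way, assuming you already have the correct text for each crop (the `samples` mapping and the model name are hypothetical, and which image formats tesstrain accepts depends on its version):

```python
import shutil
from pathlib import Path


def write_ground_truth(samples, model_name, root="data"):
    """Copy each line image into <root>/<model_name>-ground-truth/ and write a
    <stem>.gt.txt file next to it holding the image's transcription.

    samples: mapping of image path -> text, e.g. {"plates/0001.png": "AB12CD3456"}.
    Images with the same file name in different folders would overwrite each other.
    """
    out_dir = Path(root) / "{}-ground-truth".format(model_name)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image_path, text in samples.items():
        src = Path(image_path)
        shutil.copyfile(str(src), str(out_dir / src.name))
        gt_file = out_dir / (src.stem + ".gt.txt")
        # One line of text per image, matching the single text line in the crop
        gt_file.write_text(text.strip() + "\n", encoding="utf-8")
    return out_dir
```

After that, running `make training MODEL_NAME=plate` from the tesstrain checkout (with `data/` in the location it expects) should, if training succeeds, leave the resulting model under `data/`.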

How to avoid "Permission denied" when installing a Python package without sudo

為{幸葍}努か submitted on 2019-11-30 19:09:47
I am trying to install the tesseract wrapper for Python as user mike so that I can import tesseract. I'm following the guide at https://code.google.com/p/python-tesseract/wiki/HowToCompilePythonTesseractForCentos. However, when I run python setup.py install I get the error below:

    [Errno 13] Permission denied: '/usr/local/lib/python2.7/site-packages/test-easy-install-7351.write-test'
    The installation directory you specified (via --install-dir, --prefix, or the distutils default setting) was:
        /usr/local/lib/python2.7/site-packages/

I do have sudo access, but here is the problem: When I
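
Without sudo, the usual route is a per-user install: `python setup.py install --user` (or `pip install --user ...`) writes into the user site-packages directory under your home folder, which Python normally adds to `sys.path` automatically. Inside a virtualenv, a plain `install` already works and `--user` is rejected. A small sketch that decides whether `--user` is needed and shows where the package would land; `install_command` is a hypothetical helper:

```python
import os
import site
import sys
import sysconfig


def install_command(setup_py="setup.py"):
    """Build the `setup.py install` command line, adding --user when we are not
    in a virtualenv and the interpreter's site-packages is not writable."""
    cmd = [sys.executable, setup_py, "install"]
    # venv sets base_prefix != prefix; the older virtualenv tool sets real_prefix
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")
    site_packages = sysconfig.get_paths()["purelib"]
    if not in_venv and not os.access(site_packages, os.W_OK):
        cmd.append("--user")
    return cmd


if __name__ == "__main__":
    print(" ".join(install_command()))
    print("user site-packages:", site.getusersitepackages())
```

Run the result with `subprocess.check_call(install_command())` from the package's source directory, or simply copy the printed command.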