python-tesseract | 易学教程

Tesseract 3.x multiprocessing weird behaviour

Question: I am not sure whether it is my infrastructure or tesseract-ocr itself that causes this odd behaviour. Whenever I use image_to_string in a single-process environment, tesseract-ocr works fine. But when I spawn multiple workers with gunicorn and all of them do some OCR work, tesseract-ocr starts reading very poorly (in terms of accuracy, not performance). Even after the load is gone, tesseract never regains the same accuracy. I need to restart all the workers

How to use trained data with pytesseract?

Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata. Right now I'm using this simple script:

    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract as tes

    results = tes.image_to_string(Image.open('./test.jpg'), boxes=True)
    file = open('parsing.text', 'a')
    file.write(results)
    print(results)

How do I use my traineddata file so that I can read the new font with the Python script? Thanks!

Edit #1: So I understand that *.traineddata can be used with Tesseract as a command-line
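One way this is often handled, sketched here under assumptions rather than taken from the question: pytesseract's image_to_string accepts a lang name and a config string, and the tesseract command line accepts --tessdata-dir to point at the folder holding the *.traineddata file. The helper name tessdata_args and the example path are hypothetical.

```python
from pathlib import Path


def tessdata_args(traineddata_path):
    """Derive the (lang, config) pair for a custom .traineddata file.

    Tesseract looks up a language by the file name without the
    ".traineddata" suffix, inside the directory given by --tessdata-dir.
    """
    path = Path(traineddata_path)
    suffix = ".traineddata"
    if not path.name.endswith(suffix):
        raise ValueError("expected a *.traineddata file")
    lang = path.name[: -len(suffix)]
    config = f'--tessdata-dir "{path.parent}"'
    return lang, config


# Usage (requires pytesseract, Pillow and the tesseract binary):
#   from PIL import Image
#   import pytesseract
#   lang, config = tessdata_args("fonts/myfont.traineddata")  # hypothetical path
#   text = pytesseract.image_to_string(Image.open("./test.jpg"),
#                                      lang=lang, config=config)
```

Copying the file into Tesseract's default tessdata directory and passing only lang should work as well; the explicit directory just avoids touching the system installation.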

Empty string with Tesseract

Question: I'm trying to read different cropped images from a big file. I manage to read most of them, but some return an empty string when I try to read them with tesseract. The code is just this line:

    pytesseract.image_to_string(cv2.imread("img.png"), lang="eng")

Is there anything I can try in order to read these kinds of images? Thanks in advance.

Edit:

Answer 1: Thresholding the image before passing it to pytesseract increases the accuracy.

    import cv2
    import numpy as np
    #
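The answer's own code is cut off, so the following is not that code. It is a minimal pure-Python sketch of one common thresholding method, Otsu's, applied to a grayscale image given as rows of 8-bit values. The names otsu_threshold and binarize are hypothetical. With OpenCV, the equivalent step would be cv2.threshold with the cv2.THRESH_BINARY and cv2.THRESH_OTSU flags on a grayscale image.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map each pixel to 0 (at or below threshold) or 255 (above it)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[255 if p > t else 0 for p in row] for row in image]
```

Dark text on a light background comes out as 0 on 255, which is generally what Tesseract expects; whether this helps with the specific crops in the question is not something the excerpt confirms.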

How to save dpi info in py-opencv?

    import cv2

    def clear(img):
        back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
        img = cv2.bitwise_xor(img, back)
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        return img

    def threshold(img):
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
        return img

    def fomatImage(img):
        img = threshold(img)
        img = clear(img)
        return img

    img = fomatImage(cv2.imread("1566135246468.png", cv2.IMREAD_COLOR))
    cv2.imwrite("aa.png", img)

This is my code. But when I tried to identify it with

UnicodeDecodeError with Tesseract OCR in Python

Question: I am trying to extract text from an image file using Tesseract OCR in Python, but I am facing an error that I can't figure out how to deal with. My environment is fine, as I tested the OCR on a sample image in Python. Here is the code:

    from PIL import Image
    import pytesseract

    strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
    print (strs)

The following is the error I get from the Eclipse console:

    strs = pytesseract.image_to_string(Image.open('binarized_body.png'))
    File "C:

error while trying to install tesserocr

Question: I keep getting the same error when I try to install:

    (env) vagrant@vagrant:~$ pip install tesserocr
    Collecting tesserocr
      Using cached tesserocr-2.1.3.tar.gz
    Building wheels for collected packages: tesserocr
      Running setup.py bdist_wheel for tesserocr ... error
      Complete output from command /home/vagrant/src/env/bin/python2 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-4K2D6A/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close(

PyTesseract call working very slow when used along with multiprocessing

I have a function that takes in a list of images and returns, as a list, the output of applying OCR to each image. I have another function that controls the input to this function by using multiprocessing. When I have a single list (i.e. no multiprocessing), each image takes ~1 s, but when I increase the number of lists processed in parallel to 4, each image takes an astounding 13 s. To understand where the problem really is, I tried to create a minimal working example of the problem. Here I have two functions, eat25 and eat100, which open an image name and feed it to the
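The excerpt is cut off before any diagnosis, so the following rests on an assumption: a commonly reported cause of this kind of slowdown is that Tesseract 4 builds use OpenMP threads internally, so several parallel workers compete for the same cores. A minimal sketch of the usual workaround caps OpenMP at one thread per Tesseract process through the OMP_THREAD_LIMIT environment variable before any worker starts. The names limit_tesseract_threads, ocr_one and ocr_many are hypothetical.

```python
import os
from multiprocessing import Pool


def limit_tesseract_threads(n=1):
    """Cap OpenMP threads for tesseract processes started after this call.

    pytesseract runs the tesseract binary as a subprocess, which
    inherits this environment variable.
    """
    os.environ["OMP_THREAD_LIMIT"] = str(n)


def ocr_one(path):
    """OCR a single image file (requires pytesseract and Pillow)."""
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path))


def ocr_many(paths, workers=4):
    """OCR several images in parallel, one tesseract thread per worker."""
    limit_tesseract_threads(1)
    with Pool(processes=workers) as pool:
        return pool.map(ocr_one, paths)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only:
    # print(ocr_many(["page1.png", "page2.png"]))
    pass
```

If per-image time stays high with the limit in place, the bottleneck lies elsewhere, for example in image loading or in the number of workers exceeding the number of cores.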

How to reduce wand memory usage?

Question: I am using wand and pytesseract to get the text of PDFs uploaded to a Django website, like so:

    image_pdf = Image(blob=read_pdf_file, resolution=300)
    image_png = image_pdf.convert('png')
    req_image = []
    final_text = []
    for img in image_png.sequence:
        img_page = Image(image=img)
        req_image.append(img_page.make_blob('png'))
    for img in req_image:
        txt = pytesseract.image_to_string(PI.open(io.BytesIO(img)).convert('RGB'))
        final_text.append(txt)
    return " ".join(final_text)

I have it running in celery in

How to Create Traineddata file For Tesseract 4.1.0

I want to recognise the characters on number plates. How do I train tesseract-ocr for number plates on Ubuntu 16.04? I am not familiar with training, so please help me create a 'traineddata' file for recognizing number plates. I have 1000 number plate images. Any help would be appreciated. So far I have tried the following commands:

    tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
    tesseract eng.arial.plate3655.png eng.arial.plate3655 batch.nochop makebox

But it gives an error:

    Tesseract Open Source OCR Engine
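The first command in the excerpt is a template: the image is named [langname].[fontname].[expN].[file-extension], and the output base is the same name without the extension. A small sketch that builds this command line for a given image, assuming the tesseract binary is on PATH; makebox_command is a hypothetical helper. It only reproduces the command from the question and does not address the error, whose text is cut off.

```python
import os


def makebox_command(image_path):
    """Build the box-file command for one training image.

    The output base is the image name with its extension removed,
    e.g. eng.arial.plate3655.png -> eng.arial.plate3655
    """
    base, _ext = os.path.splitext(image_path)
    return ["tesseract", image_path, base, "batch.nochop", "makebox"]


# Usage (requires the tesseract binary):
#   import subprocess
#   subprocess.run(makebox_command("eng.arial.plate3655.png"), check=True)
```

Looping this over all 1000 images would produce one .box file per image, each of which still needs manual correction before training.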

how to avoid Permission denied while installing package for Python without sudo

I am trying to install the tesseract wrapper for Python as user mike so that I can import tesseract. I'm following the guide here: https://code.google.com/p/python-tesseract/wiki/HowToCompilePythonTesseractForCentos

However, when I execute python setup.py install I get the error below:

    [Errno 13] Permission denied: '/usr/local/lib/python2.7/site-packages/test-easy-install-7351.write-test'

    The installation directory you specified (via --install-dir, --prefix, or the distutils default setting) was:

    /usr/local/lib/python2.7/site-packages/

I do have sudo access, but here is the problem: When I