pytesseract error Windows Error [Error 2]

Submitted by 妖精的绣舞 on 2019-12-23 20:41:51

Question


Hi, I am trying to use the Python library pytesseract to extract text from an image. Here is the code:

from PIL import Image
from pytesseract import image_to_string
print(image_to_string(Image.open(r'D:\new_folder\img.png')))

But I got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Python27\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

I haven't found a specific solution to this. Can anyone tell me what to do? Is there anything else I need to download, and if so, where can I download it?

Thanks in advance :)


Answer 1:


I had the same trouble and quickly found the solution after reading this post:

OSError: [Errno 2] No such file or directory using pytesser

You just need to adapt it to Windows by replacing the following line in pytesseract.py:

tesseract_cmd = 'tesseract'

with:

tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

(The backslashes must be doubled because \ is the escape character in Python string literals.)
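Alternatively, a raw string (the r'...' form the question already uses for the image path) keeps backslashes literally. A quick sketch comparing the spellings; the path is only the example location from this answer. Note how an undoubled backslash before "tesseract" silently turns into a tab character:

```python
# Two equivalent spellings of the same Windows path in Python source.
# The install location is only an example; adjust it to your machine.
escaped = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'  # every \ doubled
raw = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'         # raw string: \ kept as-is

# Pitfall: without doubling, "\t" is parsed as a TAB escape.
bad = 'Tesseract-OCR\tesseract'

print(escaped == raw)  # True
print('\t' in bad)     # True: the string contains a tab, so it no longer names the file
```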




Answer 2:


You're getting the exception because subprocess can't find the tesseract executable.
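To see that this is a plain "executable not found" problem rather than something specific to pytesseract, here is a small sketch that launches a command name assumed not to exist (tesseract-does-not-exist is made up for the demo). On Python 3 the error is FileNotFoundError, the Python 3 counterpart of the WindowsError [Error 2] in the question; both carry errno 2 (ENOENT):

```python
import errno
import subprocess


def try_launch(cmd):
    """Return the OSError raised if cmd cannot be started, or None if it started."""
    try:
        proc = subprocess.Popen([cmd], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:  # FileNotFoundError on Python 3, WindowsError on Python 2 / Windows
        return exc
    proc.communicate()  # it started after all; wait for it to finish
    return None


# Hypothetical command name, assumed NOT to be installed anywhere on PATH.
err = try_launch("tesseract-does-not-exist")
print(type(err).__name__, err.errno == errno.ENOENT)
```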

Installation is a three-step process:

1. Download and install the system-level libraries/binaries:

The installation help for the various operating systems is quoted below. On macOS you can install it directly with Homebrew (brew).

Install Google Tesseract OCR (additional info on how to install the engine on Linux, Mac OSX and Windows). You must be able to invoke the tesseract command as tesseract. If this isn’t the case, for example because tesseract isn’t in your PATH, you will have to change the “tesseract_cmd” variable at the top of tesseract.py. Under Debian/Ubuntu you can use the package tesseract-ocr. Mac OS users should install the Homebrew package tesseract.

For Windows:

An installer for the old version 3.02 is available for Windows from our download page. This includes the English training data. If you want to use another language, download the appropriate training data, unpack it using 7-zip, and copy the .traineddata file into the 'tessdata' directory, probably C:\Program Files\Tesseract-OCR\tessdata.

To access tesseract-OCR from any location, you may have to add the directory where the tesseract-OCR binaries are located to the PATH environment variable, probably C:\Program Files\Tesseract-OCR.

You can download the .exe installer from the Tesseract download page.


2. Install the Python package:

pip install pytesseract

3. Finally, the tesseract binary must be on your PATH.

Alternatively, you can set its location at runtime:

import pytesseract

pytesseract.pytesseract.tesseract_cmd = '<path-to-tesseract-bin>'

For Windows:

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
  • The line above only makes it work for the current script. For a permanent solution, add the directory containing tesseract.exe to PATH, for example: PATH=%PATH%;"C:\Program Files (x86)\Tesseract-OCR".

  • Besides that, make sure that the TESSDATA_PREFIX Windows environment variable is set to the directory that contains the tessdata directory. For example:

    TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR

so that the tessdata directory is C:\Program Files (x86)\Tesseract-OCR\tessdata.
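The TESSDATA_PREFIX setting can also be made from inside the script, assuming pytesseract starts tesseract with the inherited environment (subprocess does this by default when no env argument is passed). A minimal sketch using the example folder above, following the layout described here where the variable points at the folder that contains tessdata:

```python
import os

# Example install folder from the answer above; adjust it to your machine.
install_dir = r"C:\Program Files (x86)\Tesseract-OCR"

# Processes started after this line inherit the variable, so it acts
# like a per-script version of the system-wide setting.
os.environ["TESSDATA_PREFIX"] = install_dir

# Sanity check: the language data should live in <install_dir>\tessdata.
tessdata_dir = os.path.join(install_dir, "tessdata")
print(os.path.isdir(tessdata_dir))  # False means the .traineddata files are not where expected
```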


Applied to your example:

from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
print(pytesseract.image_to_string(Image.open(r'D:\new_folder\img.png')))
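To avoid hard-coding a single path, a hypothetical helper (find_tesseract is not part of pytesseract) can look on PATH first and then fall back to the default install folders mentioned in these answers. The candidate list is an assumption; extend it for your setup:

```python
import os
import shutil

# Assumed default install locations, taken from the paths mentioned in this thread.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a usable tesseract command: PATH first, then known install locations."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    raise RuntimeError("tesseract executable not found; install it or add it to PATH")
```

You would then set pytesseract.pytesseract.tesseract_cmd = find_tesseract() before calling image_to_string, and get a clear error message instead of the bare [Error 2] when Tesseract is missing.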



Answer 3:


You need the Tesseract OCR engine (tesseract.exe) installed on your machine. If it is not on your PATH, provide its complete path in pytesseract.py (called tesseract.py in the README).

From the README:

Install Google Tesseract OCR (additional info on how to install the engine on Linux, Mac OSX and Windows). You must be able to invoke the tesseract command as tesseract. If this isn't the case, for example because tesseract isn't in your PATH, you will have to change the "tesseract_cmd" variable at the top of tesseract.py. Under Debian/Ubuntu you can use the package tesseract-ocr. Mac OS users should install the Homebrew package tesseract.

Another thread




Answer 4:


I also faced the same problem with pytesseract. I'd suggest working in a Linux environment to avoid such errors. Run the following commands on Linux:

pip install pytesseract
sudo apt-get update
sudo apt-get install tesseract-ocr

Hope this helps.



Source: https://stackoverflow.com/questions/41652335/pytesseract-error-windows-error-error-2
