Question
I'm running into an error that is driving me a little crazy with the Python wrapper for Tesseract, a Python module called tesseract.
Here's the Python code I am trying to run:
import cv2
import tesseract

img = cv2.imread(image, 0)
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)
tesseract.SetCvImage(img, api)
url = api.GetUTF8Text()
conf = api.MeanTextConf()
print('Extracted URL : ' + url)
api.End()
And this is what I get:
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
I don't understand why this happens, since the TESSDATA_PREFIX environment variable is set to the correct path of my Tesseract installation (with the trailing slash).
When I run Tesseract directly from PowerShell (I'm on Windows 7, by the way):
tesseract.exe .\data\test.tif -psm 7 out
it works like a charm! Calling Tesseract with Popen from my Python script also works fine, but I don't like that I can't grab the OCR'd text directly from stdout. There seems to be no choice other than giving Tesseract an output filename and then opening and reading that file. Dealing with temporary text files just to get the OCR output feels pretty awful...
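For reference, the temporary-file workaround described here could be sketched roughly as follows. This is only an illustration under assumptions: build_command and ocr_via_cli are hypothetical helper names, the tesseract executable is assumed to be on PATH, the arguments follow the usual "image, output base, options" order, and the -psm spelling is taken from the command above (newer Tesseract releases spell it --psm).

```python
import os
import subprocess
import tempfile


def build_command(image_path, out_base, psm=7, exe="tesseract"):
    """Build the argument list for the Tesseract CLI (hypothetical helper)."""
    return [exe, image_path, out_base, "-psm", str(psm)]


def ocr_via_cli(image_path, psm=7, exe="tesseract"):
    """Run the CLI into a throwaway directory and return the recognized text."""
    with tempfile.TemporaryDirectory() as tmp:
        out_base = os.path.join(tmp, "out")
        subprocess.run(
            build_command(image_path, out_base, psm, exe),
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # Tesseract appends ".txt" to the output base name.
        with open(out_base + ".txt", encoding="utf-8") as fh:
            return fh.read()
```

The temporary directory is removed as soon as the text has been read, so nothing is left behind on disk.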
Help?
Answer 1:
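The first parameter to api.Init should be the value of TESSDATA_PREFIX. Passing "." makes Tesseract look for ./tessdata/eng.traineddata relative to the current directory, which is exactly the path shown in the error message.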
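A minimal sketch of that idea, assuming TESSDATA_PREFIX points at the parent directory of tessdata as the error message asks. resolve_datapath is a hypothetical helper, not part of the tesseract module; it only checks that the file Tesseract will look for actually exists before the path is handed to api.Init.

```python
import os


def resolve_datapath(lang="eng", env=None):
    """Return the directory to pass as the first argument of api.Init.

    Hypothetical helper: it mirrors the lookup from the error message,
    i.e. <datapath>/tessdata/<lang>.traineddata.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    traineddata = os.path.join(prefix, "tessdata", lang + ".traineddata")
    if not os.path.isfile(traineddata):
        raise FileNotFoundError(traineddata)
    return prefix


# Usage with the wrapper from the question (requires the tesseract module):
#   api = tesseract.TessBaseAPI()
#   api.Init(resolve_datapath("eng"), "eng", tesseract.OEM_DEFAULT)
```

If the helper raises, the problem is in the environment or the install location rather than in the wrapper.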
Answer 2:
Find the location of your tessdata folder by typing this at the command prompt (this assumes a Homebrew install):
$ brew list tesseract
In my case:
/usr/local/Cellar/tesseract/3.05.01/bin/tesseract
/usr/local/Cellar/tesseract/3.05.01/include/tesseract/ (27 files)
/usr/local/Cellar/tesseract/3.05.01/lib/libtesseract.3.dylib
/usr/local/Cellar/tesseract/3.05.01/lib/pkgconfig/tesseract.pc
/usr/local/Cellar/tesseract/3.05.01/lib/ (2 other files)
/usr/local/Cellar/tesseract/3.05.01/share/man/ (11 files)
/usr/local/Cellar/tesseract/3.05.01/share/tessdata/ (28 files)
Now pass that directory to image_to_string through the config argument:
tessdata_dir_config = r'--tessdata-dir "/usr/local/Cellar/tesseract/3.05.01/share/tessdata"'
txt = image_to_string(img, lang='eng', config=tessdata_dir_config)
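To avoid hard-coding the versioned path, the config string can be built from whatever directory the listing reports. The following is a rough sketch, assuming image_to_string comes from pytesseract; tessdata_config and ocr are hypothetical helper names, not part of that library.

```python
import os


def tessdata_config(tessdata_dir, lang="eng"):
    """Build the --tessdata-dir config string (hypothetical helper)."""
    traineddata = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(traineddata):
        # Fail early with the exact file that is missing.
        raise FileNotFoundError(traineddata)
    return '--tessdata-dir "{}"'.format(tessdata_dir)


def ocr(img, tessdata_dir, lang="eng"):
    """Run OCR on img using the language data in tessdata_dir."""
    # Imported lazily so tessdata_config works without pytesseract installed.
    from pytesseract import image_to_string
    return image_to_string(img, lang=lang,
                           config=tessdata_config(tessdata_dir, lang))
```

For example, ocr(img, "/usr/local/Cellar/tesseract/3.05.01/share/tessdata") would correspond to the two lines above.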
Source: https://stackoverflow.com/questions/24672531/annoying-python-tesseract-error-error-opening-data-file-tessdata-eng-trainedda