Symbol lookup error while using Tesseract

Submitted by 北战南征 on 2019-12-14 04:16:46

Question


I've been using Tesseract 4 in a project for more than two months now (that is, it has been running on input images for that long). The error I'm now getting is:

multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/cse/.local/lib/python3.5/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "UKExtraction2.py", line 267, in tessBox
    op = pt.image_to_string(box[0],lang='hin+eng',config='--psm 6')
  File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 286, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 194, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "UKExtraction2.py", line 855, in <module>
    doItAllUpper("A0","UK4.csv","temp",27,70,"box",2,1000,firstPageCoordsUK,boxCoordUK,voterBoxCoordUK,internalBoxNumberCoordUK,externalBoxNumberCoordUK,addListInfoUK)
  File "UKExtraction2.py", line 776, in doItAllUpper
    doItAll(tempPDFName,outputCSV,2,pdfs,formatType,n_blocks,writeBlockSize,firstPageCoords,boxCoord,voterBoxCoord,internalBoxNumberCoord,externalBoxNumberCoord,addListInfo,pdfName)           
  File "UKExtraction2.py", line 617, in doItAll
    mainProcess(pdfName,(0,noOfPages-1),formatType,n_blocks,outputCSV,writeBlockSize,firstPageCoords,boxCoord,voterBoxCoord,internalBoxNumberCoord,externalBoxNumberCoord,addListInfo,bigPDFName,basePages)
  File "UKExtraction2.py", line 563, in mainProcess
    names_lst = cropAndOCR(im,(tup[0],tup[1]),formatType,boxCoord,voterBoxCoord,externalBoxNumberCoord,n_blocks,basePages)# Add the values of fpageInfo
  File "UKExtraction2.py", line 416, in cropAndOCR
    results = pool.map(tessBox,box_lst_divided)
  File "/home/cse/.local/lib/python3.5/site-packages/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 644, in get
    raise self._value
pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')

The pathos frames are there because the project splits the work across two workers. The important part is:

pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')
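
For reference, the message follows the format the glibc dynamic linker uses for unresolved symbols, and the mangled name should demangle (for example with c++filt) to the constructor tesseract::TessPDFRenderer::TessPDFRenderer(char const*, char const*, bool). A minimal sketch of pulling the parts out of such a message in Python follows; the helper name parse_symbol_lookup_error and the regular expression are assumptions for illustration, not part of pytesseract.

```python
import re

# Matches dynamic-linker messages of the form
# "<program>: symbol lookup error: <object>: undefined symbol: <name>".
# The pattern is an assumption based on the messages quoted in this post.
_SYMBOL_ERROR = re.compile(
    r"^(?P<program>[^:]+): symbol lookup error: "
    r"(?P<object>[^:]+): undefined symbol: (?P<symbol>\S+)"
)


def parse_symbol_lookup_error(message):
    """Return program, object and symbol as a dict, or None if no match."""
    match = _SYMBOL_ERROR.match(message.strip())
    return match.groupdict() if match else None


info = parse_symbol_lookup_error(
    "tesseract: symbol lookup error: tesseract: "
    "undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b"
)
print(info)
```

A message that matches this pattern comes from the loader failing to resolve a symbol, which is a different failure from an image error such as the Pillow one quoted further down.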

A user reported a similar error on the tesseract-ocr Google group:

combine_tessdata: symbol lookup error: combine_tessdata: undefined symbol: _Z7tprintfPKcz

and got the answer that

"undefined symbol" indicate a broken installation

But as I said, this version has been running without any errors for more than two months, so there shouldn't be any problem with the Tesseract installation.

Another user posted the same problem in the group, but no one replied.

So I assumed the problem could be in one of two places:

  1. In the image provided to tesseract.
  2. Inside tesseract.

The image might not be a valid image at all; for example, it might have 0x0 dimensions (although the way the image is constructed should rule that out). That cannot be the cause, though, because when I tested this hypothesis the error I got was different:

SystemError: tile cannot extend outside image

This means that the image was present, so Tesseract should have worked.
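
To rule out the image side more systematically, each crop box can be checked before it is handed to Tesseract. This is a minimal sketch, assuming boxes are Pillow-style (left, upper, right, lower) tuples and the image size is a (width, height) tuple; the helper name box_is_valid is hypothetical and does not appear in the script.

```python
def box_is_valid(image_size, box):
    """Return True if the box has positive area and lies inside the image.

    image_size is (width, height); box is (left, upper, right, lower).
    Both layouts are assumptions modelled on Pillow's conventions.
    """
    width, height = image_size
    left, upper, right, lower = box
    return 0 <= left < right <= width and 0 <= upper < lower <= height


print(box_is_valid((200, 100), (10, 10, 150, 90)))   # box inside the image
print(box_is_valid((200, 100), (10, 10, 250, 90)))   # box extends past the right edge
print(box_is_valid((0, 0), (0, 0, 0, 0)))            # empty 0x0 image
```

Only the first call returns True; the other two describe boxes that should be rejected before OCR.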

That suggests the problem is inside Tesseract. I'm no expert on Tesseract's inner workings, but given that this version worked correctly until now and there is no problem with the input image, what could be wrong with Tesseract?

P.S: I'm currently not near the system that runs the script, but I do know which error occurred. I might not be able to give exact details about the system, so I'm looking for hypotheses about the cause.

P.S: The script is here.


Answer 1:


Here is the solution for Ubuntu 18.04.

First install the libraries that tesseract-ocr requires:

sudo apt install libtesseract-dev libleptonica-dev liblept5

Then install Tesseract itself:

sudo apt install tesseract-ocr -y



Answer 2:


Posting this as an answer rather than a comment so that I can edit it.

This applies to Debian GNU/Linux 9.6 (stretch), and also worked on 9.9, as of June 2019.

When Tesseract stopped working "all of a sudden", I had to run

sudo apt-get purge libtesseract4 tesseract-ocr

and then reinstall them (from backports, since they were not available in the stable channel):

sudo apt-get install -t stretch-backports tesseract-ocr 

So the essential step in my case was to reinstall libtesseract4; otherwise the "symbol lookup error: tesseract: undefined symbol" message kept appearing.
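
Both fixes point to the tesseract binary and the libtesseract shared library having likely drifted out of step, for instance after a partial package upgrade. One way to see which libraries the binary actually resolves is to read the output of ldd. The following is a minimal sketch for Linux; the function names parse_ldd and tesseract_libs are hypothetical, and the filter on "tesseract" and "lept" in library names is an assumption about how the packages name their files.

```python
import shutil
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (None if not found)."""
    libs = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue
        name, _, target = line.strip().partition(" => ")
        target = target.strip()
        # ldd prints "not found" for unresolved libraries, else "path (address)".
        libs[name] = None if target.startswith("not found") else target.split(" (")[0]
    return libs


def tesseract_libs(binary="tesseract"):
    """Return the Tesseract/Leptonica libraries the binary links to, or None."""
    path = shutil.which(binary)
    if path is None or shutil.which("ldd") is None:
        return None  # binary or ldd is not available on this system
    result = subprocess.run(
        ["ldd", path], stdout=subprocess.PIPE, universal_newlines=True
    )
    return {
        name: target
        for name, target in parse_ldd(result.stdout).items()
        if "tesseract" in name or "lept" in name
    }


print(tesseract_libs())
```

If the path reported for libtesseract is not the one the package manager installed (for example a leftover copy under /usr/local/lib), that stale library would be a plausible source of the undefined symbol.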



Source: https://stackoverflow.com/questions/52464159/symbol-lookup-error-while-using-tesseract
