Question
I wrote a snippet using ctypes and tesseract 3.0.2, based on this example:
import ctypes
from PIL import Image
libname = '/opt/tesseract/lib/libtesseract.so.3.0.2'
tesseract = ctypes.cdll.LoadLibrary(libname)
api = tesseract.TessBaseAPICreate()
rc = tesseract.TessBaseAPIInit3(api, "", 'eng')
filename = '/opt/ddl.ddl.exp654.png'
text_out = tesseract.TessBaseAPIProcessPages(api, filename, None, 0)
result_text = ctypes.string_at(text_out)
print result_text
It passes the filename as a parameter. I can't figure out which API method to call to pass the raw data instead, something like:
tesseract.TessBaseAPIWhichMethod(api, open(filename).read())
Answer 1:
I can't say for sure, but I don't think you can pass complex Python objects to that specific API; it won't know how to handle them. Your best bet would be to look at a wrapper like http://code.google.com/p/python-tesseract/, which lets you use file buffers:
import tesseract
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)
mImgFile = "eurotext.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api) #YAY for buffers.
print "result(ProcessPagesBuffer)=",result
Edit
http://code.google.com/p/python-tesseract/source/browse/python-tesseract-0.7.4/debian/python-tesseract/usr/share/pyshared/tesseract.py might provide you with the insight that you need.
...
Actually, if you don't mind trying it, what happens when you replace
text_out = tesseract.TessBaseAPIProcessPages(api, filename, None, 0)
with
text_out = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
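If you would rather stay with plain ctypes, the Tesseract C API header (capi.h) also declares TessBaseAPISetImage and TessBaseAPIGetUTF8Text, which take a pixel buffer instead of a filename. Below is a minimal sketch, not a tested recipe. It assumes those symbols (and TessDeleteText) are exported by your 3.0.2 build, which you should check, and that the buffer holds decoded pixels rather than the encoded PNG bytes that open(filename).read() returns. The helper names declare_prototypes and recognize_pixels are made up for illustration.

```python
import ctypes


def declare_prototypes(lib):
    """Declare argument/return types on the loaded library.

    Without this, ctypes assumes every function returns a C int, which
    can truncate the API handle and text pointers on 64-bit systems.
    """
    lib.TessBaseAPICreate.restype = ctypes.c_void_p
    lib.TessBaseAPIInit3.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                     ctypes.c_char_p]
    lib.TessBaseAPIInit3.restype = ctypes.c_int
    # (handle, imagedata, width, height, bytes_per_pixel, bytes_per_line)
    lib.TessBaseAPISetImage.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                        ctypes.c_int, ctypes.c_int,
                                        ctypes.c_int, ctypes.c_int]
    lib.TessBaseAPISetImage.restype = None
    lib.TessBaseAPIGetUTF8Text.argtypes = [ctypes.c_void_p]
    # c_void_p (not c_char_p) keeps the raw address so it can be freed later.
    lib.TessBaseAPIGetUTF8Text.restype = ctypes.c_void_p
    lib.TessDeleteText.argtypes = [ctypes.c_void_p]
    lib.TessDeleteText.restype = None
    return lib


def recognize_pixels(lib, api, pixels, width, height, bytes_per_pixel):
    """Run OCR on an uncompressed, tightly packed pixel buffer."""
    bytes_per_line = width * bytes_per_pixel
    if len(pixels) != bytes_per_line * height:
        raise ValueError("pixel buffer size does not match the dimensions")
    lib.TessBaseAPISetImage(api, pixels, width, height,
                            bytes_per_pixel, bytes_per_line)
    text_ptr = lib.TessBaseAPIGetUTF8Text(api)
    if not text_ptr:
        return b""
    try:
        # Copy the NUL-terminated result into a Python byte string.
        return ctypes.string_at(text_ptr)
    finally:
        # The text is allocated by Tesseract, so hand it back to be freed.
        lib.TessDeleteText(text_ptr)
```

With the question's code, you would call declare_prototypes(tesseract) right after LoadLibrary. Then, for an 8-bit grayscale image, something like img = Image.open(filename).convert('L') followed by recognize_pixels(tesseract, api, img.tobytes(), img.size[0], img.size[1], 1) should be close (tobytes() is tostring() in older PIL releases). On Python 3, the strings passed to TessBaseAPIInit3 would need to be bytes.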
Source: https://stackoverflow.com/questions/13150937/how-to-recognize-data-not-filename-using-ctypes-and-tesseract-3-0-2