Question
I am using tess-two, a fork of Tesseract Tools for Android, and I want to turn on hOCR output in Tesseract. Following this link, I tried setting the variable tessedit_create_hocr to true, but I don't see any hOCR in the output. Here is my attempt:
baseApi.init(FileUtil.getAppFolder(), "eng", TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);
baseApi.setVariable("tessedit_create_hocr", "1");
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
Somebody told me the hOCR output should appear in the config folder or in the folder containing the image, but I don't see anything there. I also don't know how to configure the file name and location of the hOCR output.
Another thing: is there any way to apply a config file in Tesseract Tools for Android? I put the config files into the tessdata/config folder, but nothing happens. How do I tell Tesseract to read these config files? There doesn't seem to be enough documentation for Android.
Update: Thanks to @nguyenq, I can now get the hOCR data. Here is my attempt:
jstring Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetHOCRText(JNIEnv *env,
        jobject thiz, jint page) {
    native_data_t *nat = get_native_data(env, thiz);
    // GetHOCRText returns a buffer allocated with new[]; the caller owns it.
    char *text = nat->api.GetHOCRText(page);
    jstring result = env->NewStringUTF(text);
    delete[] text;
    return result;
}
Answer 1:
Apparently, tess-two does not implement the full TessBaseAPI: it does not include support for the native GetHOCRText method. You may have to extend the wrapper yourself to access the functions you need.
The config files are meant for command-line execution. Alternatively, you can set the necessary variables through the exposed API method setVariable.
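To illustrate the setVariable route, here is a minimal sketch that reads the text of a Tesseract-style config file (one "name value" pair per line, with lines starting with # skipped) and forwards each pair to a setter. The class TessConfigLoader and the VariableSetter interface are hypothetical names introduced for this example; they are not part of tess-two. The sketch assumes that the wrapper's setVariable takes two strings and returns a boolean.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of tess-two.
public class TessConfigLoader {

    /** Hypothetical callback; assumed to match the shape of setVariable(String, String). */
    public interface VariableSetter {
        boolean setVariable(String name, String value);
    }

    /** Parses config text into name/value pairs, keeping file order. */
    public static Map<String, String> parse(String configText) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String rawLine : configText.split("\\r?\\n")) {
            String line = rawLine.trim();
            // Skip blank lines and comment lines.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // The first whitespace run separates the name from the value.
            String[] parts = line.split("\\s+", 2);
            String value = parts.length > 1 ? parts[1].trim() : "";
            vars.put(parts[0], value);
        }
        return vars;
    }

    /** Applies every pair through the setter; returns how many were accepted. */
    public static int apply(Map<String, String> vars, VariableSetter setter) {
        int accepted = 0;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            if (setter.setVariable(e.getKey(), e.getValue())) {
                accepted++;
            }
        }
        return accepted;
    }
}
```

With tess-two, a call such as TessConfigLoader.apply(TessConfigLoader.parse(text), baseApi::setVariable) after init would be one way to use it, assuming the build supports method references. Note that this only sets variables; as the answer says, retrieving the hOCR text itself still needs a wrapper method around GetHOCRText.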
Source: https://stackoverflow.com/questions/21248288/export-hocr-output-for-tesseract-ocr-in-android