Tesseract OCR: Recognize complete dictionary words only

[亡魂溺海] submitted on 2019-12-05 06:49:02

Question


I'm using the Tesseract OCR plugin for PhoneGap: https://github.com/jcesarmobile/PhonegapOCRPlugin/i

I'm trying to configure Tesseract to recognize complete dictionary words only. That is: no special characters, no suffixes or prefixes, etc.

The tessdata folder in this project doesn't contain any configs, so I thought I'd set them at init. Right now I'm trying to set them by modifying claseAuxiliar.mm, but I haven't noticed any difference. That might be because the configs themselves are wrong, or because I'm setting them the wrong way. Below are my configs and how I'm currently trying to set them:

    // init the tesseract engine.
    tesseract = new tesseract::TessBaseAPI();
    tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
    if (!tesseract->SetVariable("segment_penalty_dict_nonword","10"))
        printf("Setting variable failed!!!\n");
    if (!tesseract->SetVariable("segment_penalty_garbage","10"))
        printf("Setting variable failed!!!\n");
    if (!tesseract->SetVariable("stopper_nondict_certainty_base","-100"))
        printf("Setting variable failed!!!\n");
    if (!tesseract->SetVariable("language_model_penalty_non_dict_word","1"))
        printf("Setting variable failed!!!\n");
    if (!tesseract->SetVariable("language_model_penalty_non_freq_dict_word","1"))
        printf("Setting variable failed!!!\n");
    if (!tesseract->SetVariable("GARBAGE_STRING","5"))
        printf("Setting variable failed!!!\n");
    if (!tesseract->SetVariable("NON_WERD","5"))
        printf("Setting variable failed!!!\n");

Answer 1:


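You may want to try suppressing the system dictionary and loading a custom dictionary instead. As far as I know, the relevant parameters are load_system_dawg and load_freq_dawg (set both to F) together with user_words_suffix, which names your own word list in the tessdata folder. These are init-time parameters, so they should go in a config file (or be passed to Init) rather than be set with SetVariable after Init. The config file section of the man page has the details: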

https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
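
Dictionary settings generally bias the recognizer rather than strictly forbid non-words, so if you need a hard guarantee it may be simpler to filter the recognized text afterwards. This is a different technique from swapping the dictionary, and the sketch below is only an illustration: `ToLowerAscii` and `KeepDictionaryWords` are hypothetical helper names, not part of Tesseract, and the code assumes ASCII-only words and a word list that you supply yourself in lower case.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not a Tesseract API): lower-cases an ASCII word.
std::string ToLowerAscii(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

// Hypothetical helper (not a Tesseract API): splits the OCR output on
// whitespace and keeps only tokens that consist purely of letters and
// appear in `dictionary`. The dictionary is assumed to hold lower-case words.
std::vector<std::string> KeepDictionaryWords(
        const std::string& ocrText,
        const std::unordered_set<std::string>& dictionary) {
    std::vector<std::string> kept;
    std::istringstream tokens(ocrText);
    std::string token;
    while (tokens >> token) {
        // Reject anything containing digits, punctuation or other symbols.
        const bool lettersOnly = std::all_of(
            token.begin(), token.end(),
            [](unsigned char c) { return std::isalpha(c) != 0; });
        // Exact match only, so "totals" or "pre-total" do not match "total".
        if (lettersOnly && dictionary.count(ToLowerAscii(token)) > 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

The input would typically be the text returned by GetUTF8Text(). Note that this version is deliberately strict: a valid word with punctuation attached, such as "total:", is dropped as a whole, so you may want to strip leading and trailing punctuation first if that is too aggressive for your use case.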



Source: https://stackoverflow.com/questions/20599768/tesseract-ocr-recognize-complete-dictionary-words-only
