Character confidence for Tesseract 3.02 using config file

Submitted by 荒凉一梦 on 2020-05-14 12:45:30

Question


How can I get the confidence, as a percentage, for each detected character? From searching around, I found that save_blob_choices should be set to T, so I added that as a line to the hocr config file in tessdata/configs and called tesseract with it. This is all I get in the generated HTML file:

<span class='ocr_line' id='line_1' title="bbox 0 0 50 17"><span class='ocrx_word' id='word_1' title="bbox 3 2 45 15"><strong>31,835</strong></span>

As you can see, there are no confidence annotations, not even per word.

I don't have Visual Studio, so I can't make any code changes. That said, I'm open to answers that describe code changes, along with how I would compile the code without VS.


Answer 1:


Here is sample code for getting the confidence of each word. Replace RIL_WORD with RIL_SYMBOL to get the confidence of each character instead.

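// Assumes mTess is a tesseract::TessBaseAPI that is already initialized and has an image set.
mTess.Recognize(0);
tesseract::ResultIterator* ri = mTess.GetIterator();
if (ri != 0)
{
    do
    {
        // GetUTF8Text returns a new[]-allocated string that the caller must free.
        const char* word = ri->GetUTF8Text(tesseract::RIL_WORD);
        if (word != 0)
        {
            // Confidence is reported as a percentage in the range 0-100.
            float conf = ri->Confidence(tesseract::RIL_WORD);
            printf("word: %s, confidence: %f\n", word, conf);
        }
        delete[] word;  // safe even when word is null
    } while (ri->Next(tesseract::RIL_WORD));

    delete ri;
}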



Answer 2:


You will have to write a program to do this. Take a look at the ResultIterator API example on the Tesseract site. For your case, be sure to set the save_blob_choices variable and iterate at the RIL_SYMBOL level.
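
As a rough sketch of what such a program's core loop might look like, the helper below collects the text and confidence of every element at a given iterator level. The names SymbolConfidence and CollectConfidences are hypothetical, not part of Tesseract. The function is written as a template so that it compiles without the Tesseract headers; the Iterator and Level parameters are assumed to stand in for tesseract::ResultIterator and tesseract::PageIteratorLevel.

```cpp
#include <string>
#include <vector>

// One recognized element and its confidence (hypothetical helper type).
struct SymbolConfidence {
    std::string text;
    float confidence;  // Tesseract reports this as a percentage, 0-100
};

// Walks `it` at `level` and collects text and confidence for each element.
// Iterator is assumed to behave like tesseract::ResultIterator:
//   char* GetUTF8Text(Level) const;  // new[]-allocated, caller frees with delete[]
//   float Confidence(Level) const;
//   bool  Next(Level);
template <typename Iterator, typename Level>
std::vector<SymbolConfidence> CollectConfidences(Iterator* it, Level level) {
    std::vector<SymbolConfidence> out;
    if (it == nullptr) {
        return out;  // mirrors the null check in the word-level snippet
    }
    do {
        char* text = it->GetUTF8Text(level);
        if (text != nullptr) {
            out.push_back({text, it->Confidence(level)});
            delete[] text;  // the string was copied into out, so free the original
        }
    } while (it->Next(level));
    return out;
}
```

With Tesseract, this would presumably be called as CollectConfidences(ri, tesseract::RIL_SYMBOL), where ri comes from GetIterator() after calling SetVariable("save_blob_choices", "T") and Recognize(0) on the TessBaseAPI object. The helper does not take ownership of the iterator, so the caller still has to delete it afterwards.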



Source: https://stackoverflow.com/questions/17393555/character-confidence-for-tesseract-3-02-using-config-file
