get the exact position of text from image in tesseract

好久不见. 提交于 2019-12-03 05:59:27

问题


Using GetHOCRText(0) method in tesseract I'm able to retrieve the text in html and on presenting the html in webview i'm able get the text but the postion of text in image is different from the output. Any idea is highly helpful.

 tesseract->SetInputName("word");
tesseract->SetOutputName("xyz");
tesseract->Recognize(NULL);


char *utf8Text=tesseract->GetHOCRText(0);

and output image


回答1:


GetBoxText() method will return exact position of each characters in an array.

char *boxtext = _tesseract->GetBoxText(0);
NSString* aBoxText = [NSString stringWithUTF8String:boxtext];



回答2:


If you have the hocr output, you should have a tag for each word. These tags should have class="ocrx_word" and name="bbox x1 y1 x2 y2" where the x and y are the top left and bottom right corner of the bounding box around the word. I don't think it's possible to automatically use this information to format a text document - would require translating pixel differences to number of tabs/spaces. But, you should be able to render text in the given location.



来源:https://stackoverflow.com/questions/12278982/get-the-exact-position-of-text-from-image-in-tesseract

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!