Disable dictionary in Tesseract

∥☆過路亽.° submitted on 2019-12-29 04:26:08

Question


How can I disable dictionary corrections when running Tesseract on English text?

I'm currently running Tesseract as a child process.


Answer 1:


Try setting the following variables to false in a config file:

load_system_dawg 
load_freq_dawg
load_punc_dawg
load_number_dawg
load_unambig_dawg
load_bigram_dawg
load_fixed_length_dawgs

https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/Disable$20dictionary$20in$20Tesseract/tesseract-ocr/5nvIo1DJxHE/f3gBi2pTKykJ
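A minimal Python sketch of the config-file approach, assuming the `tesseract` command-line program is on `PATH` and that your build accepts a config file given by path as the last argument (it is assumed here that Tesseract looks in `tessdata/configs` first and otherwise treats the argument as a file path). Each variable is written with the value `0` (false), and Tesseract is then run as a child process via `subprocess.run`. The config name `no_dict` and the helper names are hypothetical. Not every Tesseract release has all of these variables (`load_fixed_length_dawgs`, for example, is a 3.0x name), and newer builds may warn about names they do not recognize, so trim the list to match your version.

```python
import subprocess
import tempfile
from pathlib import Path

# Dictionary-related variables from the answer. Which ones exist depends on
# the Tesseract version; unknown names may trigger a warning.
DAWG_VARIABLES = [
    "load_system_dawg",
    "load_freq_dawg",
    "load_punc_dawg",
    "load_number_dawg",
    "load_unambig_dawg",
    "load_bigram_dawg",
    "load_fixed_length_dawgs",
]


def build_config_text(variables):
    """Config-file text: one 'name value' line per variable, value 0 (false)."""
    return "".join(f"{name} 0\n" for name in variables)


def build_command(image_path, output_base, config_path, lang="eng"):
    """Argument list: tesseract <image> <outputbase> -l <lang> <configfile>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang, str(config_path)]


def ocr_without_dictionary(image_path, output_base):
    """Write a temporary config file, run tesseract as a child process,
    and return the text it writes to <output_base>.txt."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "no_dict"  # hypothetical config name
        config_path.write_text(build_config_text(DAWG_VARIABLES))
        subprocess.run(build_command(image_path, output_base, config_path), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For example, `ocr_without_dictionary("scan.png", "scan")` would produce `scan.txt` and return its contents. As far as I know, the `load_*_dawg` variables are read only when Tesseract initializes, which is why they belong in a config file passed at startup rather than being changed later.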

Also read the FAQ entry "How to increase the trust in/strength of the dictionary?", which covers the reverse adjustment but names the relevant settings. From it:

For tesseract-ocr < 3.01 try upping NON_WERD and GARBAGE_STRING in dict/permute.cpp to maybe 3 or even 5.

For tesseract-ocr >= 3.01 try increasing the variables language_model_penalty_non_freq_dict_word and language_model_penalty_non_dict_word in a config file. By default they are 0.1 and 0.15 respectively.
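A similar sketch for Tesseract >= 3.01, again assuming the `tesseract` CLI is on `PATH`: it writes the two penalty variables to a temporary config file and runs Tesseract with it. The values in `STRONGER_DICTIONARY` are arbitrary illustrations set above the quoted defaults, not tuned recommendations. Raising the penalties makes non-dictionary words more costly, so for the original question you would presumably move them toward 0 instead. The helper and config names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def build_penalty_config(penalties):
    """Config-file text: one 'name value' line per float variable."""
    return "".join(f"{name} {value}\n" for name, value in penalties.items())


# Illustrative values only, above the defaults (0.1 and 0.15) quoted in the
# answer; higher values penalize words not found in the dictionary more.
STRONGER_DICTIONARY = {
    "language_model_penalty_non_freq_dict_word": 0.3,
    "language_model_penalty_non_dict_word": 0.5,
}


def ocr_with_penalties(image_path, output_base, penalties, lang="eng"):
    """Run tesseract with a temporary config holding the penalty values
    and return the text it writes to <output_base>.txt."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "penalties"  # hypothetical config name
        config_path.write_text(build_penalty_config(penalties))
        cmd = ["tesseract", str(image_path), str(output_base), "-l", lang, str(config_path)]
        subprocess.run(cmd, check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Passing a different dictionary to `ocr_with_penalties` lets you try lower or higher values without editing the helper.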



Source: https://stackoverflow.com/questions/14364662/disable-dictionary-in-tesseract
