“language_model_penalty_non_dict_word” has no effect in tesseract 3.01

Submitted by 流过昼夜 on 2019-12-06 03:38:44

Question


I'm setting language_model_penalty_non_dict_word through a config file for Tesseract 3.01, but its value doesn't have any effect. I've tried with multiple images, and multiple values for it, but the output for each image is always the same. Another user has noticed the same in a comment in another question.

Edit: After looking inside the source, the variable language_model_penalty_non_dict_word is used only inside the function float LanguageModel::ComputeAdjustedPathCost.

However, this function is never called! It is referenced by only two functions, LanguageModel::UpdateBestChoice() and LanguageModel::AddViterbiStateEntry(). I placed breakpoints in both, but neither of them was being hit either.


Answer 1:


After some debugging, I finally found the reason: the function Wordrec::SegSearch() wasn't being called, and it sits higher up in the call graph of LanguageModel::ComputeAdjustedPathCost().

This is the code that decides which search is used:

  if (enable_new_segsearch) {
    SegSearch(&chunks_record, word->best_choice,
              best_char_choices, word->raw_choice, state);
  } else {
    best_first_search(&chunks_record, best_char_choices, word,
                      state, fixpt, best_state);
  }

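SegSearch() runs only when enable_new_segsearch is true; otherwise best_first_search() is used and the language model penalties are never consulted. So you need to set enable_new_segsearch in the config file: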

enable_new_segsearch    1

language_model_penalty_non_freq_dict_word 0.2
language_model_penalty_non_dict_word 0.3
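
As a rough sketch of how these settings might be applied from the command line: the snippet below writes them to a config file and passes that file as the last argument to tesseract. The file names segsearch.cfg, input.png and output are hypothetical, and the sketch assumes the usual 3.x usage of "tesseract imagename outputbase [configfile...]". Depending on the build, Tesseract may look for the config name under tessdata/configs before treating it as a plain path, so it is worth checking that the file is actually picked up.

```shell
# Write the parameters from the answer to a config file (hypothetical name).
cat > segsearch.cfg <<'EOF'
enable_new_segsearch 1
language_model_penalty_non_freq_dict_word 0.2
language_model_penalty_non_dict_word 0.3
EOF

# Run OCR only if tesseract is installed and the (hypothetical) input image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f input.png ]; then
  # Result goes to output.txt; the config file is the trailing argument.
  tesseract input.png output segsearch.cfg
fi
```

To confirm the penalty now has an effect, run the same image with two clearly different values of language_model_penalty_non_dict_word and compare the resulting text files.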


Source: https://stackoverflow.com/questions/29826591/language-model-penalty-non-dict-word-has-no-effect-in-tesseract-3-01
