Strength of Dictionary in Tesseract 3

独厮守ぢ 2021-02-20 14:19

How do I increase or decrease the strength of the dictionary in Tesseract 3?

In the FAQ it says I need to change the values of "NON_WERD" and "GARBAGE_STRING", but they …

  • 2021-02-20 14:42

    According to http://code.google.com/p/tesseract-ocr/wiki/FAQ, you need to change these variables:

    enable_new_segsearch    1
    language_model_penalty_non_freq_dict_word 0.2
    language_model_penalty_non_dict_word 0.3
    

    Increase the two penalty values to bias Tesseract more strongly toward dictionary words (decrease them to weaken the bias).

    Note: enable_new_segsearch must be set to 1; otherwise the penalties have no effect.
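If you run Tesseract from the command line rather than through an API, the same three variables can go into a config file (one `variable value` pair per line) that is named on the command line. The Java sketch below only writes such a file. The class name `DictBiasConfig` and the file name `dict_bias` are hypothetical, and 0.2/0.3 are just the example values from this answer, not tuned recommendations.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class DictBiasConfig {

    // Builds a Tesseract config file body: one "variable value" pair per line.
    static String render(double nonFreqDictPenalty, double nonDictPenalty) {
        Map<String, String> vars = new LinkedHashMap<>();
        // Without this switch the two penalties below have no effect.
        vars.put("enable_new_segsearch", "1");
        // Larger values penalize words outside the dictionaries more heavily.
        vars.put("language_model_penalty_non_freq_dict_word", String.valueOf(nonFreqDictPenalty));
        vars.put("language_model_penalty_non_dict_word", String.valueOf(nonDictPenalty));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "dict_bias" is a hypothetical file name; override it with the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "dict_bias");
        Files.write(out, render(0.2, 0.3).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Afterwards something like `tesseract image.png out dict_bias` should pick the file up: Tesseract 3 looks for a named config under `tessdata/configs/` first and otherwise treats the argument as a path. Raising either argument to `render` should strengthen the dictionary bias; lowering it should weaken it.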

  • 2021-02-20 14:57

    To turn off Tesseract's language-knowing abilities entirely, run each of these:

    // tess is a Tess4J Tesseract instance (net.sourceforge.tess4j.Tesseract)
    // Each call stops Tesseract from loading one of its dictionaries (DAWGs).
    tess.setTessVariable("load_system_dawg", "false");
    tess.setTessVariable("load_freq_dawg", "false");
    tess.setTessVariable("load_punc_dawg", "false");
    tess.setTessVariable("load_number_dawg", "false");
    tess.setTessVariable("load_unambig_dawg", "false");
    tess.setTessVariable("load_bigram_dawg", "false");
    tess.setTessVariable("load_fixed_length_dawgs", "false");
    

    Or, for finer control, disable only some of them. (I don't know of anywhere that explains well what each one does, but the names are fairly self-explanatory.) This is code from my current project, which uses Tess4J, but you can easily translate it to C++, a config file, or whatever else you need.
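For the finer-grained variant, and for the config-file translation mentioned in this answer, here is a hedged Java sketch that emits `load_*_dawg false` lines for every DAWG except the ones you choose to keep. The class `DawgConfig` and method `disableAllExcept` are hypothetical helpers; the variable names are exactly the ones used in the Tess4J snippet, and the output is intended for a Tesseract 3 config file rather than for `setTessVariable`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DawgConfig {

    // The dictionary-loading switches listed in the Tess4J snippet.
    static final List<String> DAWG_VARS = Collections.unmodifiableList(Arrays.asList(
            "load_system_dawg",
            "load_freq_dawg",
            "load_punc_dawg",
            "load_number_dawg",
            "load_unambig_dawg",
            "load_bigram_dawg",
            "load_fixed_length_dawgs"));

    // Returns config-file lines that turn off every DAWG not named in keep.
    static String disableAllExcept(Set<String> keep) {
        // Reject unknown names so a typo cannot silently leave a DAWG enabled.
        for (String name : keep) {
            if (!DAWG_VARS.contains(name)) {
                throw new IllegalArgumentException("unknown DAWG variable: " + name);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (String name : DAWG_VARS) {
            if (!keep.contains(name)) {
                sb.append(name).append(" false\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Finer control: keep the number and punctuation DAWGs, disable the rest.
        Set<String> keep = new TreeSet<>(Arrays.asList("load_number_dawg", "load_punc_dawg"));
        System.out.print(disableAllExcept(keep));
    }
}
```

With the example `main`, the output is five lines that disable everything except the number and punctuation DAWGs; passing an empty set reproduces the "turn everything off" list above.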
