Question
Consider this small PNG image depicting the word 'Account' in black on a white background:
For this ground-truth image, the output differs between the following two Tesseract command-line invocations, with (A) better than (B). I need (B) in order to have any hope of controlling Tesseract's roughly 660 configuration parameters, but with (A)'s extraction quality.
Case A (no config file):
tesseract -v test.png test
tesseract 4.1.0
leptonica-1.78.0
libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found SSE
Tesseract Open Source OCR Engine v4.1.0 with Leptonica
cat test.txt
Account
^L
Case B (using config file, which is obviously desirable):
tesseract --print-parameters > tess_default.cfg
tesseract -v test.png test tess_default.cfg
ccot
^L Page separator (default is form feed control character)
I believe the output should be the same in both cases, but it is not. Why? Case A's output is clearly more accurate than Case B's.
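One thing that may be worth checking, stated here as an unconfirmed assumption: the page separator shown in Case B appears to have absorbed its own description text. That would be consistent with each dumped line having the form name, TAB, value, TAB, description, and with the config reader treating everything after the name as the value. Under that assumption, a minimal sketch for stripping the description column before reusing the dump as a config file could look like this. The helper name `strip_param_descriptions` and the file name `tess_values.cfg` are hypothetical.

```shell
# Hypothetical helper: reduce a three-column parameter dump
# (name<TAB>value<TAB>description, an assumed format) to name<TAB>value.
# Lines without a tab (such as a header line) and parameters whose value
# is empty are dropped, so those parameters simply keep their defaults.
strip_param_descriptions() {
  awk -F '\t' 'NF >= 2 && $2 != "" { print $1 "\t" $2 }'
}

# Usage sketch (assumes tesseract is on PATH):
#   tesseract --print-parameters | strip_param_descriptions > tess_values.cfg
#   tesseract test.png test tess_values.cfg
```

If the recognized text then matches Case A, the trailing descriptions were the likely cause; if not, the assumption above does not hold and the difference lies elsewhere.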
How does one otherwise discover the current configuration of Tesseract, if not by using --print-parameters?
Please consider only the Tesseract command line under *nix - no Python, Java SDKs, etc. on this occasion.
Thanks!
- Tesseract Version: 4.1.0
- Commit Number: [executed: brew install tesseract]
- Platform: macOS High Sierra 10.13.6 / Darwin redacted.office 17.7.0 Darwin Kernel Version 17.7.0: Sun Jun 2 20:31:42 PDT 2019; root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64
Source: https://stackoverflow.com/questions/57794165/tesseract-differing-output-how-do-i-find-out-which-parameters-are-being-used