I am using tess4j, the Java wrapper of Tesseract. I also have the normal Tesseract installed. I am not exactly sure how tess4j is meant to work, but since it comes with a te
Maybe you don't have the tessdata folder in your main project folder.
This folder holds the data for every language Tesseract supports (it contains files with the .traineddata, .bigrams, .fold, .lm, .nn, .params, .size and .word-freq extensions).
If you don't have it, follow these steps:
1. Download tessdata-master.zip.
2. Unzip the tessdata-master.zip file in your main project folder.
3. Rename tessdata-master to tessdata.
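To confirm that the steps above worked, a small check like the following can list which languages are actually present. This is only a sketch: the class and method names are made up for illustration, and it assumes the folder is named tessdata and sits in the working directory unless a path is passed as an argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of tess4j.
final class TessdataCheck {

    private static final String SUFFIX = ".traineddata";

    /** Returns the sorted language codes found as *.traineddata files in the given folder. */
    static List<String> installedLanguages(File tessdataDir) {
        List<String> langs = new ArrayList<>();
        File[] files = tessdataDir.listFiles((dir, name) -> name.endsWith(SUFFIX));
        if (files == null) {
            return langs; // folder is missing or is not a directory
        }
        for (File f : files) {
            String name = f.getName();
            langs.add(name.substring(0, name.length() - SUFFIX.length()));
        }
        Collections.sort(langs);
        return langs;
    }

    public static void main(String[] args) {
        // Assumption: the folder is called "tessdata" and lives in the working directory.
        File dir = new File(args.length > 0 ? args[0] : "tessdata");
        List<String> langs = installedLanguages(dir);
        if (langs.isEmpty()) {
            System.out.println("No .traineddata files found in " + dir.getAbsolutePath());
        } else {
            System.out.println("Languages in " + dir.getAbsolutePath() + ": " + langs);
        }
    }
}
```

If the list comes back empty, the folder is probably still named tessdata-master or was unzipped somewhere else.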
For those who use Maven and would rather not rely on global (environment) variables, this works for me:
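import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.LoadLibs;

File imageFile = new File("C:\\random.png");
// new Tesseract() replaces Tesseract.getInstance(), which is deprecated in newer Tess4J releases
Tesseract instance = new Tesseract();
// In case you don't have your own tessdata, let it be extracted from the jar for you
File tessDataFolder = LoadLibs.extractTessResources("tessdata");
// Set the tessdata path
instance.setDatapath(tessDataFolder.getAbsolutePath());
try {
    String result = instance.doOCR(imageFile);
    System.out.println(result);
} catch (TesseractException e) {
    System.err.println(e.getMessage());
}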
Found here; tested with the Maven artifact net.sourceforge.tess4j:tess4j:3.4.1, although the linked page uses the 1.4.1 jar.
The TESSDATA_PREFIX environment variable, if defined, will override everything, including whatever is set by init or setDatapath; but that may change in the near future, once an application can specify where its tessdata folder is.
http://code.google.com/p/tesseract-ocr/issues/detail?id=938
https://groups.google.com/forum/#!topic/tesseract-ocr/bkJwI8WmxSw
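The precedence described above can be sketched in a few lines, which also makes it easy to log a warning when the environment variable silently wins. The class and method names here are hypothetical; the code only mirrors the rule as stated in this answer and does not call into Tesseract or tess4j.

```java
// Hypothetical helper illustrating the precedence rule described in this answer.
final class TessdataPrecedence {

    /**
     * Returns the path that would take effect under the rule described above:
     * a defined TESSDATA_PREFIX wins over the path given to setDatapath.
     */
    static String effectiveDataPath(String envValue, String configuredPath) {
        if (envValue != null && !envValue.isEmpty()) {
            return envValue;
        }
        return configuredPath;
    }

    public static void main(String[] args) {
        String env = System.getenv("TESSDATA_PREFIX");
        // Assumption: "tessdata" stands in for whatever the application passes to setDatapath.
        String configured = args.length > 0 ? args[0] : "tessdata";
        String effective = effectiveDataPath(env, configured);
        if (!effective.equals(configured)) {
            System.err.println("Warning: TESSDATA_PREFIX=" + env + " overrides " + configured);
        }
        System.out.println("Effective data path: " + effective);
    }
}
```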
Let your TESSDATA_PREFIX environment variable point to the tessdata folder of your Tess4j.
Usually you set up this variable on the system during installation, but you may find a solution here: How do I set environment variables from Java?
You have to do it on the system that runs your app, because the Tesseract .dll files depend on this environment variable.
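Java offers no portable way to change the environment of the JVM that is already running, so one workaround is to start the OCR application as a child process with the variable set. The sketch below uses the standard ProcessBuilder API; the class name, the jar name ocr-app.jar, and the example path are placeholders, not anything shipped with Tess4j.

```java
import java.io.IOException;

// Hypothetical launcher: sets TESSDATA_PREFIX for a child process only.
final class TessdataLauncher {

    /** Adds TESSDATA_PREFIX to the environment the child process will inherit. */
    static ProcessBuilder withTessdataPrefix(ProcessBuilder builder, String tessdataPath) {
        builder.environment().put("TESSDATA_PREFIX", tessdataPath);
        return builder;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumptions: "ocr-app.jar" and the default path below are placeholders.
        String tessdataPath = args.length > 0 ? args[0] : "C:\\myproject\\tessdata";
        ProcessBuilder builder = new ProcessBuilder("java", "-jar", "ocr-app.jar");
        withTessdataPrefix(builder, tessdataPath).inheritIO();
        int exitCode = builder.start().waitFor();
        System.out.println("OCR process exited with code " + exitCode);
    }
}
```

The variable is visible only to the child process, so the parent JVM and the rest of the system are left unchanged.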