Tesseract - ERROR net.sourceforge.tess4j.Tesseract - null

醉话见心 2021-02-20 10:21

I created a Java application that uses Tesseract to convert a given image or PDF to a string. When I run it on my machine as a unit test with JUnit, it runs grea

3 Answers
  • 2021-02-20 10:27
    Resources I used: Windows 10 (also tried on Windows Server 2016), Java, Maven
    
    Status: Working on my local machine as well as on a VM
    
    1. Download Tess4J-3.4.8 from http://tess4j.sourceforge.net/ and set the environment variable path for it under Advanced System Settings.
    2. Add the dependencies from Maven:
    
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.5.1</version>
    </dependency>
    <dependency>
        <groupId>org.ghost4j</groupId>
        <artifactId>ghost4j</artifactId>
        <version>1.0.1</version>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.lept4j</groupId>
        <artifactId>lept4j</artifactId>
        <version>1.7.0</version>
    </dependency>
    
    3. Download libtesseract302.dll from http://api.256file.com/libtesseract302.dll/en-download-56466.html and copy it to the "C:\Windows\System32" folder. Do not forget to set the environment variable path under Advanced System Settings.
    
    4. Download and install the Visual C++ 2015 Redistributable or the VC++ 2017 Redistributable (I installed both) from here: https://programmer.help/blogs/net.sourceforge.tess4j.tesseractexception-java.lang.nullpointerexception.html Then restart your PC.
    
    5. To be on the safe side, keep the required JAR files locally if you do not already have them (please see image). Do not forget to set the environment variable path for the JARs under Advanced System Settings.
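
Since several of the steps above depend on the environment variable path being set correctly, a quick check from Java can help confirm it. The following is only a sketch: the class name `PathCheck` and the helper `findOnPath` are made up for illustration, and it only checks whether a file with the given name sits in one of the PATH directories, not whether the DLL can actually be loaded.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class PathCheck {

    /** Returns the first directory listed in pathValue that contains fileName, if any. */
    static Optional<Path> findOnPath(String pathValue, String fileName) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        // PATH entries are separated by ';' on Windows and ':' elsewhere.
        for (String entry : pathValue.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                Path dir = Paths.get(entry);
                if (Files.isRegularFile(dir.resolve(fileName))) {
                    return Optional.of(dir);
                }
            } catch (InvalidPathException ignored) {
                // Skip malformed PATH entries instead of failing the whole check.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The DLL name from step 3 is used as the default; pass another name as an argument.
        String dll = args.length > 0 ? args[0] : "libtesseract302.dll";
        Optional<Path> hit = findOnPath(System.getenv("PATH"), dll);
        System.out.println(hit.map(p -> dll + " found in " + p)
                              .orElse(dll + " not found on PATH"));
    }
}
```

If the file is reported as missing after changing the variable, it may be that the IDE or service was started before the change and still sees the old environment.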
    

  • 2021-02-20 10:37

    As @Piotr R mentioned, the error was that ghostscriptException.getCause() is null. The reason is that the path configured in the File object passed to Tesseract was not a valid one. Tesseract's definition of "valid" is narrower than you might expect: it considers only a local path valid, so passing a file located on AWS S3 throws an error even if the file is public. The solution was to save the file locally and delete it after Tesseract is done.
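
A minimal sketch of the "save locally, then delete" approach, using only the JDK. The names `LocalCopy`, `FileAction` and `withLocalCopy` are hypothetical helpers, not part of Tess4J. The suffix parameter is there to preserve the original extension (for example ".pdf"), on the assumption that the OCR library uses it to recognise the file type.

```java
import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LocalCopy {

    /** Like java.util.function.Function, but allowed to throw checked exceptions. */
    interface FileAction<T> {
        T apply(File localFile) throws Exception;
    }

    /**
     * Copies the stream to a temporary local file, hands that file to the
     * action, and deletes the file afterwards, even if the action throws.
     */
    static <T> T withLocalCopy(InputStream remote, String suffix, FileAction<T> action)
            throws Exception {
        Path tmp = Files.createTempFile("ocr-input-", suffix);
        try {
            Files.copy(remote, tmp, StandardCopyOption.REPLACE_EXISTING);
            return action.apply(tmp.toFile());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With Tess4J on the classpath, a call might look like `LocalCopy.withLocalCopy(new URL(s3Url).openStream(), ".pdf", file -> tesseract.doOCR(file))`, where `s3Url` and `tesseract` stand in for your own URL string and configured Tesseract instance.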

  • 2021-02-20 10:41

    My guess is that a GhostscriptException is thrown but not logged properly, and this causes the NullPointerException:

    https://github.com/nguyenq/tess4j/blob/212d72bc2ec8b3a4d4f5a18f1eb01a0622fc5521/src/main/java/net/sourceforge/tess4j/util/PdfUtilities.java#L107

    106        } catch (GhostscriptException e) {
    107            logger.error(e.getCause().toString(), e);
    108        } finally {
    

    On line 107, e.getCause() is (probably) null, and calling toString() on null throws an NPE.

    (From the specs, getCause can be null: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#getCause(), and GhostscriptException also allows the cause to be null: http://grepcode.com/file/repo1.maven.org/maven2/org.ghost4j/ghost4j/1.0.0/org/ghost4j/GhostscriptException.java)

    To verify this answer (without recompiling the whole of tess4j), you could start your program in debug mode and put a breakpoint at line 107. This will give you information about the real exception.
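
The failure mode can be reproduced with plain JDK exceptions. The sketch below is not the tess4j code; `CauseLogging` and `describe` are illustrative names, and a plain `Exception` stands in for GhostscriptException. It shows the unguarded call failing and one null-safe way to pick what to log.

```java
class CauseLogging {

    /** Describes an exception without assuming that getCause() is non-null. */
    static String describe(Throwable e) {
        Throwable cause = e.getCause();
        // Fall back to the exception itself when no cause was attached.
        return (cause != null ? cause : e).toString();
    }

    public static void main(String[] args) {
        // Stand-in for a GhostscriptException created without a cause.
        Exception noCause = new Exception("Ghostscript failed");

        try {
            // Same pattern as line 107: getCause() returns null here.
            noCause.getCause().toString();
        } catch (NullPointerException npe) {
            System.out.println("unguarded getCause().toString() threw an NPE");
        }

        System.out.println("null-safe description: " + describe(noCause));
    }
}
```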
