Tesseract - ERROR net.sourceforge.tess4j.Tesseract - null

丶灬走出姿态 submitted on 2021-02-18 10:23:08

Question


I created a Java application that uses Tesseract to convert a given image or PDF to a string. When I run it on my machine as a JUnit unit test it works fine, but when I run the full system, which is a RESTful API served by Tomcat that receives the image and runs Tesseract, I get the following error:

23:22:36.511 [http-nio-9999-exec-3] ERROR net.sourceforge.tess4j.Tesseract - null
java.lang.NullPointerException: null
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Png(PdfUtilities.java:107)
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Tiff(PdfUtilities.java:48)
    at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:343)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:213)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:197)
    at ocr.OcrUtil.getString(OcrUtil.java:54)
    at com.tapd.server.api.handlers.IRSHandler.uploadIRSImage(IRSHandler.java:65)
    at com.tapd.server.api.WebAPIService.updateParentIrsForm(WebAPIService.java:250)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:309)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:292)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1139)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:460)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:108)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:522)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:349)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:1110)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:785)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1425)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Unknown Source)
[2016-09-14 23:22:36,512] [ERROR] java.lang.NullPointerException

My guess is that the tessdata folder is not in the right place, and that it gets misplaced when the application is packaged into a JAR and run by Tomcat. However, I couldn't figure out where it should be located, and I have double-checked that all JARs are deployed correctly.

Edit: It appears that Tesseract can't handle the path when the file is on a remote server such as AWS S3. So the question is: why, and how can I make it accept a path from S3? (Yes, the file is public.)


Answer 1:


My guess is that a GhostscriptException is being thrown but not logged properly, and that this is what causes the NullPointerException:

https://github.com/nguyenq/tess4j/blob/212d72bc2ec8b3a4d4f5a18f1eb01a0622fc5521/src/main/java/net/sourceforge/tess4j/util/PdfUtilities.java#L107

106        } catch (GhostscriptException e) {
107            logger.error(e.getCause().toString(), e);
108        } finally {

On line 107, e.getCause() is (probably) null, and calling toString() on null throws an NPE.

(Per the spec, getCause() can return null: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#getCause(). GhostscriptException also allows the cause to be null: http://grepcode.com/file/repo1.maven.org/maven2/org.ghost4j/ghost4j/1.0.0/org/ghost4j/GhostscriptException.java)

To verify this answer (without recompiling the whole of tess4j), you could start your program in debug mode and put a breakpoint on line 107. This will give you information about the real exception.
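As a small illustration of the failure mode described above, the following sketch uses only the JDK. The class name CauseLogging and the helper describe are hypothetical and are not part of tess4j; a plain Exception stands in for GhostscriptException. It shows that an exception created without a cause returns null from getCause(), and one possible null-safe way to build the log message.

```java
public class CauseLogging {

    /**
     * Hypothetical helper: describes an exception without assuming it has a cause.
     * Falls back to the exception itself when getCause() returns null.
     */
    public static String describe(Throwable e) {
        Throwable cause = e.getCause();
        return cause != null ? cause.toString() : e.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a GhostscriptException that was created without a cause.
        Exception noCause = new Exception("Ghostscript failed");

        try {
            // Same pattern as line 107: getCause() is null here, so toString() throws.
            noCause.getCause().toString();
        } catch (NullPointerException npe) {
            System.out.println("NPE raised while logging; the real error is hidden");
        }

        // Null-safe alternative: the original exception is still reported.
        System.out.println(describe(noCause));
    }
}
```

With a guard like this in the logging statement, the log would show the underlying exception instead of a bare NullPointerException.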




Answer 2:


As @Piotr R mentioned, the error was that ghostscriptException.getCause() is null. The reason is that the path configured in the File object passed to Tesseract was not a valid one. Tesseract's definition of "valid" is a bit different from what you might expect: it considers only a local path valid, so passing a file located on AWS S3, even a public one, throws an error. The solution was to save the file locally and delete it after Tesseract is done.
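The save-locally-then-delete approach described above could look roughly like the following sketch. It uses only the JDK; the class RemoteOcr, the method ocrFromUrl, and the OcrStep callback are hypothetical names introduced for illustration. In a real application the callback would presumably wrap the Tess4J call, for example file -> tesseract.doOCR(file). The file suffix is passed in on the assumption that the OCR library chooses its PDF or image handling from the file extension.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RemoteOcr {

    /** Hypothetical callback standing in for the OCR call on a local file. */
    @FunctionalInterface
    public interface OcrStep {
        String run(File localFile) throws Exception;
    }

    /**
     * Downloads the content at the given URL (e.g. a public S3 object) to a
     * temporary local file, runs the OCR step on it, and always deletes the
     * temporary file afterwards.
     *
     * @param suffix file extension to keep on the temp file, e.g. ".pdf" or ".png"
     */
    public static String ocrFromUrl(URL source, String suffix, OcrStep ocr) throws Exception {
        Path tmp = Files.createTempFile("ocr-input-", suffix);
        try {
            // Copy the remote bytes into the local temp file.
            try (InputStream in = source.openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            // The OCR step only ever sees a local path.
            return ocr.run(tmp.toFile());
        } finally {
            // Clean up whether OCR succeeded or failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Putting the delete in a finally block means the temporary file does not accumulate on the server when OCR fails.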




Answer 3:


Resources I used: Windows 10 (also tried on Windows Server 2016), Java, Maven

Status: Working well on my local machine as well as on a VM

1. Download Tess4J-3.4.8 from http://tess4j.sourceforge.net/ and set the environment variable path for it under Advanced System Settings.
2. Add the dependencies from Maven:

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.1</version>
</dependency>
<dependency>
    <groupId>org.ghost4j</groupId>
    <artifactId>ghost4j</artifactId>
    <version>1.0.1</version>
</dependency>
<dependency>
    <groupId>net.sourceforge.lept4j</groupId>
    <artifactId>lept4j</artifactId>
    <version>1.7.0</version>
</dependency>

3. Get libtesseract302.dll from http://api.256file.com/libtesseract302.dll/en-download-56466.html and copy it to the "C:\Windows\System32" folder.
Do not forget to set the environment variable path under Advanced System Settings.

4. Download and install the Visual C++ 2015 Redistributable or the VC++ 2017 Redistributable (I installed both),
from here: https://programmer.help/blogs/net.sourceforge.tess4j.tesseractexception-java.lang.nullpointerexception.html

Then restart your PC.

5. To be on the safe side, keep the required JAR files locally if you don't already have them (they were shown in an image in the original post).

Do not forget to set the environment variable path for the JARs under Advanced System Settings.



Source: https://stackoverflow.com/questions/39504263/tesseract-error-net-sourceforge-tess4j-tesseract-null
