TikaException: Failed to close temporary resource - how to fix?

Submitted by ↘锁芯ラ on 2021-01-29 07:50:40

Question


I am using Apache Tika on Windows 10 with JRE 1.8.0_181, and I've imported Tika using Maven with the following dependencies:

<dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>1.21</version>
    </dependency>
</dependencies>

I have the code below for performing OCR using Tesseract (which I have independently tested and know to be working):

public static void OCRTest() {
    try {
        BufferedImage im = ImageIO.read(new File(OCR_IMAGE));
        TesseractOCRConfig config = new TesseractOCRConfig();
        config.setTessdataPath("C:\\Program Files\\Tesseract-OCR\\tessdata");
        config.setTesseractPath("C:\\Program Files\\Tesseract-OCR");
        ParseContext parseContext = new ParseContext();
        parseContext.set(TesseractOCRConfig.class, config);
        TesseractOCRParser parser = new TesseractOCRParser();
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        try {
            parser.parse(im, handler, metadata, parseContext);
            System.out.println(handler.toString());
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (TikaException e) {
            e.printStackTrace();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

I run into the following exception:

org.apache.tika.exception.TikaException: Failed to close temporary resources
    at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:174)
    at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:251)
    at test.test.App.OCRTest(App.java:46)
    at test.test.App.main(App.java:30)
Caused by: java.nio.file.FileSystemException: C:\Users\m\AppData\Local\Temp\apache-tika-2643805894084124300.tmp: The process cannot access the file because it is being used by another process.

The .tmp file is still present in the Temp folder, so the exception appears to come from Tika being unable to delete it. On the Apache Tika forums, someone else reported the same exception, although with AutoDetectParser rather than Tesseract. Their issue seemed to be a conflict among their imported jars, but I run into this issue even with only the Apache Tika libraries installed.
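
To check whether such files are being left behind, a small helper can list them. This is only a sketch: the class name LeftoverTikaTempFiles is hypothetical, and the glob "apache-tika-*.tmp" is an assumption based on the file name shown in the stack trace above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika.
final class LeftoverTikaTempFiles {

    // Returns every file in dir whose name matches apache-tika-*.tmp
    // (pattern assumed from the file name in the stack trace).
    static List<Path> find(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(dir, "apache-tika-*.tmp")) {
            for (Path p : stream) {
                found.add(p);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // java.io.tmpdir is the JVM's default temp directory.
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        for (Path p : find(tmpDir)) {
            System.out.println(p);
        }
    }
}
```

Running it before and after the OCR call should show whether a new file was left behind by that call.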

I don't run into this issue when using Tika's AutoDetectParser, only with the TesseractOCRParser. Any insights on how to fix the exception would be appreciated!


Answer 1:


I posted on the Apache Tika issue tracker (https://issues.apache.org/jira/browse/TIKA-2908). The issue came from the order in which TesseractOCRParser closed its open streams. You can see the changes made here: https://github.com/apache/tika/commit/8d386f827eb31e7f1cb189ce942c67a84a0c6bdc?diff=unified#diff-592f390e7558bb6a1fe1c5bc810fe4c8

For now, anyone who runs into this issue can subclass TesseractOCRParser locally to include the above changes, which should be pushed in the next snapshot release.
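
The ordering problem can be illustrated without Tika. On Windows, deleting a file that still has an open java.io stream typically fails with the "being used by another process" error, so every stream has to be closed before the file is deleted. The sketch below uses only the standard library; the class and method names are hypothetical, and it is not the code from the Tika commit.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical demo class; not part of Tika.
final class CloseBeforeDelete {

    // Writes to a temp file, closes the stream, and only then deletes the file.
    // Returns true if the file no longer exists afterwards.
    static boolean writeThenDelete() throws IOException {
        File tmp = File.createTempFile("close-order-demo-", ".tmp");
        // try-with-resources closes the stream when this block exits,
        // releasing the file handle before the delete below.
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write("example".getBytes(StandardCharsets.UTF_8));
        }
        // Safe now: no stream holds the file open.
        Files.delete(tmp.toPath());
        return !tmp.exists();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("deleted: " + writeThenDelete());
    }
}
```

Moving the delete inside the try block, before the stream is closed, reproduces the wrong order and would be expected to fail on Windows.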

Thanks to Tim @ Apache Tika!



Source: https://stackoverflow.com/questions/57064003/tikaexception-failed-to-close-temporary-resource-how-to-fix
