Tika 1.13 RuntimeException

孤人 提交于 2019-12-24 08:08:38

问题


I recently updated my existing tika project to use tika 1.13 instead of 1.10. The only thing I did was changing the dependency version from 1.10 to 1.13. The project was built successfully. Yet whenever I try and run the application I get this exception:

java.lang.RuntimeException: Unable to parse the default media type registry
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:580)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)
    at com.app.tikamanager.MetaParser.<init>(MetaParser.java:54)
    at com.app.services.MyService.HandleItemInThread(IntelligentDocumentsService.java:260)
    at com.app.intelligentservicebase.ItemHandlerThread.run(ItemHandlerThread.java:41)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.mime.MimeTypeException: Invalid type configuration
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:126)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:64)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:93)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:170)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    ... 10 more
Caused by: org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
    at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.setFeatures(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:119)
    ... 14 more

The exception is thrown from the constructor of my MetaParser class, the only thing there is the initialization of the AutoDetectParser:

private final AutoDetectParser _tikaExtractor;
public MetaParser()
    {
        _tikaExtractor = new AutoDetectParser();
    }

I am running the application on Ubuntu 14.04 with Oracle JDK 1.8.0_91-b14.

I looked online and this exception was mentioned a couple of times, once a probable fix was to install OpenJDK but that was for an old version of Tika and since the old version used to work fine with the same JDK I don't think that is the problem.

Is there something I need to do or initialize before calling the AutoDetectParser constructor?


回答1:


Promoting comments to an answer - you have a very old version of Xerces on your classpath. Your JVM is picking that as the default XML Parser, so when Tika says "Hi JVM, can I have a safe XML Parser" it fails.

(Tika made improvements in the 1.10 to 1.13 period to how XML Parsing is done, including setting safer defaults, which is why this has started happening)

You either need to remove your old Xerces jars, so that the JVM-supplied XML Parser starts being used, or replace them with a more recent Xerces version

You may also find some of the advice in Error unmarshalling XML in Java 8 “secure-processing org.xml.sax.SAXNotRecognizedException” helpful, especially if you're struggling to locate the pesky old Xerces jar in your build!



来源:https://stackoverflow.com/questions/37941870/tika-1-13-runtimeexception

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!