Question
I recently updated my existing Tika project to use Tika 1.13 instead of 1.10. The only change I made was bumping the dependency version from 1.10 to 1.13. The project builds successfully, yet whenever I try to run the application I get this exception:
java.lang.RuntimeException: Unable to parse the default media type registry
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:580)
at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)
at com.app.tikamanager.MetaParser.<init>(MetaParser.java:54)
at com.app.services.MyService.HandleItemInThread(IntelligentDocumentsService.java:260)
at com.app.intelligentservicebase.ItemHandlerThread.run(ItemHandlerThread.java:41)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.mime.MimeTypeException: Invalid type configuration
at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:126)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:64)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:93)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:170)
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
... 10 more
Caused by: org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.setFeatures(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:119)
... 14 more
The exception is thrown from the constructor of my MetaParser class. The only thing that constructor does is initialize the AutoDetectParser:
private final AutoDetectParser _tikaExtractor;

public MetaParser()
{
    _tikaExtractor = new AutoDetectParser();
}
I am running the application on Ubuntu 14.04 with Oracle JDK 1.8.0_91-b14.
I looked online and found this exception mentioned a couple of times. One suggested fix was to install OpenJDK, but that was for an old version of Tika, and since the old version worked fine with the same JDK, I don't think that is the problem.
Is there something I need to do or initialize before calling the AutoDetectParser constructor?
Answer 1:
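Promoting comments to an answer: you have a very old version of Xerces on your classpath. Your JVM is picking that up as the default XML parser, so when Tika says "Hi JVM, can I have a safe XML Parser" it fails. The old Xerces does not recognize the secure-processing feature, which is the SAXNotRecognizedException at the bottom of your stack trace.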
(Between 1.10 and 1.13, Tika improved how it does XML parsing, including setting safer defaults, which is why this has only started happening now.)
You need to either remove your old Xerces jars, so that the JVM-supplied XML parser starts being used, or replace them with a more recent Xerces version.
You may also find some of the advice in Error unmarshalling XML in Java 8 “secure-processing org.xml.sax.SAXNotRecognizedException” helpful, especially if you're struggling to locate the pesky old Xerces jar in your build!
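If the old jar is hard to spot in the build, a small diagnostic along the following lines may help. This is only a sketch, not something from Tika itself: the class name XmlParserCheck and its helper methods are made up for illustration, and it relies only on standard JAXP calls. It asks the JVM which SAXParserFactory it would hand out, prints where that class was loaded from, and tries to enable the same secure-processing feature that appears in the stack trace.

```java
import java.security.CodeSource;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, not part of Tika or the original project.
public class XmlParserCheck {

    /** Describes where a class was loaded from: a jar/directory URL, or a note if there is none. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "no code source (typically a JDK built-in class)";
        }
        return source.getLocation().toString();
    }

    /** Tries to enable the feature named in the stack trace; returns false if the parser rejects it. */
    static boolean supportsSecureProcessing(SAXParserFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (Exception e) {
            // The old Xerces in the stack trace throws SAXNotRecognizedException here
            return false;
        }
    }

    public static void main(String[] args) {
        // Same lookup the JVM performs when a library asks for the default SAX parser
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println("Factory class: " + factory.getClass().getName());
        System.out.println("Loaded from:   " + locationOf(factory.getClass()));
        System.out.println("Secure processing supported: " + supportsSecureProcessing(factory));
    }
}
```

Run it with the same classpath as the application. If the factory class is org.apache.xerces.jaxp.SAXParserFactoryImpl and the location points at a jar, that jar is the likely culprit. The JDK's own parser would show up as com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl instead.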
Source: https://stackoverflow.com/questions/37941870/tika-1-13-runtimeexception