I'd need to get the iana.org MediaType rather than application/zip or application/x-tika-msoffice for documents like odt, ppt, pptx, xlsx, etc.
If you look at mim
The default byte pattern detection rules in tika-core can only detect the generic OLE2 or ZIP format used by all MS Office document types. As far as I know, you want to use ContainerAwareDetector for this kind of detection, with the MimeTypes detector as its fallback. Try this:
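public MediaType getContentType(InputStream is, String fileName) {
    // Start from the generic type so something sensible is returned if detection fails.
    MediaType mediaType = MediaType.OCTET_STREAM;
    Metadata md = new Metadata();
    // The file name gives the detector an extra hint.
    md.set(Metadata.RESOURCE_NAME_KEY, fileName);
    // The MimeTypes repository from the Tika config acts as the fallback detector.
    Detector detector = new ContainerAwareDetector(tikaConfig.getMimeRepository());
    try {
        mediaType = detector.detect(is, md);
    } catch (IOException ioe) {
        // Handle or log the failure as appropriate; the generic type is returned.
    }
    return mediaType;
}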
This way your tests should pass.
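To see why byte patterns alone only give you the generic container type, here is a rough, Tika-free sketch using just the JDK. It is not how ContainerAwareDetector is implemented; the class name ContainerSniffer and its methods are made up for illustration. It checks the leading magic bytes first, and only when it sees a ZIP does it open the archive and look for the `mimetype` entry that OpenDocument files such as odt carry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only; not part of Tika.
public class ContainerSniffer {
    // Local file header signature shared by every ZIP-based format (odt, pptx, xlsx, ...).
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};
    // OLE2 compound document signature shared by the legacy doc, ppt and xls formats.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Magic bytes only: this can never tell an odt from a pptx.
    static String sniffMagic(byte[] data) {
        if (startsWith(data, ZIP_MAGIC)) {
            return "application/zip";
        }
        if (startsWith(data, OLE2_MAGIC)) {
            return "application/x-tika-msoffice";
        }
        return "application/octet-stream";
    }

    // Container-aware: for ZIP input, read the "mimetype" entry if there is one.
    static String sniffContainer(byte[] data) throws IOException {
        String generic = sniffMagic(data);
        if (!generic.equals("application/zip")) {
            return generic;
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("mimetype")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[256];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.US_ASCII).trim();
                }
            }
        }
        return generic;
    }

    // Builds a one-entry ZIP in memory so the example needs no files on disk.
    static byte[] zipWith(String name, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeOdt = zipWith("mimetype", "application/vnd.oasis.opendocument.text");
        System.out.println(sniffMagic(fakeOdt));
        System.out.println(sniffContainer(fakeOdt));
    }
}
```

The first call only ever sees the ZIP signature, while the second looks inside the container, which is the same idea as wrapping the MimeTypes detector in a container-aware one. A real detector would also need to handle the OOXML and OLE2 cases, which this sketch leaves at the generic type.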