I'd need to get the iana.org MediaType rather than application/zip or application/x-tika-msoffice for documents like odt, ppt, pptx, xlsx etc.
If you look at mim
For anyone else having a similar problem but using a newer Tika version, this should do the trick:
1. Use ZipContainerDetector, since ContainerAwareDetector may no longer be available.
2. Pass a TikaInputStream to the detect() method of the detector, so that Tika can determine the correct MIME type.

My example code looks like this:
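    // Imports assume Tika 1.x and log4j 1.2; the two container detectors
    // live in different packages in Tika 2.x, so adjust for your version.
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.log4j.LogMF;
    import org.apache.tika.detect.CompositeDetector;
    import org.apache.tika.detect.Detector;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.mime.MimeTypes;
    import org.apache.tika.parser.microsoft.POIFSContainerDetector;
    import org.apache.tika.parser.pkg.ZipContainerDetector;

    // Lazily created and shared; "log" is assumed to be the class's log4j Logger.
    private static Detector detector;

    public static String getMimeType(final Document p_document)
    {
        Metadata metadata = new Metadata();
        // The file name lets Tika fall back on the extension as a hint.
        metadata.add(Metadata.RESOURCE_NAME_KEY, p_document.getDocName());
        Detector detector = getDefaultDetector();
        LogMF.debug(log, "Trying to detect mime type with detector {0}.", detector);
        // The container detectors only look inside the file when given a TikaInputStream.
        try (TikaInputStream inputStream = TikaInputStream.get(p_document.getData(), metadata))
        {
            return detector.detect(inputStream, metadata).toString();
        }
        catch (Exception e)
        {
            // Pass the exception along so the cause is not lost.
            log.error("Error while determining mime-type of " + p_document, e);
        }
        return null;
    }

    private static synchronized Detector getDefaultDetector()
    {
        if (detector == null)
        {
            List<Detector> detectors = new ArrayList<>();
            // zip compressed container types
            detectors.add(new ZipContainerDetector());
            // Microsoft stuff
            detectors.add(new POIFSContainerDetector());
            // mime magic detection as fallback
            detectors.add(MimeTypes.getDefaultMimeTypes());
            detector = new CompositeDetector(detectors);
        }
        return detector;
    }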
Note that the Document class is part of my domain model, so you will have something similar of your own in its place.
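In case it helps to see why plain magic-byte detection stops at application/zip, here is a rough stdlib-only sketch of the idea behind a container-aware detector. It is not Tika's implementation, and the class and method names (ZipContainerSniffer, buildZip, looksLikeZip, detect) are made up for this illustration. It only relies on the fact that an OpenDocument package carries its media type in an entry named "mimetype"; Tika's real detectors cover many more formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Illustration only (hypothetical class): peeks inside a ZIP container to find a more specific type. */
class ZipContainerSniffer
{
    /** Builds a small in-memory ZIP with a single entry, used here as sample input. */
    static byte[] buildZip(String entryName, String content)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes))
        {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Magic-byte check: a ZIP local file header starts with 'P', 'K', 3, 4. */
    static boolean looksLikeZip(byte[] data)
    {
        return data.length >= 4 && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    /** Returns the content of the "mimetype" entry if present, otherwise a generic type. */
    static String detect(byte[] data)
    {
        if (!looksLikeZip(data))
        {
            return "application/octet-stream";
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data)))
        {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null)
            {
                if ("mimetype".equals(entry.getName()))
                {
                    // Read the current entry to its end; read() returns -1 there.
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    byte[] buffer = new byte[256];
                    int n;
                    while ((n = zip.read(buffer)) != -1)
                    {
                        content.write(buffer, 0, n);
                    }
                    return new String(content.toByteArray(), StandardCharsets.US_ASCII).trim();
                }
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        // A ZIP without a "mimetype" entry: magic bytes are all we know.
        return "application/zip";
    }

    public static void main(String[] args)
    {
        byte[] odt = buildZip("mimetype", "application/vnd.oasis.opendocument.text");
        byte[] plainZip = buildZip("readme.txt", "hello");
        System.out.println(detect(odt));
        System.out.println(detect(plainZip));
    }
}
```

The magic bytes are the same for both sample archives, so only a look at the entries can tell them apart. That is the reason the detector needs a stream it can read into, which is what the TikaInputStream provides in the code above.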
I hope that someone can use this.