Getting MimeType subtype with Apache tika

爱一瞬间的悲伤 2020-12-29 10:35

I need to get the specific iana.org MediaType, rather than application/zip or application/x-tika-msoffice, for documents such as odt, ppt, pptx, xlsx, etc.

If you look at mim

4 Answers
  •  感动是毒
    2020-12-29 11:24

    The default byte-pattern detection rules in tika-core can only detect the generic OLE2 or ZIP container format shared by all MS Office document types. As far as I know, you want to use ContainerAwareDetector for this kind of detection, with the MimeTypes detector as its fallback. Try this:

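    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.tika.config.TikaConfig;
    import org.apache.tika.detect.ContainerAwareDetector;
    import org.apache.tika.detect.Detector;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.mime.MediaType;

    public MediaType getContentType(InputStream is, String fileName) {
        TikaConfig tikaConfig = TikaConfig.getDefaultConfig();
        Metadata md = new Metadata();
        // The file name gives the fallback MimeTypes detector an extension hint
        md.set(Metadata.RESOURCE_NAME_KEY, fileName);
        // Looks inside OLE2/ZIP containers; falls back to MimeTypes otherwise
        Detector detector = new ContainerAwareDetector(tikaConfig.getMimeRepository());

        try {
            // detect() expects a stream supporting mark/reset; TikaInputStream provides it
            return detector.detect(TikaInputStream.get(is), md);
        } catch (IOException ioe) {
            // Handle as appropriate for your application; here, fall back to a generic type
            return MediaType.OCTET_STREAM;
        }
    }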
    

    This way, your tests should pass.
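To illustrate what "container-aware" detection means, here is a simplified, stdlib-only Java sketch. It is not Tika's actual implementation. It opens a ZIP stream and refines the generic `application/zip` type by inspecting entry names: ODF packages carry a `mimetype` entry holding their media type as plain text, and OOXML packages keep their main parts under `word/`, `ppt/` or `xl/`. The class `ZipSubtypeSniffer`, its method `refineZipType`, and these entry-name rules are hypothetical assumptions chosen for illustration. Tika's real container detectors cover far more cases, including OLE2-based .doc/.ppt/.xls and macro-enabled variants, which this sketch does not handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipSubtypeSniffer {

    // Hypothetical helper: refines a generic ZIP stream into a more specific
    // IANA media type by looking at entry names. Consumes and closes the stream.
    // Returns "application/zip" when no known marker entry is found.
    static String refineZipType(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("mimetype")) {
                    // ODF packages store their media type as plain text in this entry
                    return readEntry(zip).trim();
                }
                // Simplified OOXML rules: the main part's folder identifies the format
                if (name.startsWith("word/")) {
                    return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
                }
                if (name.startsWith("ppt/")) {
                    return "application/vnd.openxmlformats-officedocument.presentationml.presentation";
                }
                if (name.startsWith("xl/")) {
                    return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
                }
            }
        }
        return "application/zip";
    }

    // Reads the current ZIP entry to its end (read() returns -1 at entry end)
    private static String readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

For example, a ZIP whose `mimetype` entry contains `application/vnd.oasis.opendocument.text` is reported as that ODF type, while an archive with no recognised entries stays `application/zip`. Using a library detector such as the one above is still preferable, because real-world files rarely fit such simple rules.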
