Getting MimeType subtype with Apache Tika

爱一瞬间的悲伤 2020-12-29 10:35

I need to get the IANA-registered (iana.org) media type rather than application/zip or application/x-tika-msoffice for documents such as odt, ppt, pptx, xlsx, etc.

If you look at mim

4 Answers
  •  爱一瞬间的悲伤
    2020-12-29 11:16

    For anyone else who has a similar problem but uses a newer Tika version, this should do the trick:

    1. Use ZipContainerDetector, since ContainerAwareDetector may no longer be available.
    2. Pass a TikaInputStream to the detector's detect() method so that Tika can inspect the content and determine the correct MIME type.

    My example code looks like this:

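    // Imports use the Tika 1.x package names; the two container detectors
    // live in different packages in Tika 2.x, so adjust them if needed.
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.tika.detect.CompositeDetector;
    import org.apache.tika.detect.Detector;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.mime.MimeTypes;
    import org.apache.tika.parser.microsoft.POIFSContainerDetector;
    import org.apache.tika.parser.pkg.ZipContainerDetector;

    // Lazily created and reused; "log" and "Document" come from the surrounding class.
    private static Detector detector;

    public static String getMimeType(final Document p_document)
    {
        Metadata metadata = new Metadata();
        // The file name serves as an extra hint for detection.
        metadata.add(Metadata.RESOURCE_NAME_KEY, p_document.getDocName());

        Detector detector = getDefaultDetector();
        LogMF.debug(log, "Trying to detect mime type with detector {0}.", detector);

        // try-with-resources closes the stream and any temporary resources it holds.
        try (TikaInputStream inputStream = TikaInputStream.get(p_document.getData(), metadata))
        {
            return detector.detect(inputStream, metadata).toString();
        }
        catch (Exception e)
        {
            // Pass the exception along so the cause is not lost.
            log.error("Error while determining mime-type of " + p_document, e);
        }

        return null;
    }

    private static Detector getDefaultDetector()
    {
        if (detector == null)
        {
            List<Detector> detectors = new ArrayList<>();

            // zip compressed container types
            detectors.add(new ZipContainerDetector());
            // Microsoft stuff
            detectors.add(new POIFSContainerDetector());
            // mime magic detection as fallback
            detectors.add(MimeTypes.getDefaultMimeTypes());

            detector = new CompositeDetector(detectors);
        }

        return detector;
    }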
    

    Note that the Document class is part of my domain model, so you will need to substitute your own equivalent wherever it is used.
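To illustrate why a container-aware detector is needed at all: odt, pptx, xlsx and similar files are ordinary zip archives, so magic-byte detection alone can only report application/zip. The sketch below uses only the JDK (Java 9+ for readAllBytes) and is not Tika's actual algorithm. The class name ZipSubtypeSketch and its helpers are hypothetical, and the folder-name check for OOXML is a deliberate simplification.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; Tika's ZipContainerDetector is far more thorough.
class ZipSubtypeSketch {
    private static final String OOXML = "application/vnd.openxmlformats-officedocument.";

    /** Looks inside a zip archive for hints about the specific document type. */
    static String detect(byte[] data) throws IOException {
        boolean sawEntry = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                sawEntry = true;
                String name = entry.getName();
                if (name.equals("mimetype")) {
                    // ODF packages store their media type as plain text in this entry.
                    return new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim();
                }
                // Simplification: OOXML packages keep their main part in a typed folder.
                if (name.startsWith("word/")) return OOXML + "wordprocessingml.document";
                if (name.startsWith("xl/")) return OOXML + "spreadsheetml.sheet";
                if (name.startsWith("ppt/")) return OOXML + "presentationml.presentation";
            }
        }
        // No hint found: a plain zip, or not a zip archive at all.
        return sawEntry ? "application/zip" : "application/octet-stream";
    }

    /** Builds a one-entry zip archive in memory, for demonstration purposes. */
    static byte[] zipOf(String entryName, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detect(zipOf("mimetype", "application/vnd.oasis.opendocument.text")));
        System.out.println(detect(zipOf("xl/workbook.xml", "<workbook/>")));
        System.out.println(detect(zipOf("readme.txt", "hello")));
    }
}
```

In real code the Tika detectors above should do this work; the sketch only shows what kind of information is available inside the container and why the detector needs the whole stream rather than just the first few bytes.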

    I hope someone finds this useful.
