How to add a new MIME type to Apache Tika

Submitted by 两盒软妹~` on 2019-12-10 15:42:13

Question


This is my class for detecting MIME types. I am trying to add a new MIME type (for .properties files) and have Tika detect it.

This is my class file:

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package check_mime;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tika.Tika;
import org.apache.tika.mime.MimeTypes;


public class TikaFileTypeDetector {

    private final Tika tika = new Tika();

    public TikaFileTypeDetector() {
        super();
    }

    public String probeContentType(Path path) throws IOException {

        // Check contents first
        String fileContentDetect = tika.detect(path.toFile());
        if (!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileContentDetect;
        }

        // Try file name only if content search was not successful
        String fileNameDetect = tika.detect(path.toString());
        if (!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileNameDetect;
        }

        return null;
    }

    public static void main(String[] args) throws IOException {

        if (args.length != 1) {
            printUsage();
            return;
        }
        Path path = Paths.get(args[0]);

        TikaFileTypeDetector detector = new TikaFileTypeDetector();

        String contentType = detector.probeContentType(path);

        System.out.println("File is of type - " + contentType);
    }

    public static void printUsage() {
        System.out.println("Usage: java -classpath ... "
                + TikaFileTypeDetector.class.getName()
                + " <file>");
    }
}

Following the docs, I created a custom XML file:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <glob pattern="*.properties"/>
  </mime-type>
</mime-info>

Now, how do I add this to my program so that Tika reads it? Do I have to create a parser? I'm stuck here.


Answer 1:


This is covered in the Apache Tika 5 minute parser instructions. To add support for Java .properties files, you should first create a file called custom-mimetypes.xml and populate it with something like:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <_comment>Java Properties</_comment>
    <glob pattern="*.properties"/>
    <sub-class-of type="text/plain"/>
  </mime-type>
</mime-info>
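
Before deploying the file, it may help to confirm that it is well-formed and declares the glob you expect. The following is a minimal sketch using only the JDK's DOM parser; the class name MimeInfoGlobs and the method listGlobs are hypothetical, and the check covers XML well-formedness only, not whether Tika accepts the definitions.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: lists the globs declared in a mime-info file.
public class MimeInfoGlobs {

    /** Returns one "type -> pattern" string per glob element in the document. */
    static List<String> listGlobs(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml); // throws if the XML is not well-formed
        List<String> result = new ArrayList<>();
        NodeList types = doc.getElementsByTagName("mime-type");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            NodeList globs = type.getElementsByTagName("glob");
            for (int j = 0; j < globs.getLength(); j++) {
                Element glob = (Element) globs.item(j);
                result.add(type.getAttribute("type") + " -> " + glob.getAttribute("pattern"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: java MimeInfoGlobs <path-to-custom-mimetypes.xml>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            listGlobs(in).forEach(System.out::println);
        }
    }
}
```

Run against the file above, this should list a single entry mapping text/properties to *.properties.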

Next, you need to put that file somewhere Tika can find it, under the right name. It must be stored as org/apache/tika/mime/custom-mimetypes.xml on your classpath. The easiest approach is to create that directory structure, move the new file into it, then add the root directory to your classpath. For deployment, you should wrap that up into a jar and put the jar on the classpath.
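
As a quick sanity check, plain Java resource loading can confirm whether the file is visible at that path. The following is a minimal sketch; the class name CustomMimeTypesCheck is hypothetical, and a successful lookup only shows that the resource is on the classpath, not that Tika parsed it.

```java
import java.net.URL;

// Hypothetical helper, not part of Tika: checks classpath visibility of the custom file.
public class CustomMimeTypesCheck {

    // Resource path taken from the answer text above
    static final String RESOURCE = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Returns the resource URL if the given class loader can see it, otherwise null. */
    static URL locate(ClassLoader loader) {
        return loader.getResource(RESOURCE);
    }

    public static void main(String[] args) {
        URL url = locate(CustomMimeTypesCheck.class.getClassLoader());
        if (url == null) {
            System.out.println("Not found on classpath: " + RESOURCE);
        } else {
            System.out.println("Found: " + url);
        }
    }
}
```

Run it with the same -classpath value you intend to use for your application, so that the lookup sees the same entries Tika would.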

You can use the Tika App to check that your mime type file was loaded, if you're careful. With your code packaged as a jar, run it as something like:

java -classpath tika-app-1.10-SNAPSHOT.jar:my-custom-mimetypes.jar org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties

Alternatively, if you have it in a local directory, try something like:

ls -l org/apache/tika/mime/custom-mimetypes.xml
# Check a file was found, with some content in it
java -classpath tika-app-1.10-SNAPSHOT.jar:. org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties

If that isn't showing your mime type, then the path or filename is wrong, so double-check both.

(Alternatively, upgrade to a newer version of Apache Tika: since r1686315, Tika has had a Java Properties mimetype built in!)




Answer 2:


Tika detects your custom definitions via Java resource loading and automatically adds them to its own. For that to work, you need to name the file custom-mimetypes.xml and put it in the package org.apache.tika.mime within your codebase.

If you create a jar file from your classes, you also need to include your custom-mimetypes.xml in the jar.
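
If your build tool does not already copy resources into the jar, the packaging step can be sketched with the JDK's java.util.jar classes. This is only an illustration under stated assumptions: the class name CustomMimeTypesJar and its writeJar method are hypothetical, and the standard jar tool or a Maven/Gradle resources directory would normally do the same job.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical helper, not part of Tika: packs the XML file into a jar at the expected entry path.
public class CustomMimeTypesJar {

    // Entry path inside the jar; it mirrors the package org.apache.tika.mime
    static final String ENTRY = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Writes a jar containing only the given XML file, stored under ENTRY. */
    static void writeJar(Path xmlFile, Path jarFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out)) {
            jar.putNextEntry(new JarEntry(ENTRY));
            Files.copy(xmlFile, jar); // copy the file contents into the current entry
            jar.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java CustomMimeTypesJar <custom-mimetypes.xml> <output.jar>");
            return;
        }
        writeJar(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The important detail is the entry name: the file has to sit at org/apache/tika/mime/custom-mimetypes.xml inside the jar, not at the jar root.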




Answer 3:


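// Assumed setup: the default config and its detector; stream and metadata come from the caller
TikaConfig config = TikaConfig.getDefaultConfig();
Detector detector = config.getDetector();
MediaType mediaType = detector.detect(stream, metadata);
System.out.println("Detected Media Type: " + mediaType);
// Look up the registered MimeType to get its preferred file extension
MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
String extension = mimeType.getExtension();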


Source: https://stackoverflow.com/questions/30895761/how-to-add-new-mime-type-to-apache-tika
