Mimetype check using Tika jars

时光总嘲笑我的痴心妄想 submitted on 2019-12-01 08:09:41

Firstly, you're using the wrong part of Apache Tika. If all you want to know is the file type, use the Detection API directly (see its Javadocs), e.g.:

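```java
import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;

TikaConfig tika = new TikaConfig();
Metadata metadata = new Metadata();
metadata.set(Metadata.RESOURCE_NAME_KEY, filename); // filename hint (Tika 2.x: TikaCoreProperties.RESOURCE_NAME_KEY)
// detect() returns a MediaType, not a String, so convert it
String mimetype = tika.getDetector().detect(stream, metadata).toString();
```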

If you have only the tika-core jar on your classpath, the detection above will use MIME magic and filename hints. That will identify most files correctly, especially those with the right extension, but it will struggle with wrongly named "container formats".
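To make the magic-versus-filename behaviour concrete, here is a toy detector in plain Java, with no Tika involved. The class name HintedSniffer and its tiny magic and extension tables are hypothetical; it only loosely mirrors the idea behind tika-core's detection: trust the leading bytes first, and let the filename refine a generic zip result.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy detector: magic bytes first, filename extension as a hint.
final class HintedSniffer {
    private static final String ZIP = "application/zip";
    private static final String XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    private static final String DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    private static final String ODS = "application/vnd.oasis.opendocument.spreadsheet";

    // A tiny, illustrative extension table (real tables are far larger)
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "png", "image/png",
            "txt", "text/plain",
            "zip", ZIP,
            "xlsx", XLSX,
            "docx", DOCX,
            "ods", ODS);

    // Types whose files are really zip archives underneath
    private static final Set<String> ZIP_BASED = Set.of(XLSX, DOCX, ODS);

    private HintedSniffer() {}

    static String detect(byte[] head, String filename) {
        String magic = byMagic(head);
        String hint = byExtension(filename);
        if (magic == null) {
            // No recognisable magic: fall back to the name, or give up
            return hint != null ? hint : "application/octet-stream";
        }
        // Magic can only say "zip"; a zip-based extension may refine that
        if (magic.equals(ZIP) && hint != null && ZIP_BASED.contains(hint)) {
            return hint;
        }
        return magic; // otherwise the content wins over the name
    }

    static String byMagic(byte[] head) {
        if (startsWith(head, 0x25, 0x50, 0x44, 0x46)) return "application/pdf"; // "%PDF"
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47)) return "image/png";       // 0x89 "PNG"
        if (startsWith(head, 0x50, 0x4B, 0x03, 0x04)) return ZIP;               // "PK\3\4"
        return null;
    }

    static String byExtension(String filename) {
        if (filename == null) return null;
        int dot = filename.lastIndexOf('.');
        if (dot < 0) return null;
        return BY_EXTENSION.get(filename.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Compares the first bytes of head (as unsigned values) with the given magic
    private static boolean startsWith(byte[] head, int... magic) {
        if (head.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

With this sketch, a PDF saved as report.txt is still reported as application/pdf and a zip named report.xlsx comes back as the XLSX type, but the same zip bytes named report.dat only give application/zip: the wrongly named container case.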

Container formats are things like zip and OLE2, where one file format can hold many types (e.g. ODS, XLSX and Keynote files are all zip archives, while .doc and .xls both use OLE2). If you want detection that looks inside containers for more accurate results, you also need to include the tika-parsers jar and its dependencies.
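"Looking inside a container" just means opening it and checking what it holds. As a rough, JDK-only illustration (not Tika's actual container detector), the hypothetical ZipContainerSniffer below walks the entries of a stream already known to be a zip: OOXML files have word/, xl/ or ppt/ folders, and ODF files store their MIME type in an entry named mimetype. Anything else falls back to application/zip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical container sniffer: looks at entry names inside a zip archive.
final class ZipContainerSniffer {
    private ZipContainerSniffer() {}

    static String detect(InputStream in) throws IOException {
        // Note: this reads and closes the stream it is given
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("mimetype")) {
                    // ODF packages (ods, odt, ...) store their MIME type as this entry's content
                    return readEntry(zip).trim();
                }
                if (name.startsWith("word/")) {
                    return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
                }
                if (name.startsWith("xl/")) {
                    return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
                }
                if (name.startsWith("ppt/")) {
                    return "application/vnd.openxmlformats-officedocument.presentationml.presentation";
                }
            }
        }
        return "application/zip"; // nothing recognisable inside: just a zip
    }

    // Reads the remainder of the current zip entry as ASCII text
    private static String readEntry(ZipInputStream zip) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int r;
        while ((r = zip.read(buf)) != -1) {
            out.write(buf, 0, r);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

Because this consumes the stream, it should be run on its own copy of the data rather than on the stream you plan to parse afterwards.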

Note that, as explained in the Javadocs, your stream needs to support mark and reset for detection to work. This lets Tika read the first bit of your stream, work out what your file is, and then return the stream to its original position, ready for other uses (e.g. parsing). Many streams support this (a BufferedInputStream does, a plain FileInputStream does not). If yours doesn't, the simplest fix is to wrap it in a TikaInputStream via TikaInputStream.get, which sorts all that out for you.
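The mark/reset trick itself is plain java.io. The sketch below uses hypothetical helper names and only the JDK to show what "read the first bit, then return the stream" means: wrap the stream in a BufferedInputStream if it can't mark, peek at the leading bytes, and reset so the next reader still sees the whole stream. With Tika on the classpath, TikaInputStream.get takes care of this for you.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper illustrating the mark/reset "peek" that detectors rely on.
final class PeekingSniffer {
    private PeekingSniffer() {}

    /** Returns a stream that supports mark/reset, wrapping it only if necessary. */
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n leading bytes, then resets so the stream is back at its start. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(n); // the reset stays valid as long as we read at most n bytes
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(buf, total, n - total);
            if (r == -1) {
                break; // stream shorter than n bytes
            }
            total += r;
        }
        in.reset(); // rewind so parsing can start from the first byte
        return Arrays.copyOf(buf, total);
    }
}
```

Passing n to mark() is the key design choice: it tells the stream how much it must be able to rewind, so a detector that only inspects a small header never forces the whole file into memory.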
