How to properly configure Apache Tika for a few document types?
Question: I've been using Tika for a while, and I know that one is supposed to use only the Tika facade, with either the default or a custom TikaConfig that represents the org/apache/tika/mime/tika-mimetypes.xml file. My application doesn't allow any document type other than html, doc, docx, odt, txt, rtf, srt, sub, pdf, odf, odp, xls, ppt, and msg, and the default MediaTypes include tons of others. Are we supposed to modify tika-mimetypes.xml to remove the MimeTypes we don't need? Then, as I understand, it will create