I am developing a standalone Java batch process and am trying to determine the MIME type of file attachments using the Tika 1.4 jar files.
My code looks like this:
    Parser parser = new AutoDetectParser();
    InputStream stream = new FileInputStream(fileAttachment);
    // -1 disables the write limit on the body content handler
    int writerHandler = -1;
    ContentHandler contentHandler = new BodyContentHandler(writerHandler);
    Metadata metadata = new Metadata();
    parser.parse(stream, contentHandler, metadata, new ParseContext());
    // the detected type is reported through the metadata
    String mimeType = metadata.get(Metadata.CONTENT_TYPE);
    logger.debug("File Attachment: " + fileAttachment.getName() + " MimeType is: " + mimeType);
This code does not work properly for Office 2003 and 2007 documents.
When I run it from Eclipse, I get the correct MIME types.
When I build a jar file and run it from the command line, it gives the wrong MIME types.
Output from the command line:

    File Attachment: Testpdf.pdf MimeType is: application/pdf
    File Attachment: Testpdf.tif MimeType is: image/tiff
    File Attachment: Testpdf.xlsx MimeType is: application/x-tika-ooxml
    File Attachment: Testpdf.xltx MimeType is: application/x-tika-ooxml
    File Attachment: Testpdf.pptx MimeType is: application/x-tika-ooxml
    File Attachment: Testpdf.docx MimeType is: application/x-tika-ooxml
    File Attachment: Testpdf.xls MimeType is: application/zip
    File Attachment: Testpdf.doc MimeType is: application/x-tika-msoffice
    File Attachment: Testpdf.dot MimeType is: application/x-tika-msoffice
    File Attachment: Testpdf.ppt MimeType is: application/x-tika-msoffice
    File Attachment: Testpdf.xlt MimeType is: application/vnd.ms-excel
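The generic x-tika-ooxml and x-tika-msoffice results are what Tika reports when only its basic detection runs. This is a possible diagnostic, sketched under an assumption the post does not confirm: that the command-line jar was built by merging the Tika jars into one archive. A merge like that can overwrite the META-INF/services files that Tika uses to find its detectors and parsers. The class name ServiceFileCheck and the helper find are hypothetical; only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Diagnostic sketch: lists every copy of a service registration file
// that is visible on the current classpath.
final class ServiceFileCheck {

    // Assumption: these are the service files Tika's default detector
    // and parser lookups depend on.
    static final String[] TIKA_SERVICE_FILES = {
        "META-INF/services/org.apache.tika.detect.Detector",
        "META-INF/services/org.apache.tika.parser.Parser"
    };

    // Returns the URLs of all classpath resources with the given name.
    static List<URL> find(String resourceName) {
        ClassLoader loader = ServiceFileCheck.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : TIKA_SERVICE_FILES) {
            List<URL> copies = find(name);
            System.out.println(name + ": " + copies.size() + " found");
            for (URL url : copies) {
                System.out.println("  " + url);
            }
        }
    }
}
```

Running this once from Eclipse and once from the built jar would show whether the same service files are present in both environments. If the jar run finds fewer of them, the packaging step is the likely place to look.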
I tried OfficeParser and OOXMLParser, but neither works. I also tried the Tika 0.9 jar files. With those, the MIME types come out correctly, but if any one of my file attachments is an "editable PDF", my batch process dies (as if "exit(0);" had been called in the code). With the newer Tika jars, I get the wrong MIME types.
Please help me with this. Thanks in advance.
CVSR Sarma