Namely, how would you tell an archive (jar/rar/etc.) file from a textual (xml/txt, encoding-independent) one?
I used this code and it works for English and German text pretty well:
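import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

private boolean isTextFile(String filePath) throws IOException {
    File f = new File(filePath);
    if (!f.exists())
        return false;
    // Sample at most the first 1000 bytes. available() is only an estimate,
    // so the buffer is sized from the file length instead.
    byte[] data = new byte[(int) Math.min(f.length(), 1000)];
    int size = 0;
    try (FileInputStream in = new FileInputStream(f)) {
        int n;
        // read() may return fewer bytes than requested, so loop until full or EOF
        while (size < data.length && (n = in.read(data, size, data.length - size)) > 0)
            size += n;
    }
    if (size == 0)
        return false; // empty file: avoids computing 0/0 below
    String s = new String(data, 0, size, StandardCharsets.ISO_8859_1);
    String s2 = s.replaceAll(
            "[a-zA-Z0-9ßöäü\\.\\*!\"§\\$\\%&/()=\\?@~'#:,;\\"+
            "+><\\|\\[\\]\\{\\}\\^°²³\\\\ \\n\\r\\t_\\-`´âêîô"+
            "ÂÊÔÎáéíóàèìòÁÉÍÓÀÈÌÒ©‰¢£¥€±¿»«¼½¾™ª]", "");
    // s2 is what remains after deleting every character treated as text
    double d = (double) (s.length() - s2.length()) / (double) s.length();
    // d is the fraction of text characters in the sample
    return d > 0.95;
}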
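
To try the ratio idea without touching the file system, here is a minimal sketch that works on a byte array. It is not the method above: the class name TextRatioDemo and both helper names are made up for illustration, and the character whitelist is swapped for a simpler byte-range rule (tab, LF, CR, printable ASCII, and the printable Latin-1 range 0xA0-0xFF), which is my assumption and accepts more characters than the regex does. The 0.95 threshold is kept.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names and the simplified byte-range rule are assumptions.
final class TextRatioDemo {

    // Fraction of bytes in the sample that look like plain text:
    // tab, LF, CR, printable ASCII (0x20-0x7E), or printable Latin-1 (0xA0-0xFF).
    static double textRatio(byte[] sample) {
        if (sample.length == 0) {
            return 0.0; // nothing to judge, so treat it as "not text"
        }
        int textLike = 0;
        for (byte b : sample) {
            int c = b & 0xFF; // bytes are signed in Java, so mask to 0..255
            boolean printable = (c >= 0x20 && c <= 0x7E) || c >= 0xA0;
            if (printable || c == '\t' || c == '\n' || c == '\r') {
                textLike++;
            }
        }
        return (double) textLike / sample.length;
    }

    // Same 0.95 threshold as in the question's method.
    static boolean looksLikeText(byte[] sample) {
        return textRatio(sample) > 0.95;
    }

    public static void main(String[] args) {
        byte[] text = "Grüße aus Köln!\n".getBytes(StandardCharsets.ISO_8859_1);
        // ZIP/JAR local file header signature ("PK", 0x03, 0x04) followed by low bytes
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x08, 0x00, 0x08, 0x00};
        System.out.println(looksLikeText(text));    // prints true
        System.out.println(looksLikeText(zipLike)); // prints false
    }
}
```

In the ZIP-like sample only the two bytes "PK" count as text, so the ratio is 0.2 and it falls well below the threshold. Multi-byte encodings such as UTF-16 would contain many zero bytes and be misjudged by this rule, so it only covers the single-byte case.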