Given an array of bytes representing text in some unknown encoding (usually UTF-8 or ISO-8859-1, but not necessarily so), what is the best way to obtain a guess for the most
There is also Apache Tika - a content analysis toolkit. It can guess the mime type, and it can guess the encoding. Usually the guess is correct with a very high probability.