Given an array of bytes representing text in some unknown encoding (usually UTF-8 or ISO-8859-1, but not necessarily so), what is the best way to obtain a guess for the most
Here's my favorite: https://github.com/codehaus/guessencoding
It works like this:
It may sound overly simplistic, but in my day-to-day work it's well over 90% accurate.
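To give a feel for how simple a heuristic of this kind can be, here is a minimal Python sketch. It is not taken from the guessencoding library and may differ from what that library actually does. It only assumes the situation in the question: the input is most likely UTF-8 or ISO-8859-1. The function name `guess_encoding` and the `fallback` parameter are made up for this example.

```python
import codecs

# Byte-order marks, longest first, so the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Return a best-guess codec name for `data` (hypothetical helper)."""
    # 1. A byte-order mark is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. No byte with the high bit set: plain ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"

    # 3. High-bit bytes that form valid UTF-8 sequences are very unlikely
    #    to occur by accident in text stored in a single-byte encoding.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Otherwise assume the caller-supplied single-byte default.
    return fallback


if __name__ == "__main__":
    print(guess_encoding("héllo".encode("utf-8")))
    print(guess_encoding("héllo".encode("iso-8859-1")))
```

The fallback is the weak spot: any byte sequence decodes as ISO-8859-1, so step 4 can never fail and cannot tell ISO-8859-1 apart from other single-byte encodings such as windows-1252. Pick the default that matches where your data usually comes from.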