I would like to detect encoding of some text (using PHP). For that purpose i use mb_detect_encoding() function.
The problem is that the function returns different re
Keep in mind mb_detect_encoding() does not know what encoding the data is in. You may see a string, but the function itself only sees a stream of bytes. Going by that, it needs to guess what the encoding is - e.g. ASCII would be if bytes are only in the 0-127 range, UTF-8 would be if there are ASCII bytes and 128+ bytes that exist only in pairs or more, and so forth.
As you can imagine, given that context, it's quite difficult to detect an encoding reliably.
Like rihk said, this is what the mb_detect_order() function is for - you're basically supplying your best guess what the data is likely to be. Do you work with UTF-8 files frequently? Then chances are your stuff isn't likely to be UTF-16 even if mb_detect_encoding() could guess it as that.
You might also want to check out Artefacto's link for a more in-depth view.
Example case: Internet Explorer uses some interesting encoding guessing if nothing is specified (@link, Section: 'To automatically detect a website's language') that's caused strange behaviours on websites that took encoding for granted in the past. You can probably find some amusing stuff on that if you google around. It makes for a nice show-case how even statistical methods can backfire horribly, and why encoding-guessing in general is problematic.