I'm trying to enter some UTF-8 characters into a LaTeX file in TextMate (which says its default encoding is UTF-8), but LaTeX doesn't seem to understand them.
Synalyze It! lets you compare text or bytes in every encoding the ICU library offers. With that feature you can usually see at a glance which code page makes sense for your data.
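If you want to approximate that comparison without a dedicated tool, a few lines of Python will do. This is only a sketch: the file name and the short list of candidate encodings are placeholders (Synalyze It! itself draws on the full ICU list).

# Decode the same raw bytes under several candidate encodings and print the
# results side by side; the interpretation that yields readable text is
# usually the right code page.
candidates = ["utf-8", "latin-1", "cp1252", "mac-roman", "utf-16"]

with open("mystery.tex", "rb") as f:    # placeholder file name
    raw = f.read(200)                   # a short prefix is enough to eyeball

for enc in candidates:
    try:
        print(f"{enc:>10}: {raw.decode(enc)!r}")
    except UnicodeDecodeError:
        print(f"{enc:>10}: (not valid {enc})")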
Which LaTeX are you using? When I was using teTeX, I had to manually download the unicode package and add this to my .tex files:
% UTF-8 stuff
\usepackage[notipa]{ucs}
\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
Now that I've switched over to XeTeX from the TeX Live 2008 package (here), it is even simpler:
% UTF-8 stuff
\usepackage{fontspec}
\usepackage{xunicode}
As for detecting a file's encoding, you could play with file(1), but it is rather limited; as someone else said, reliable detection is genuinely difficult.
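For example, reasonably recent versions of file(1) can print their guess at the character set via --mime-encoding (the flag may be missing on older installations); a minimal way to script that check, with "document.tex" as a placeholder file name:

import subprocess

# Ask file(1) for its guess at the character set of the file.
result = subprocess.run(
    ["file", "--mime-encoding", "document.tex"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # e.g. "document.tex: utf-8" or "document.tex: us-ascii"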
A brute-force way to check the encoding might just be to inspect the file in a hex editor or similar (or write a program to check). Look at the binary data in the file: the UTF-8 format is fairly easy to recognize. All ASCII characters are single bytes with values below 128 (0x80), and multibyte sequences follow the pattern shown in the wiki article.
If you can find a simpler way to get a program to verify the encoding for you, that's obviously a shortcut, but if all else fails, this would do the trick.
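Here is a rough sketch of such a checker in Python, assuming only the rules quoted above (ASCII bytes below 0x80, lead bytes followed by the right number of 10xxxxxx continuation bytes); the file name is a placeholder.

def looks_like_utf8(data: bytes) -> bool:
    """Check that the bytes follow the UTF-8 pattern: single ASCII bytes
    below 0x80, and multibyte sequences whose lead byte is followed by
    the correct number of 10xxxxxx continuation bytes."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # single-byte ASCII
            n = 0
        elif 0xC0 <= b < 0xE0:            # lead byte of a 2-byte sequence
            n = 1
        elif 0xE0 <= b < 0xF0:            # lead byte of a 3-byte sequence
            n = 2
        elif 0xF0 <= b < 0xF8:            # lead byte of a 4-byte sequence
            n = 3
        else:                             # stray continuation byte or invalid lead byte
            return False
        for j in range(1, n + 1):
            if i + j >= len(data) or data[i + j] & 0xC0 != 0x80:
                return False              # truncated or malformed continuation
        i += n + 1
    return True

with open("document.tex", "rb") as f:     # placeholder file name
    data = f.read()
    print("looks like UTF-8" if looks_like_utf8(data) else "not valid UTF-8")

Note that this only checks the shape of the byte sequences; it does not reject overlong encodings. In practice, simply calling data.decode("utf-8") inside a try/except is the "simpler way" mentioned above and also catches those cases.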