GNU diff doesn\'t seem to be smart enough to detect and handle UTF-16 files, which surprises me. Am I missing an obvious command-line option? Is there a good alternative?<
In Python, you can use difflib.HtmlDiff to create an HTML table that shows the differences between two sequences of lines, and it seems to work fine with Unicode strings (provided, of course, you read and write them with the appropriate codecs).
>>> hd = difflib.HtmlDiff()
>>> htmldiff = hd.make_file(codecs.open('file1', 'r', 'utf-16').readlines(), codecs.open('file2', 'r', 'utf-16').readlines())
>>> print >> codecs.open('diff.html', 'w', 'utf-16'), htmldiff