How do I determine file encoding in OS X?

后端未结

关注

 15  972

I\'m trying to enter some UTF-8 characters into a LaTeX file in TextMate (which says its default encoding is UTF-8), but LaTeX doesn\'t seem to understand them.

Runn

相关标签:

15条回答

遇见更好的自我

2020-11-29 15:27

In Mac OS X the command file -I (capital i) will give you the proper character set so long as the file you are testing contains characters outside of the basic ASCII range.

For instance if you go into Terminal and use vi to create a file eg. vi test.txt then insert some characters and include an accented character (try ALT-e followed by e) then save the file.

They type file -I text.txt and you should get a result like this:

test.txt: text/plain; charset=utf-8

0 讨论(0)
发布评论:

提交评论
- 加载中...

闹比i

2020-11-29 15:31

I implemented the bash script below, it works for me.

It first tries to iconv from the encoding returned by file --mime-encoding to utf-8.

If that fails, it goes through all encodings and shows the diff between the original and re-encoded file. It skips over encodings that produce a large diff output ("large" as defined by the MAX_DIFF_LINES variable or the second input argument), since those are most likely the wrong encoding.

If "bad things" happen as a result of using this script, don't blame me. There's a rm -f in there, so there be monsters. I tried to prevent adverse effects by using it on files with a random suffix, but I'm not making any promises.

Tested on Darwin 15.6.0.

#!/bin/bash

if [[ $# -lt 1 ]]
then
  echo "ERROR: need one input argument: file of which the enconding is to be detected."
  exit 3
fi

if [ ! -e "$1" ]
then
  echo "ERROR: cannot find file '$1'"
  exit 3
fi

if [[ $# -ge 2 ]]
then
  MAX_DIFF_LINES=$2
else
  MAX_DIFF_LINES=10
fi


#try the easy way
ENCOD=$(file --mime-encoding $1 | awk '{print $2}')
#check if this enconding is valid
iconv -f $ENCOD -t utf-8 $1 &> /dev/null
if [ $? -eq 0 ]
then
  echo $ENCOD
  exit 0
fi

#hard way, need the user to visually check the difference between the original and re-encoded files
for i in $(iconv -l | awk '{print $1}')
do
  SINK=$1.$i.$RANDOM
  iconv -f $i -t utf-8 $1 2> /dev/null > $SINK
  if [ $? -eq 0 ]
  then
    DIFF=$(diff $1 $SINK)
    if [ ! -z "$DIFF" ] && [ $(echo "$DIFF" | wc -l) -le $MAX_DIFF_LINES ]
    then
      echo "===== $i ====="
      echo "$DIFF"
      echo "Does that make sense [N/y]"
      read $ANSWER
      if [ "$ANSWER" == "y" ] || [ "$ANSWER" == "Y" ]
      then
        echo $i
        exit 0
      fi
    fi
  fi
  #clean up re-encoded file
  rm -f $SINK
done

echo "None of the encondings worked. You're stuck."
exit 3

0 讨论(0)

日久生厌

2020-11-29 15:32
You can also convert from one file type to another using the following command :
```
iconv -f original_charset -t new_charset originalfile > newfile
```
e.g.
```
iconv -f utf-16le -t utf-8 file1.txt > file2.txt
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
感情败类

2020-11-29 15:32

The @ sign means the file has extended attributes. xattr file shows what attributes it has, xattr -l file shows the attribute values too (which can be large sometimes — try e.g. xattr /System/Library/Fonts/HelveLTMM to see an old-style font that exists in the resource fork).

0 讨论(0)
发布评论:

提交评论
- 加载中...
暗喜

2020-11-29 15:34
Using the -I (that's a capital i) option on the file command seems to show the file encoding.
```
file -I {filename}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
闹比i

2020-11-29 15:34

Typing file myfile.tex in a terminal can sometimes tell you the encoding and type of file using a series of algorithms and magic numbers. It's fairly useful but don't rely on it providing concrete or reliable information.

A Localizable.strings file (found in localised Mac OS X applications) is typically reported to be a UTF-16 C source file.

0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 3 下一页