How can I remove the BOM from a UTF-8 file?

忘掉有多难 2020-12-06 04:53

I have a file in UTF-8 encoding with a BOM and want to remove the BOM. Are there any Linux command-line tools that can remove the BOM from the file?

$ file test.xml
         


        
5 Answers
  • 2020-12-06 05:11

    If you are certain that a given file starts with a BOM, you can remove it with the tail command, which here skips the first three bytes and starts its output at byte 4:

    tail --bytes=+4 withBOM.txt > withoutBOM.txt
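
The tail approach assumes the BOM is present. As a minimal sketch of how that assumption could be checked first, the function below inspects the first three bytes before deciding whether to skip them. The name strip_bom_if_present is hypothetical, and `head -c` is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# strip_bom_if_present SRC DST  (hypothetical helper, not a standard tool)
# Copies SRC to DST, dropping a leading UTF-8 BOM only if one is present.
strip_bom_if_present() {
    src=$1
    dst=$2
    # Dump the first three bytes as hex with spaces and newlines removed;
    # a UTF-8 BOM shows up as "efbbbf".
    first3=$(head -c 3 "$src" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        tail -c +4 "$src" > "$dst"   # BOM found: start output at byte 4
    else
        cat "$src" > "$dst"          # no BOM: copy the file unchanged
    fi
}
```

SRC and DST must be different paths, because the redirection truncates DST before SRC is read.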
    
  • 2020-12-06 05:13

    A BOM is Unicode codepoint U+FEFF; the UTF-8 encoding consists of the three hex values 0xEF, 0xBB, 0xBF.

    With bash, you can create a UTF-8 BOM with the $'' special quoting form, which implements Unicode escapes: $'\uFEFF'. So with bash, a reliable way of removing a UTF-8 BOM from the beginning of a text file would be:

    sed -i $'1s/^\uFEFF//' file.txt
    

    This will leave the file unchanged if it does not start with a UTF-8 BOM, and otherwise remove the BOM.

    If you are using some other shell, you might find that "$(printf '\ufeff')" produces the BOM character (this works with zsh, and with any shell that lacks a printf builtin, provided that /usr/bin/printf is the GNU version). If you want a POSIX-compatible version, you could use:

    sed "$(printf '1s/^\357\273\277//')" file.txt
    

    (The -i in-place edit flag is also a GNU extension; this version writes the possibly modified file to stdout.)
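
Because the POSIX-compatible command writes to stdout, updating the file itself needs an intermediate file. The following is only a sketch of one way to do that: strip_bom_inplace is a hypothetical name, and mktemp is assumed to be available even though it is not part of POSIX.

```shell
#!/bin/sh
# strip_bom_inplace FILE  (hypothetical helper, shown as a sketch)
# Emulates an in-place edit without relying on GNU sed's -i flag.
strip_bom_inplace() {
    file=$1
    tmp=$(mktemp) || return 1        # assumption: mktemp exists on this system
    # printf expands the octal escapes, so sed receives the raw BOM bytes.
    if sed "$(printf '1s/^\357\273\277//')" "$file" > "$tmp"; then
        cat "$tmp" > "$file"         # overwrite contents, keeping the original file's permissions
    fi
    rm -f "$tmp"
}
```

A call such as `strip_bom_inplace file.txt` leaves a file without a BOM with the same content, since the substitution only applies when line 1 begins with the three BOM bytes.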

  • 2020-12-06 05:20

    I dealt with this today, and my preferred tool was dos2unix.

    dos2unix removes the BOM and also takes care of other idiosyncrasies from other operating systems:

    $ sudo apt install dos2unix
    $ dos2unix test.xml
    

    It's also possible to remove BOM only (-r, --remove-bom):

    $ dos2unix -r test.xml
    

    Note: tested with dos2unix 7.3.4

  • 2020-12-06 05:28

    Joshua Pinter's answer works correctly on macOS, so I wrote a script that removes the BOM from all files in a given folder; see here.

    It can be used as follows:

    Remove BOM from all files in current directory: rmbom .

    Print all files with a BOM in the current directory: rmbom . -a

    Only remove BOM from all files in current directory with extension txt or cs: rmbom . -e txt -e cs
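
The rmbom script itself is not reproduced in this thread. As a rough sketch of the idea only, and not the author's script, a directory pass could look like the following. The names has_bom and strip_bom_dir and the "list" mode are hypothetical, and only regular files directly inside the directory are handled.

```shell
#!/bin/sh
# has_bom FILE: succeeds if FILE starts with the bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom_dir DIR [list]  (hypothetical helper, not the rmbom script)
# With "list", print files in DIR that start with a BOM; otherwise strip it.
strip_bom_dir() {
    dir=$1
    mode=${2:-strip}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and unmatched globs
        has_bom "$f" || continue     # leave files without a BOM untouched
        if [ "$mode" = "list" ]; then
            printf '%s\n' "$f"
        else
            # The BOM was just confirmed, so skipping three bytes is safe.
            # Note: mv replaces the file, so it gets the new file's permissions.
            tail -c +4 "$f" > "$f.nobom" && mv "$f.nobom" "$f"
        fi
    done
}
```

For example, `strip_bom_dir . list` prints the affected files in the current directory, and `strip_bom_dir .` strips them.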

  • 2020-12-06 05:33

    Using Vim

    1. Open the file in Vim:

       vi text.xml
      
    2. Remove the BOM:

       :set nobomb
      
    3. Save and quit:

       :wq
      

    For a non-interactive solution, try the following command line:

    vi -c ":set nobomb" -c ":wq" text.xml
    

    That should remove the BOM, save the file and quit, all from the command line.
