How to check the line endings of a text file to see whether it is in Unix or DOS format?

强颜欢笑 submitted on 2019-12-04 08:26:44
# True only if the first line ends in CR; an empty file is not reported as DOS.
if awk '/\r$/ { dos = 1 } { exit } END { exit !dos }' myFile
then
  echo "is DOS"
fi
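The test above looks only at the first line. If a file might start with a Unix line and contain DOS lines further down, a variant that keeps reading until it finds a CR-terminated line could look like the sketch below. It assumes a POSIX awk, which understands `\r` in a regular expression; `is_dos` is a hypothetical helper name.

```shell
#!/bin/sh
# is_dos FILE (hypothetical helper): exit status 0 if any line of FILE ends
# in CR (i.e. was terminated by CR-LF), 1 otherwise. An empty file is not DOS.
is_dos() {
  awk '
    /\r$/ { found = 1; exit }   # first CR-terminated line: stop reading
    END   { exit !found }       # 0 when found, 1 when not (found starts at 0)
  ' "$1"
}
```

Usage: `if is_dos myFile; then echo "is DOS"; fi`. Unlike the first-line test, this reports a file with mixed line endings as DOS.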

Simply use the file command. If the lines in the file end with CR LF, file says so in its description, for example: 'ASCII text, with CRLF line terminators'.

e.g.

if file myFile | grep "CRLF" > /dev/null 2>&1
then
  ....
fi

Version 7.1 and later of dos2unix (and unix2dos), which ships with Cygwin and some recent Linux distributions, has a handy --info option that prints a count of each type of line break in each file. This is dos2unix 7.1 (2014-10-06): http://waterlan.home.xs4all.nl/dos2unix.html

From the man page:

    --info[=FLAGS] FILE ...
        Display file information. No conversion is done.

        The following information is printed, in this order: number of DOS line breaks, number of Unix line breaks, number of Mac line breaks, byte order mark, text or binary, file name.

        Example output:
             6       0       0  no_bom    text    dos.txt
             0       6       0  no_bom    text    unix.txt
             0       0       6  no_bom    text    mac.txt
             6       6       6  no_bom    text    mixed.txt
            50       0       0  UTF-16LE  text    utf16le.txt
             0      50       0  no_bom    text    utf8unix.txt
            50       0       0  UTF-8     text    utf8dos.txt
             2     418     219  no_bom    binary  dos2unix.exe

        Optionally extra flags can be set to change the output. One or more flags can be added.

            d   Print number of DOS line breaks.
            u   Print number of Unix line breaks.
            m   Print number of Mac line breaks.
            b   Print the byte order mark.
            t   Print if file is text or binary.
            c   Print only the files that would be converted.

        With the "c" flag dos2unix will print only the files that contain DOS line breaks, unix2dos will print only file names that have Unix line breaks.

Thus:

if [[ -n $(dos2unix --info=c "${filename}") ]] ; then echo DOS; fi

Conversely:

if [[ -n $(unix2dos --info=c "${filename}") ]] ; then echo UNIX; fi
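Where dos2unix 7.1 or later is not installed, the first two columns of the --info output (DOS and Unix line break counts) can be roughly approximated with awk. This is a sketch under stated assumptions, not the dos2unix algorithm: `eol_counts` is a hypothetical helper, lines are split on LF only (so lone-CR "Mac" breaks are not detected), and a last line without a trailing newline is still counted.

```shell
#!/bin/sh
# eol_counts FILE (hypothetical helper): prints "<dos> <unix> FILE", a rough
# stand-in for the DOS and Unix counts reported by dos2unix --info.
eol_counts() {
  counts=$(awk '
    /\r$/ { dos++; next }                 # line content ends in CR: CR-LF (DOS) break
          { unix++ }                      # any other line: bare LF (Unix) break
    END   { printf "%d %d", dos, unix }   # unset counters print as 0
  ' "$1") || return
  printf '%s %s\n' "$counts" "$1"
}
```

A nonzero first number then plays the role of the "c" flag for dos2unix: the file contains DOS line breaks.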

I can't test this on AIX, but try:

if [[ "$(head -n 1 "$filename")" == *$'\r' ]]; then echo DOS; fi

You can simply strip any existing carriage return from the end of each line and then append one to every line. That way it doesn't matter which format the incoming file is in: the output will always be in DOS format.

sed 's/\r$//;s/$/\r/'
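The \r in that sed expression is understood by GNU sed, but not every sed does (the BSD sed shipped with macOS, for instance, does not). A sketch of the same normalization in awk, assuming a POSIX awk; `to_dos` is a hypothetical name:

```shell
#!/bin/sh
# to_dos [FILE...] (hypothetical helper): writes the input to stdout with
# every line terminated by CR-LF, whatever its original line ending.
# With no FILE arguments it reads standard input.
to_dos() {
  awk '{
    sub(/\r$/, "")          # drop an existing trailing CR, if any
    printf "%s\r\n", $0     # then always end the line with CR-LF
  }' "$@"
}
```

For example, `to_dos input.txt > output.txt`.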

I'm probably late on this one, but I had the same issue, and I did not want to put the special ^M character in my script (I'm worried some editors might not display it properly, or some later programmer might replace it with two ordinary characters: ^ and M).

My solution feeds the special character to grep by letting the shell convert it from its hex value:

if head -n 1 "${filename}" | grep $'[\x0D]' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

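Unfortunately, I cannot make the $'[\x0D]' construct work in ksh. In ksh, I found this:

if head -n 1 "${filename}" | od -An -t x1 | grep '0d' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

od -t x1 displays the text byte by byte as hex codes (-An suppresses the offset column). 0d is the carriage return that precedes the line feed (0a) in CR-LF, the DOS/Windows line terminator; a Unix line ends with a bare 0a. Plain od -x is less reliable for this: it prints two-byte words in the machine's byte order, so on a little-endian machine CR-LF shows up as 0a0d, and the CR and the LF can end up in different words.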
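Another way around the ksh limitation is to let printf, which POSIX requires to understand \r, produce the carriage return and keep it in a variable; command substitution strips only trailing newlines, so the CR survives. A sketch, assuming a POSIX printf and grep; `first_line_is_dos` is a hypothetical name:

```shell
#!/bin/sh
# first_line_is_dos FILE (hypothetical helper): exit 0 if the first line of
# FILE contains a carriage return, without relying on $'...' quoting.
first_line_is_dos() {
  cr=$(printf '\r')                # a literal CR; $(...) keeps it
  head -n 1 "$1" | grep -q "$cr"   # same test as the grep $'[\x0D]' version
}
```

Usage: `if first_line_is_dos "${filename}"; then echo "Win"; else echo "Unix"; fi`.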
