Does awk CR LF handling break on Cygwin?

On Linux, this runs as expected:

$ echo -e "line1\r\nline2" | awk -v RS="\r\n" '/^line/ {print "awk: "$0}'
awk: line1
awk: line2

  •  萌比男神i
    2020-12-17 20:20

    It seems the issue is specific to awk under Cygwin.
    I tried a few different things, and it appears that awk is silently replacing \r\n with \n in the input data.

    If we simply ask awk to echo the text back unmodified, it "sanitizes" the carriage returns without being asked:

    $ echo -e "line1\r\nline2" | od -a
    0000000   l   i   n   e   1  cr  nl   l   i   n   e   2  nl
    0000015
    
    $ echo -e "line1\r\nline2" | awk '{ print $0; }' | od -a
    0000000   l   i   n   e   1  nl   l   i   n   e   2  nl
    0000014
    
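    A quick way to check whether a particular awk build does this is to count carriage returns before and after an unmodified pass through awk. This is a minimal sketch, not part of the original experiments: the function names count_cr and awk_keeps_cr are hypothetical, and it uses printf instead of echo -e because echo -e is not portable across POSIX shells.

```shell
#!/bin/sh
# Hypothetical helper: count the carriage-return bytes read from stdin.
# The final tr strips the padding that some wc implementations add.
count_cr() {
    tr -cd '\r' | wc -c | tr -d ' '
}

# Hypothetical check: pass CRLF-terminated lines through an awk "cat"
# and compare the number of \r bytes before and after.
awk_keeps_cr() {
    before=$(printf 'line1\r\nline2\r\n' | count_cr)
    after=$(printf 'line1\r\nline2\r\n' | awk '{ print $0 }' | count_cr)
    if [ "$before" -eq "$after" ]; then
        echo "awk keeps CR ($after of $before)"
    else
        echo "awk strips CR ($after of $before)"
    fi
}

awk_keeps_cr
```

    On an awk that does no input translation, the CR count is unchanged; on the setup described in this answer, it would presumably drop to 0.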

    It does, however, leave intact any carriage return that is not immediately followed by a newline:

    $ echo -e "Test\rTesting\r\nTester\rTested" | awk '{ print $0; }' | od -a
    0000000   T   e   s   t  cr   T   e   s   t   i   n   g  nl   T   e   s
    0000020   t   e   r  cr   T   e   s   t   e   d  nl
    0000033
    

    Using a custom record separator of _ also leaves the carriage return intact (here it is followed by _, not by a newline):

    $ echo -e "Testing\r_Tested" | awk -v RS="_" '{ print $0; }' | od -a
    0000000   T   e   s   t   i   n   g  cr  nl   T   e   s   t   e   d  nl
    0000020  nl
    0000021
    

    The most telling example involves having \r\n in the data, but not as a record separator:

    $ echo -e "Testing\r\nTested_Hello_World" | awk -v RS="_" '{ print $0; }' | od -a
    0000000   T   e   s   t   i   n   g  nl   T   e   s   t   e   d  nl   H
    0000020   e   l   l   o  nl   W   o   r   l   d  nl  nl
    0000034
    

    awk is blindly converting \r\n to \n in the input data even though we didn't ask it to.

    This substitution appears to happen before record splitting, which explains why RS="\r\n" never matches anything: by the time awk looks for \r\n, the input data has already been rewritten to \n.
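    Given that explanation, one portable workaround (not taken from the original answer) is to avoid a multi-character RS altogether: keep the default newline separator and strip a single trailing \r from each record. This behaves the same whether or not awk has already removed the \r, and it also sidesteps the fact that POSIX leaves a multi-character RS unspecified (treating it as a regular expression is a gawk/mawk extension). A minimal sketch, where prefix_lines is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: prefix every input line with "awk: ", treating
# CRLF and LF line endings alike. The default RS (newline) is kept;
# sub() removes one \r at the end of the record if present, and does
# nothing if the platform already dropped it. Interior \r bytes are kept.
prefix_lines() {
    awk '{ sub(/\r$/, ""); print "awk: " $0 }'
}

# Same input as the question, written with printf for portability.
printf 'line1\r\nline2\n' | prefix_lines
```

    With an awk that does no input translation (e.g. on Linux), this prints "awk: line1" and "awk: line2" with no carriage returns; if awk has already stripped the \r, the sub() call simply finds nothing to remove.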
