Question
Here are the contents of the file:
Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754
And these two commands give the same result:
$ awk 'BEGIN{FS="\n"; RS="\n\n"} {print $1, $3}' file_contents
$ awk 'BEGIN{FS="\n"; RS=""} {print $1, $3}' file_contents
The result given in both cases is:
Person Name (222) 466-1234
Another person (523) 643-8754
RS="\n\n" actually makes sense, but why is RS="" also treated the same way?
Answer 1:
They aren't treated the same.
RS="" invokes paragraph mode in all awks, so the input is split into records separated by contiguous sequences of empty lines, and a newline is added to FS if the existing FS is a single character (note: the POSIX standard is incorrect in this area, as it implies \n would get added to any FS, but that's not the case; see https://lists.gnu.org/archive/html/bug-gawk/2019-04/msg00029.html). RS="\n\n" works in GNU awk to set the record separator to a single blank line and does not affect FS. In awks that don't support a multi-character RS, the 2nd \n will be ignored (more than one character in RS is undefined behavior per POSIX, so they COULD do anything, but ignoring the extra characters is by far the most common implementation).
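A minimal way to check the "contiguous sequences of empty lines" part yourself, assuming a POSIX sh and an awk that supports RS="" paragraph mode. The sample reuses the question's data but puts three blank lines between the blocks; make_input and result are hypothetical names used only for this sketch. FS="\n" is set explicitly, as in the question, so each line of a record is one field.

```shell
#!/bin/sh
# Hypothetical input: the question's data, but with THREE blank lines
# between the two blocks instead of one.
make_input() {
    printf 'Person Name\n123 High Street\n(222) 466-1234\n\n\n\nAnother person\n487 High Street\n(523) 643-8754\n'
}

# Paragraph mode (RS = ""): the whole run of blank lines acts as ONE record
# separator. FS = "\n" makes every line of a record a separate field, so $1
# is the name and $3 is the phone number. END reports how many records awk saw.
result=$(make_input | awk 'BEGIN { FS = "\n"; RS = "" } { print $1, $3 } END { print NR, "records" }')
printf '%s\n' "$result"
```

The two data lines should be identical to the question's output, followed by "2 records". With RS="\n\n" in gawk, the same input would instead yield an extra, empty record between the two blocks.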
Look what happens when you have 3 blank lines between your 2 blocks of text and use an FS other than \n (e.g. ","):
$ cat file
Person Name
123 High Street
(222) 466-1234



Another person
487 High Street
(523) 643-8754

$ gawk 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>

$ gawk --posix 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>

$ gawk 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name
123 High Street
(222) 466-1234>

2 0 <>

3 1 <Another person
487 High Street
(523) 643-8754>

$ gawk --posix 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name>

2 1 <123 High Street>

3 1 <(222) 466-1234>

4 0 <>

5 0 <>

6 0 <>

7 1 <Another person>

8 1 <487 High Street>

9 1 <(523) 643-8754>

10 0 <>

Note the different values for NR and NF and different $0 contents being printed.
Answer 2:
Because the POSIX awk specification says so:
If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
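To see the "leading or trailing blank lines shall not result in empty records" clause in action, here is a small sketch, again assuming a POSIX sh and any awk with paragraph mode. The input text and the names make_input and result are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input: three blank lines BEFORE the first paragraph,
# two between the paragraphs, and two AFTER the last one.
make_input() {
    printf '\n\n\nfirst paragraph\n\n\nsecond paragraph\nstill second\n\n\n'
}

# Paragraph mode: print each record's number and its first line
# (FS = "\n", so $1 is the first line), then the total record count.
result=$(make_input | awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 } END { print "records: " NR }')
printf '%s\n' "$result"
```

Only two records should be reported: the blank lines at the start and end of the input are skipped rather than turned into empty records.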
Source: https://stackoverflow.com/questions/57851531/in-awk-why-are-and-n-n-treated-the-same-for-the-rs-parameter