Question
I've been looking at a lot of posts and haven't quite found what I'm looking for. I'm not sure how to go about taking the following sample data:
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 40000 10
host1 input nic1 ip1 ip2 PROT 50000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 input nic1 ip3 ip2 PROT 10 30000
host1 input nic1 ip3 ip2 PROT 10 40000
host1 input nic1 ip3 ip2 PROT 10 50000
host1 input nic1 ip3 ip2 PROT 10 60000
host1 output nic1 ip2 ip1 PROT 10 30000
host1 output nic1 ip2 ip1 PROT 10 40000
host1 output nic1 ip2 ip1 PROT 10 50000
host1 output nic1 ip2 ip1 PROT 10 60000
host1 output nic1 ip2 ip3 PROT 30000 10
host1 output nic1 ip2 ip3 PROT 40000 10
host1 output nic1 ip2 ip3 PROT 50000 10
host1 output nic1 ip2 ip3 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
And merge it into:
host1 input nic1 ip1 ip2 PROT 30000:60000 10
host1 input nic1 ip3 ip2 PROT 10 30000:60000
host1 output nic1 ip2 ip1 PROT 10 30000:60000
host1 output nic1 ip2 ip3 PROT 30000:60000 10
host1 output loc ip2 ip2 PROT 10 30000:50000
I have a large amount of data like this and need to make ranges for multiple fields of a given line, but if somebody can show me how to do it for one field, as above, I should be able to figure out the rest. And if not, I'll follow up :). Thanks in advance for any help.
Answer 1:
Update
I have refactored the code in the answer below to make it more readable. The main body should read almost like English prose.
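#!/usr/bin/awk -f
# Merge consecutive lines that agree on every field except the last two,
# turning each of those two fields into "first:last" when it varies.

# main body
NR == 1 {
    copyRecordTo(veryold)   # first line of the current group
    copyRecordTo(old)       # most recent line of the current group
    next
}
{
    if (inSameGroup()) {
        copyRecordTo(old)
    } else {
        makeRangeForField(NF - 1)
        makeRangeForField(NF)
        nicePrint()
        copyRecordTo(veryold)
        copyRecordTo(old)   # the new group may consist of a single line
    }
}
END {
    if (NR > 0) {           # nothing to print for empty input
        makeRangeForField(NF - 1)
        makeRangeForField(NF)
        nicePrint()
    }
}

# functions (extra parameters after the gap are awk's idiom for local variables)
function copyRecordTo(line,    i) {
    for (i = 1; i <= NF; ++i) line[i] = $i
}
function nicePrint(    i, fmt) {
    # tab-separated, with an extra tab before the last field and no trailing tab
    for (i = 1; i <= NF; ++i) {
        fmt = (i == NF - 1) ? "%s\t\t" : (i == NF) ? "%s\n" : "%s\t"
        printf(fmt, old[i])
    }
}
function makeRangeForField(f) {
    if (old[f] != veryold[f])
        old[f] = veryold[f] ":" old[f]
}
function inSameGroup(    i) {
    # same group if every field except the last two matches the group's first line
    for (i = 1; i <= NF - 2; ++i)
        if ($i != veryold[i]) return 0
    return 1
}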
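The question also asks about ranges for several fields. In the refactored script that would mean calling makeRangeForField for more field numbers, but inSameGroup would also have to skip those fields, since it currently compares everything except the last two. A minimal sketch of that generalization, written as a POSIX shell function wrapping awk: the name merge_ranges and the awk variable rangefields are hypothetical, it assumes the lines of each group are adjacent, and, like the answer, it uses the first and last value of a group rather than the minimum and maximum. It prints space-separated fields, as in the question's expected output.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): merge consecutive lines that
# agree on every field except the ones listed in $1, turning each listed
# field into "first:last" when its value changes within the group.
# Assumes the lines of a group are adjacent, e.g. the input is already sorted.
merge_ranges() {
    awk -v rangefields="$1" '
    BEGIN {
        # turn "7 8" into the set isrange[7], isrange[8]
        n = split(rangefields, tmp, " ")
        for (k = 1; k <= n; ++k) isrange[tmp[k]] = 1
    }
    # group key: every field that is not a range field
    function groupKey(    i, key) {
        key = ""
        for (i = 1; i <= NF; ++i)
            if (!(i in isrange)) key = key SUBSEP $i
        return key
    }
    # print the buffered group, using first:last for range fields that changed
    function flush(    i, out, val) {
        out = ""
        for (i = 1; i <= nf; ++i) {
            val = first[i]
            if ((i in isrange) && last[i] != first[i])
                val = first[i] ":" last[i]
            out = out (i > 1 ? " " : "") val
        }
        print out
    }
    {
        key = groupKey()
        if (NR > 1 && key != prevkey) flush()
        if (NR == 1 || key != prevkey) {
            for (i = 1; i <= NF; ++i) first[i] = $i
            nf = NF
        }
        for (i = 1; i <= NF; ++i) last[i] = $i
        prevkey = key
    }
    END { if (NR > 0) flush() }
    '
}

# Demo on part of the question's sample data
merge_ranges "7 8" <<'EOF'
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
EOF
# prints:
# host1 input nic1 ip1 ip2 PROT 30000:60000 10
# host1 output loc ip2 ip2 PROT 10 30000:50000
```

Passing a different field list (for example "8" alone) changes both which fields become ranges and which fields define a group.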
Original answer
The following awk script generates almost exactly what you are looking for; the main difference is that its output is tab-separated.
Essentially the script does the following:
- stores in veryold the first line of each set of lines that differ only in the 7th and/or 8th field
- stores in old the last line read
- uses the "boolean" b to detect when the last line of the set has been passed
- when this happens, joins the last two fields of veryold with those of old, with a : in between, if they differ, and prints old
- uses one more tab (\t) between the last two fields to improve readability
Two other points:
- NR == 1 is a special case: it only stores the first line and moves on to the next record
- after the last line is read, END handles the last group, whose final line is still stored in old
#!/usr/bin/awk -f
NR == 1 {
    # the first line starts the first group
    for (i = 1; i <= NF; ++i) {
        veryold[i] = $i
        old[i] = $i
    }
    next
}
{
    # b stays 1 only if every field except the last two matches veryold
    b = 1
    for (i = 1; i <= NF - 2; ++i) {
        b *= ($i == veryold[i])
    }
    if (b == 1) {
        for (i = 1; i <= NF; ++i) {
            old[i] = $i
        }
    } else {
        if (old[NF - 1] != veryold[NF - 1]) {
            old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
        }
        if (old[NF] != veryold[NF]) {
            old[NF] = veryold[NF]":"old[NF]
        }
        for (i = 1; i <= NF; ++i) {
            if (i == NF - 1) {
                fmt = "%s\t\t"
            } else {
                fmt = "%s\t"
            }
            printf(fmt, old[i])
        }
        printf("\n")
        # the current line starts a new group (which may be a single line)
        for (i = 1; i <= NF; ++i) {
            veryold[i] = $i
            old[i] = $i
        }
    }
}
END {
    if (NR == 0) exit   # nothing to print for empty input
    if (old[NF - 1] != veryold[NF - 1]) {
        old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
    }
    if (old[NF] != veryold[NF]) {
        old[NF] = veryold[NF]":"old[NF]
    }
    for (i = 1; i <= NF; ++i) {
        if (i == NF - 1) {
            fmt = "%s\t\t"
        } else {
            fmt = "%s\t"
        }
        printf(fmt, old[i])
    }
    printf("\n")
}
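Both scripts assume that the lines of a group are consecutive and already in ascending order, since they combine the first and last line of each run. If the real data is not ordered that way, one alternative, not part of the original answer, is to key an associative array on all fields except the last two and track the numeric minimum and maximum of those two fields, printing the groups in first-seen order in END. A sketch under those assumptions: the function name merge_unsorted is hypothetical, and the last two fields are assumed to be numeric.

```shell
#!/bin/sh
# Hypothetical alternative (not from the answer): group lines by all fields
# except the last two, wherever they appear in the input, and print each
# group once with min:max of the last two fields (numeric comparison).
merge_unsorted() {
    awk '
    # lo == hi prints a single value, otherwise lo:hi
    function range(lo, hi) {
        return (lo == hi) ? lo : (lo ":" hi)
    }
    {
        key = $1
        for (i = 2; i <= NF - 2; ++i) key = key " " $i
        a = $(NF - 1) + 0
        b = $NF + 0
        if (!(key in lo1)) {
            order[++ngroups] = key      # remember first-seen order
            lo1[key] = hi1[key] = a
            lo2[key] = hi2[key] = b
        }
        if (a < lo1[key]) lo1[key] = a
        if (a > hi1[key]) hi1[key] = a
        if (b < lo2[key]) lo2[key] = b
        if (b > hi2[key]) hi2[key] = b
    }
    END {
        for (g = 1; g <= ngroups; ++g) {
            k = order[g]
            print k, range(lo1[k], hi1[k]), range(lo2[k], hi2[k])
        }
    }
    '
}

# Demo: lines from the question's data, deliberately out of order
merge_unsorted <<'EOF'
host1 input nic1 ip1 ip2 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 50000
host1 input nic1 ip1 ip2 PROT 30000 10
host1 output loc ip2 ip2 PROT 10 30000
EOF
# prints:
# host1 input nic1 ip1 ip2 PROT 30000:60000 10
# host1 output loc ip2 ip2 PROT 10 30000:50000
```

The trade-off is memory: this version keeps one entry per group until the end of input, whereas the streaming scripts only hold two lines at a time. Sorting the input first, for example with sort(1), so that each group's lines are adjacent and ascending is another way to keep the streaming approach.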
Source: https://stackoverflow.com/questions/60895489/merge-all-lines-that-are-identical-aside-from-a-key-field-and-make-key-field-a-r