Comparing two text files printing result in new header [closed]

Posted by 可紊 on 2019-12-13 08:24:54

Question


Okay, I will update this again.

I have two files: File1.txt and File2.txt.

File1 is the base template.

File2 holds the status results.

file1.txt

N1,N2,N3,N4,N5,N6
XX,ZZ,XC,EE,RR,BB
XC,CF,FG,RG,GH,GH

file2.txt

DF,GH,MH,FR,FG,GH,NA
XX,ZZ,XC,EE,RR,BB,OK

The command below compares column 1 in both files. If it matches, it retrieves the value from the 7th field in file2.txt and appends it to file1.txt as the last column, under a new header.

If no match is found, NA is written instead.

Command used :

awk -F, '
  FNR==NR { a[$1]=$7; next }
  FNR==1  { print $0; len=length($0); next }
  {
    printf $0
    cont=(($1 in a) ? ","a[$1] : ",NA")
    for ( i=length($0)+1; i<=len-length(cont); i++)
      printf " " 
    print cont
  }
'  file2.txt file1.txt > tmp &&

Day1 - After running above command

N1,N2,N3,N4,N5,N6,D1
XX,ZZ,XC,EE,RR,BB,OK
XC,CF,FG,RG,GH,GH,NA

Day 2 - After running above command

N1,N2,N3,N4,N5,N6,D1,D2
XX,ZZ,XC,EE,RR,BB,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA

On Day 3 I inserted a new row at the bottom of File1:

N1,N2,N3,N4,N5,N6,D1,D2
XX,ZZ,XC,EE,RR,BB,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA
DM,LC,VF,GR,GH,ES

Now when I run the above command on Day 3, I need output like this:

N1,N2,N3,N4,N5,N6,D1,D2,D3
XX,ZZ,XC,EE,RR,BB,OK,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA,NA
DM,LC,VF,GR,GH,ES,,,NA

Answer 1:


This awk script seems to do the job:

awk -F, '
BEGIN   { OFS = FS }
FNR==NR { a[$1] = $7; next }
FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
        { $n1 = ($1 in a) ? a[$1] : "NA"; print }
' file2.txt file1.txt

OFS is the output field separator; FS is the (input) field separator. Both are set to a comma: FS by the -F, option and OFS by the assignment in the BEGIN block. Because assigning to a field makes awk rebuild the record with OFS between the fields, this makes it easy to get the correct number of fields in the output. awk's string concatenation with no operator, exemplified by "D" (n-6), is slightly weird; you get used to it, up to a point, but it still looks a little odd.
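As a small illustration of those two points, here is a sketch using a made-up two-field record rather than the files from the question. The variable names are only for the demonstration.

```shell
# Assigning to a field makes awk rebuild $0 using OFS.
# With the default OFS (a single space) the commas are lost:
default_ofs=$(echo 'a,b' | awk -F, '{ $3 = "c"; print }')

# With OFS set to FS, the rebuilt record keeps its commas:
comma_ofs=$(echo 'a,b' | awk -F, 'BEGIN { OFS = FS } { $3 = "c"; print }')

# Concatenation is just juxtaposition: the string "D" followed by (n-6).
label=$(awk 'BEGIN { n = 7; print "D" (n-6) }')

echo "$default_ofs"   # a b c
echo "$comma_ofs"     # a,b,c
echo "$label"         # D1
```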

Example

The example run uses a program ow that has the synopsis:

ow file cmd …args…

It preserves the contents of the file by having cmd …args… write to a temporary file. If the command succeeds (exit status 0), it saves a copy of the original, ignores a number of signals, copies the temporary output over the original, and cleans up. It is rather useful (code at the bottom). This is how I did my test. Clearly, I could use tmp=$(mktemp tmp.XXXXXX); awk … file1.txt > $tmp; mv $tmp file1.txt instead, or something along those lines. However, since I have ow, I use it.
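For readers without ow, the mktemp alternative mentioned above might be spelled out roughly as follows. This is only a sketch: it recreates the sample files from the question in a scratch directory (the workdir variable is an assumption of this example) so that it does not touch real data.

```shell
# Work in a scratch directory with copies of the sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'N1,N2,N3,N4,N5,N6' 'XX,ZZ,XC,EE,RR,BB' 'XC,CF,FG,RG,GH,GH' > file1.txt
printf '%s\n' 'DF,GH,MH,FR,FG,GH,NA' 'XX,ZZ,XC,EE,RR,BB,OK' > file2.txt

tmp=$(mktemp tmp.XXXXXX)

# Replace file1.txt only if awk exits successfully; otherwise discard the output.
if awk -F, '
   BEGIN   { OFS = FS }
   FNR==NR { a[$1] = $7; next }
   FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
           { $n1 = ($1 in a) ? a[$1] : "NA"; print }
   ' file2.txt file1.txt > "$tmp"
then
    mv "$tmp" file1.txt
else
    rm -f "$tmp"
    echo "awk failed - file1.txt unchanged" 1>&2
fi

cat file1.txt
```

One difference worth knowing: mv replaces file1.txt with the temporary file, so the result has the temporary file's permissions and any hard links to the old file are broken, whereas ow copies the new contents over the existing file.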

$ cat file1.txt
N1,N2,N3,N4,N5,N6
XX,ZZ,XC,EE,RR,BB
XC,CF,FG,RG,GH,GH
$ ow file1.txt awk -F, '
> BEGIN   { OFS = FS }
> FNR==NR { a[$1] = $7; next }
> FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
>         { $n1 = ($1 in a) ? a[$1] : "NA"; print }
> ' file2.txt file1.txt
$ cat file1.txt
N1,N2,N3,N4,N5,N6,D1
XX,ZZ,XC,EE,RR,BB,OK
XC,CF,FG,RG,GH,GH,NA
$ ow file1.txt awk -F, '
> BEGIN   { OFS = FS }
> FNR==NR { a[$1] = $7; next }
> FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
>         { $n1 = ($1 in a) ? a[$1] : "NA"; print }
> ' file2.txt file1.txt
$ cat file1.txt
N1,N2,N3,N4,N5,N6,D1,D2
XX,ZZ,XC,EE,RR,BB,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA
$ echo DM,LC,VF,GR,GH,ES >> file1.txt
$ ow file1.txt awk -F, '
> BEGIN   { OFS = FS }
> FNR==NR { a[$1] = $7; next }
> FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
>         { $n1 = ($1 in a) ? a[$1] : "NA"; print }
> ' file2.txt file1.txt
$ cat file1.txt
N1,N2,N3,N4,N5,N6,D1,D2,D3
XX,ZZ,XC,EE,RR,BB,OK,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA,NA
DM,LC,VF,GR,GH,ES,,,NA
$

Note that when you assign to $i and i is larger than the current NF, NF increases to i, and any missing fields in between are created as empty fields.
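That behaviour can be seen in isolation with the new Day 3 row. This is a minimal sketch; the field number 9 is hard-coded here purely for illustration.

```shell
# The record has 6 fields; assigning to $9 extends it to 9 fields,
# creating fields 7 and 8 as empty strings.
extended=$(echo 'DM,LC,VF,GR,GH,ES' |
    awk -F, 'BEGIN { OFS = FS } { $9 = "NA"; print NF ": " $0 }')

echo "$extended"   # 9: DM,LC,VF,GR,GH,ES,,,NA
```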

The first working version of this script had more complex logic, with a loop creating the empty fields, but since awk will do that automatically, the script simplified considerably. You'll often find that with a bit of time and care, initial solutions can be simplified and cleaned up.

However, it is probably also relevant to point out that this code is very trusting. It doesn't ensure that there are exactly 7 fields in file2.txt. It doesn't check that each line in file1.txt has either the same number of fields as the first line in the file or exactly 6 fields. If you supply screwy data in, you get screwy data out — the age-old GIGO principle: Garbage In, Garbage Out.
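If you want the script to be less trusting, one possible shape for those checks is sketched below. The function name checked_add_column is hypothetical, and the choice to report problems on standard error and exit non-zero (while still printing the rows) is an assumption of this sketch, not part of the original script.

```shell
# Hypothetical wrapper: checked_add_column status_file template_file
checked_add_column() {
    awk -F, '
    BEGIN   { OFS = FS }
    FNR==NR {
        # Every line of the status file should have exactly 7 fields.
        if (NF != 7) {
            printf "%s:%d: expected 7 fields, got %d\n", FILENAME, FNR, NF > "/dev/stderr"
            bad = 1
        }
        a[$1] = $7
        next
    }
    FNR==1  { n1 = NF + 1; $n1 = "D" (n1-6); print; next }
    {
        # Accept rows as wide as the old header, or brand-new 6-field rows.
        if (NF != n1 - 1 && NF != 6) {
            printf "%s:%d: unexpected field count %d\n", FILENAME, FNR, NF > "/dev/stderr"
            bad = 1
        }
        $n1 = ($1 in a) ? a[$1] : "NA"
        print
    }
    END     { exit bad + 0 }   # non-zero status if any line looked wrong
    ' "$1" "$2"
}
```

Because the function exits with a non-zero status when it sees a malformed line, a wrapper such as ow, which only overwrites the file when the command succeeds, would leave file1.txt unchanged in that case.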

ow

:   "@(#)$Id: ow.sh,v 1.6 2005/06/30 18:14:08 jleffler Exp $"
#
#   Overwrite file
#   From: The UNIX Programming Environment by Kernighan and Pike
#   Amended: remove PATH setting; handle file names with blanks.

case $# in
0|1)    echo "Usage: $0 file command [arguments]" 1>&2
    exit 1;;
esac

file="$1"
shift
new=${TMPDIR:-/tmp}/ovrwr.$$.1
old=${TMPDIR:-/tmp}/ovrwr.$$.2

trap "rm -f '$new' '$old' ; exit 1" 0 1 2 15

if "$@" >"$new"
then
    cp "$file" "$old"
    trap "" 1 2 15
    cp "$new" "$file"
    rm -f "$new" "$old"
    trap 0
    exit 0
else
    echo "$0: $1 failed - $file unchanged" 1>&2
    rm -f "$new" "$old"
    trap 0
    exit 1
fi

Adding date instead of Dn to heading

Is it possible that awk can print a date in the header instead of D1?

If you want the current date added, you have two main options. One: if you are using GNU awk (gawk, often also installed as awk), its time functions make it easy. Failing that, awk -v date=$(date +'%Y-%m-%d') -F, … has the system command date format a value and pass it into the awk script as the variable date, which you can then print where you want it. If you want arbitrary dates passed in, then the second mechanism is the one to use.

awk -F, -v date=$(date +'%Y-%m-%d') '
BEGIN   { OFS = FS }
FNR==NR { a[$1] = $7; next }
FNR==1  { n1 = n = NF + 1; $n = date; print; next }
        { $n1 = ($1 in a) ? a[$1] : "NA"; print }
' file2.txt file1.txt

That forces today's date into the command. You can also do things prospectively or retrospectively, such as:

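tmp=$(mktemp coladd.XXXXXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15

# seq -w zero-pads the day (01..31) so the dates match the %Y-%m-%d format
for dd in $(seq -w 1 31)
do
    awk -F, -v date="2014-12-$dd" '
    BEGIN   { OFS = FS }
    FNR==NR { a[$1] = $7; next }
    FNR==1  { n1 = n = NF + 1; $n = date; print; next }
            { $n1 = ($1 in a) ? a[$1] : "NA"; print }
    ' file2.txt file1.txt > "$tmp"
    mv "$tmp" file1.txt
done

# Clear the exit trap so that normal completion does not exit with status 1
rm -f "$tmp"
trap 0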

Given this extra flexibility, I'd recommend using the externally-defined date over GNU's internal date manipulating functions, but YMMV.



Source: https://stackoverflow.com/questions/27572019/comparing-two-text-files-printing-result-in-new-header
