Using an array in AWK when working with two files

Submitted by 喜夏-厌秋 on 2021-02-10 06:14:19

Question


I have two files that I merge on a key field, using the code shown further down.

file1
-------------------------------
1      a      t      p      bbb  
2      b      c      f      aaa  
3      d      y      u      bbb  
2      b      c      f      aaa  
2      u      g      t      ccc  
2      b      j      h      ccc

file2
--------------------------------
1   11   bbb  
2   22   ccc  
3   33   aaa  
4   44   aaa  

I merged these two files on the key (field 5 of file1, field 3 of file2) using the code below:

awk 'NR==FNR{a[$3]=$0;next;}{for(x in a){if(x==$5) print $1,$2,$3,$4,a[x]};  

My question is: how can I save $2 of file2 in a variable or an array and print it again after a[x]?
My desired result is:

1 a t p 1   11  bbb  11  
2 b c f 3   33  aaa  33  
2 b c f 4   44  aaa  44  
3 d y u 1   11  bbb  11  
2 b c f 3   33  aaa  33  
2 b c f 4   44  aaa  44  
2 u g t 2   22  ccc  22  
2 b j h 2   22  ccc  22  

As you can see, the first 7 columns are the result of my merge code. I need to add the last column (field 2 of a[x]) to my result.

Important:

My next question: if I put the code in a .awk file, how can I use shell features such as a pipe (| column -t) or redirect the result to a file (awk ... > result.txt)? I always use these at the command prompt. Can I use them inside the code in the .awk file?


Answer 1:


Simply add all of file2 to an array, and use split to hold the bits you want:

awk 'FNR==NR { two[$0]++; next } { for (i in two) { split(i, one); if (one[3] == $NF) print $1,$2,$3,$4, i, one[2] } }' file2 file1

Results:

1 a t p 1   11   bbb   11
2 b c f 3   33   aaa   33
2 b c f 4   44   aaa   44
3 d y u 1   11   bbb   11
2 b c f 3   33   aaa   33
2 b c f 4   44   aaa   44
2 u g t 2   22   ccc   22
2 b j h 2   22   ccc   22
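The loop above visits every stored file2 line for each file1 line, and awk does not specify the order of a for (i in array) traversal, so the order of the two aaa rows is not guaranteed. A minimal sketch of a keyed alternative follows, assuming the sample data from the question; the array names n, f1 and f2 are hypothetical and not part of the original answer. It counts the file2 rows per key and stores fields 1 and 2 under (key, index), so duplicates are kept in file order.

```shell
# Sample data copied from the question, written to a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1 a t p bbb
2 b c f aaa
3 d y u bbb
2 b c f aaa
2 u g t ccc
2 b j h ccc
EOF
cat > "$tmp/file2" <<'EOF'
1 11 bbb
2 22 ccc
3 33 aaa
4 44 aaa
EOF

# n[key] counts the file2 rows seen for a key; f1 and f2 hold fields 1 and 2
# of each of those rows (hypothetical names, chosen for this sketch).
merged=$(awk '
    NR == FNR { k = ++n[$3]; f1[$3, k] = $1; f2[$3, k] = $2; next }
    {
        # Walk only the file2 rows that share this key, in file2 order.
        for (j = 1; j <= n[$5]; j++)
            print $1, $2, $3, $4, f1[$5, j], f2[$5, j], $5, f2[$5, j]
    }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Because the fields are printed one by one, the output is single-space separated rather than carrying file2's original spacing.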

Regarding your last question: you can also use pipes and file writes inside your awk code. Here's an example of a pipe to column -t:

Contents of script.awk:

FNR==NR { 
    two[$0]++
    next
}

{
    for (i in two) {
        split(i, one)
        if (one[3] == $NF) { 
            print $1,$2,$3,$4, i, one[2] | "column -t"
        }
    }
}

Run like: awk -f script.awk file2 file1
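Writing to a file from inside awk works much the same way as the pipe. The sketch below is one possible form, assuming the sample data from the question; the variable name out and the scratch directory are assumptions made for illustration. Inside awk, > truncates the target the first time it is opened and then keeps appending for the rest of the run.

```shell
# Sample data copied from the question, written to a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1 a t p bbb
2 b c f aaa
3 d y u bbb
2 b c f aaa
2 u g t ccc
2 b j h ccc
EOF
cat > "$tmp/file2" <<'EOF'
1 11 bbb
2 22 ccc
3 33 aaa
4 44 aaa
EOF

# "out" is a hypothetical variable name; it carries the output path into awk.
awk -v out="$tmp/result.txt" '
    FNR == NR { two[$0]++; next }
    {
        for (i in two) {
            split(i, one)
            if (one[3] == $NF)
                # Redirect inside awk instead of using the shell.
                print $1, $2, $3, $4, one[1], one[2], one[3], one[2] > out
        }
    }
    END { close(out) }    # flush and close the file explicitly
' "$tmp/file2" "$tmp/file1"

lines=$(awk 'END { print NR }' "$tmp/result.txt")
echo "$lines rows written to result.txt"
rm -rf "$tmp"
```

The same program can live in a .awk file and be run with awk -v out=result.txt -f script.awk file2 file1.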

EDIT:

Add the following to your shell script:

results=$(awk '

    FNR==NR {
        two[$0]++
        next
    }

    {
        for (i in two) {
            split(i, one)
            if (one[3] == $NF) {
                print $1,$2,$3,$4, i, one[2] | "column -t"
            }
        }
    }
' "$1" "$2")

echo "$results"

Run like:

./script.sh file2.txt file1.txt

Results:

1  a  t  p  1  11  bbb  11
2  b  c  f  3  33  aaa  33
2  b  c  f  4  44  aaa  44
3  d  y  u  1  11  bbb  11
2  b  c  f  3  33  aaa  33
2  b  c  f  4  44  aaa  44
2  u  g  t  2  22  ccc  22
2  b  j  h  2  22  ccc  22



Answer 2:


Your current script is:

awk 'NR==FNR { a[$3]=$0; next }
             { for (x in a) { if (x==$5) print $1,$2,$3,$4,a[x] } }'

(Actually, the original is missing the second close brace for the second pattern/action pair.)

It seems that you process file2 before you process file1.

You shouldn't need the loop in the second block. You can also make life easier for yourself by using the field splitting in the first phase to keep just the values you need:

awk 'NR==FNR { c1[$3] = $1; c2[$3] = $2; next }
             { print $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }' file2 file1

You can upgrade that to check whether c1[$5] and c2[$5] are defined, presumably skipping the row if they are not.

Given your input files, the output is:

1 a t p 1 11 bbb 11
2 b c f 4 44 aaa 44
3 d y u 1 11 bbb 11
2 b c f 4 44 aaa 44
2 u g t 2 22 ccc 22
2 b j h 2 22 ccc 22

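Give or take column spacing, that matches the request except for the aaa rows: file2 has two rows with key aaa, and because c1 and c2 hold one value per key, only the last one (4 44) survives. Column spacing can be fixed by using printf instead of print, or setting OFS to tab, or ...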
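The check for undefined keys mentioned above could look like the following sketch. It assumes the sample data from the question plus one extra file1 row (9 z z z zzz), which is invented here purely to exercise the skip. The in operator tests for membership without creating the array element, whereas merely referencing c1[$5] would create an empty entry.

```shell
# Sample data from the question; the "zzz" row is a made-up extra
# whose key does not occur in file2.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1 a t p bbb
2 b c f aaa
3 d y u bbb
9 z z z zzz
2 b c f aaa
2 u g t ccc
2 b j h ccc
EOF
cat > "$tmp/file2" <<'EOF'
1 11 bbb
2 22 ccc
3 33 aaa
4 44 aaa
EOF

checked=$(awk '
    NR == FNR   { c1[$3] = $1; c2[$3] = $2; next }
    !($5 in c1) { next }    # skip file1 rows whose key never appeared in file2
                { print $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$checked"
rm -rf "$tmp"
```

As with the unguarded version, a key that occurs several times in file2 keeps only its last row.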

The c1 and c2 notations for columns 1 and 2 are OK for two columns. If you need more, then you should probably use the 2D array notation:

awk 'NR==FNR { for (i = 1; i <= NF; i++) col[i,$3] = $i; next }
             { print $1, $2, $3, $4, col[1,$5], col[2,$5], $5, col[2,$5] }' file2 file1

This produces the same output as before.




Answer 3:


To achieve what you ask, append the second field to the whole line while processing the first file on the command line (file2), with a[$3]=$0 OFS $2. For the column formatting in your second question, awk separates output fields with the variable OFS; assign a tab to it and adjust from there. Your script would look like:

awk '
    BEGIN { OFS = "\t"; } 
    NR==FNR{
        a[$3]=$0 OFS $2;
        next;
    }
    {
        for(x in a){
            if(x==$5) print $1,$2,$3,$4,a[x]
        } 
    }
' file2 file1

That yields:

1       a       t       p       1   11   bbb    11
2       b       c       f       4   44   aaa    44
3       d       y       u       1   11   bbb    11
2       b       c       f       4   44   aaa    44
2       u       g       t       2   22   ccc    22
2       b       j       h       2   22   ccc    22
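The uneven spacing in the middle columns above arises because $0 still carries file2's original blanks. One way to get tabs throughout, sketched here on the question's sample data, is to assign $1 = $1 before storing the line, which makes awk rebuild $0 with the current OFS. The sketch also swaps the loop for an in test, which is a change from the script above rather than part of it.

```shell
# Sample data from the question, keeping file2's irregular spacing.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1      a      t      p      bbb
2      b      c      f      aaa
3      d      y      u      bbb
2      b      c      f      aaa
2      u      g      t      ccc
2      b      j      h      ccc
EOF
cat > "$tmp/file2" <<'EOF'
1   11   bbb
2   22   ccc
3   33   aaa
4   44   aaa
EOF

tabbed=$(awk '
    BEGIN     { OFS = "\t" }
    # Assigning to a field forces awk to rejoin $0 using OFS.
    NR == FNR { $1 = $1; a[$3] = $0 OFS $2; next }
    ($5 in a) { print $1, $2, $3, $4, a[$5] }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$tabbed"
rm -rf "$tmp"
```

Like the script above, this keeps only the last file2 row for a repeated key such as aaa.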


Source: https://stackoverflow.com/questions/12904836/using-an-array-in-awk-when-working-with-two-files
