How to use awk variables in regular expressions?

Frontend · Unresolved · 5 · 1468
渐次进展 2020-12-03 09:52

I have a file called domain which contains some domains. For example:

google.com
facebook.com
...
yahoo.com

And I have ano

5 answers
  •  天命终不由人
    2020-12-03 10:28

    You clearly want to read the site file once, not once per entry in domain. Fixing that, though, is trivial.

    Also, ordinary variables in awk are not prefixed with $. The $ is the field-reference operator: $0 is the whole record, and $1, $2, etc. are its fields. In particular, $dom means "the field whose number is the value of dom". A domain string such as google.com converts to the number 0, so $dom is typically $0, the entire current line.
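
    Here is a small sketch that makes the difference concrete. The input line and the function name dom_demo are hypothetical, and the sketch sticks to POSIX awk features. It shows that $dom collapses to $0. It also shows how to use a variable in a regular expression: put the plain variable (no $) on the right-hand side of ~, and awk treats its string value as a dynamic regex. Because . is a regex metacharacter, the sketch first rewrites each dot as [.] so that it matches only a literal dot.

```shell
#!/bin/sh
# dom_demo is a hypothetical helper: it contrasts $dom (a field reference)
# with dom (the variable itself) on one made-up input line.
dom_demo() {
    echo "mail.google.com 7" | awk -v dom="google.com" '{
        # A string such as "google.com" has the numeric value 0 ...
        print "numeric value of dom: " (dom + 0)
        # ... so $dom is the same as $0, the whole input line
        print "$dom is: " $dom
        # Build a regex from the plain variable; [.] matches only a literal dot
        re = dom
        gsub(/\./, "[.]", re)
        # Dynamic regex: accept dom itself, or any name in field 1 ending in ".dom"
        if ($1 == dom || $1 ~ ("[.]" re "$")) print "matched " dom
    }'
}

dom_demo
```

    Run as-is, it prints numeric value of dom: 0, then the whole line for $dom, then matched google.com. The "exact name or subdomain" rule is only an example. Adjust the pattern to whatever matching rule you actually need.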

    I think you need to extract the domain from each line read from the site file. I'm not sure whether you also need to handle sites under country-code domains such as bbc.co.uk, as well as sites in the gTLDs (google.com, etc.). Assuming you don't need to handle country-code domains, you can use this:

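    BEGIN {
        # Read the list of domains once; start every count at 0
        while ((getline dom < "./domain") > 0) domain[dom] = 0
        close("./domain")
        # Split on runs of spaces and dots: domain components, then the count
        FS = "[ .]+"
        # Read the site file once; getline without a variable sets $0, NF and the fields
        while ((getline < "./site") > 0)
        {
            topdom = $(NF-2) "." $(NF-1)   # last two components of the name
            domain[topdom] += $NF          # $NF is the count
        }
        close("./site")
        for (dom in domain) print dom "  " domain[dom]
    }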
    

    In the second while loop, each line is split into NF fields: $NF contains the count, and $1 .. $(NF-1) contain the components of the domain name. So topdom ends up holding the last two components (for example google.com), which is then used to index the array initialized in the first loop.
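
    To see the field splitting on its own, the sketch below runs a few hypothetical site-file lines through the same FS and topdom expression. For each line it prints NF, topdom and $NF. The line format (name components, then a count) is only inferred from the description above, and split_demo is a made-up name. The last line illustrates the country-domain caveat: news.bbc.co.uk yields co.uk, not bbc.co.uk.

```shell
#!/bin/sh
# split_demo is a hypothetical helper; the three input lines are made-up data.
split_demo() {
    printf '%s\n' "www.google.com 10" "facebook.com 5" "news.bbc.co.uk 2" |
    awk 'BEGIN { FS = "[ .]+" }   # runs of spaces and dots separate fields
    {
        # $NF is the count; the two fields before it form topdom
        topdom = $(NF-2) "." $(NF-1)
        print NF, topdom, $NF
    }'
}

split_demo
```

    This should print 4 google.com 10, 3 facebook.com 5 and 5 co.uk 2, one per line.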

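    Given the data in the question (minus the lines of dots), the output is as follows. The for (dom in domain) loop visits keys in an unspecified order, so the lines may appear in a different order with your awk: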

    yahoo.com  0
    facebook.com  37
    google.com  18
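
    To try the whole program end to end, the sketch below writes small, made-up domain and site files into a temporary directory and runs the BEGIN block above there, with each getline call parenthesized. It pipes the result through sort so the lines come out in a fixed order. It assumes a POSIX shell with mktemp -d available, as on Linux and macOS. run_demo is a hypothetical name, and the counts are test data, not the data from the question.

```shell
#!/bin/sh
# run_demo is a hypothetical helper that runs the program on made-up files.
run_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        # Hypothetical input: three listed domains, three site lines
        printf '%s\n' google.com facebook.com yahoo.com > domain
        printf '%s\n' "www.google.com 4" "mail.google.com 2" "m.facebook.com 5" > site
        awk 'BEGIN {
            # Read the domain list once, starting every count at 0
            while ((getline dom < "./domain") > 0) domain[dom] = 0
            close("./domain")
            # Split site lines on runs of spaces and dots
            FS = "[ .]+"
            while ((getline < "./site") > 0) {
                topdom = $(NF-2) "." $(NF-1)
                domain[topdom] += $NF
            }
            close("./site")
            for (dom in domain) print dom "  " domain[dom]
        }' | sort
    )
    status=$?
    rm -rf "$dir"
    return "$status"
}

run_demo
```

    With this data it should print facebook.com  5, google.com  6 and yahoo.com  0. A name in site whose last two components are not listed in domain still gets an entry of its own, because domain[topdom] += $NF creates it. Wrap that update in if (topdom in domain) if you only want the listed domains.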
    
