How do you parse a web page and extract all the href links?

情话喂你 2021-01-01 19:18

I want to parse a web page in Groovy and extract all of the href links along with their associated text.

If the page contained these links:



        
7 answers
  •  误落风尘
    2020-11-21 01:17

    Use either of these, depending on how you want backslashes in the shell variables handled (avar is an awk variable, svar is a shell variable):

    awk -v avar="$svar" '... avar ...' file
    awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}... avar ...' "$svar" file
    

    See http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details and other options. The first method above is almost always your best option and has the most obvious semantics.
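
The difference between the two methods can be sketched as follows. This is a minimal illustration, assuming a POSIX shell and a POSIX-compatible awk; the value `a\tb` is a hypothetical example chosen to contain a backslash sequence. With `-v`, awk expands escape sequences in the assigned value, while a value read from `ARGV` is taken literally.

```shell
#!/bin/sh
# Hypothetical shell value containing a backslash sequence (backslash, then "t").
svar='a\tb'

# Method 1: -v assignment. awk processes escape sequences in the value,
# so the two characters \t are turned into a single tab.
via_v=$(awk -v avar="$svar" 'BEGIN { print avar }')

# Method 2: pass the value as an argument and read it from ARGV.
# The value is kept literally; ARGV[1] is then cleared so awk does not
# try to open it as an input file.
via_argv=$(awk 'BEGIN { avar = ARGV[1]; ARGV[1] = ""; print avar }' "$svar")

printf 'via -v:   %s\n' "$via_v"
printf 'via ARGV: %s\n' "$via_argv"
```

If the shell variable may hold arbitrary text such as file paths or regular expressions containing backslashes, the `ARGV` form is likely the safer choice; for ordinary values the `-v` form is simpler to read.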
