How do you parse a web page and extract all the href links?

情话喂你 2021-01-01 19:18

I want to parse a web page in Groovy and extract all of the href links, along with the text associated with each one.

If the page contained these links:



        
7 answers
  •  佛祖请我去吃肉
    2021-01-01 19:44

    Try a regular expression. Something like this should work:

    (html =~ /<a\s[^>]*?href=["']([^"']*)["'][^>]*>(.*?)<\/a>/).each { match, url, text ->
        // match is the whole <a>...</a> tag; url and text are the two capture groups
    }
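A self-contained sketch of the regex approach, run against a small made-up `html` string (in practice `html` would hold the fetched page source). When the pattern has capture groups, each match arrives as a list `[wholeMatch, group1, group2]`, which Groovy spreads into a three-parameter closure. The `(?s)` flag is an addition here so that link text spanning several lines still matches.

```groovy
// Made-up sample page; replace with the real page source.
def html = '''
<p>See <a href="http://example.com/one">First link</a> and
<a class="nav" href='/two'>Second link</a>.</p>
'''

// Group 1 = href value (single- or double-quoted), group 2 = link text.
// (?s) lets '.' cross newlines; the lazy (.*?) stops at the first </a>.
def links = []
(html =~ /(?s)<a\s[^>]*?href=["']([^"']*)["'][^>]*>(.*?)<\/a>/).each { match, url, text ->
    links << [url: url, text: text.trim()]
}

links.each { println "${it.url} -> ${it.text}" }
```

This only handles quoted href values and does not understand comments, scripts, or nested markup, so it is best kept for quick scraping of pages whose markup you know.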
    

    Take a look at "Groovy - Tutorial 4 - Regular expressions basics" and "Anchor Tag Regular Expression Breaking".
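Because regexes break easily on real-world markup, here is an alternative sketch that uses a real HTML parser and needs nothing beyond the JDK: the lenient parser in `javax.swing.text.html.parser`, driven through an `HTMLEditorKit.ParserCallback`. The names `LinkCollector` and `extractLinks` are hypothetical helpers introduced for illustration, and the sketch assumes the page has already been read into a `String`. Anchors without an `href` (for example `<a name="top">`) are skipped.

```groovy
import javax.swing.text.MutableAttributeSet
import javax.swing.text.html.HTML
import javax.swing.text.html.HTMLEditorKit
import javax.swing.text.html.parser.ParserDelegator

// Hypothetical callback: records [url, text] for every <a href=...>...</a>.
class LinkCollector extends HTMLEditorKit.ParserCallback {
    List<Map> links = []
    private String currentHref          // href of the <a> we are inside, or null
    private StringBuilder currentText   // text collected inside that <a>

    @Override
    void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            currentHref = attrs.getAttribute(HTML.Attribute.HREF) as String
            currentText = new StringBuilder()
        }
    }

    @Override
    void handleText(char[] data, int pos) {
        if (currentHref != null) currentText.append(data)
    }

    @Override
    void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.A && currentHref != null) {
            links << [url: currentHref, text: currentText.toString().trim()]
            currentHref = null
        }
    }
}

// Hypothetical helper: parse an HTML string and return the collected links.
List<Map> extractLinks(String html) {
    def collector = new LinkCollector()
    // true = ignore any charset declared in a <meta> tag
    new ParserDelegator().parse(new StringReader(html), collector, true)
    collector.links
}
```

For example, `extractLinks(page).each { println "${it.url} -> ${it.text}" }` prints each link. The Swing parser targets HTML 3.2, so it tolerates sloppy markup but knows nothing about HTML5-specific elements; a dedicated HTML parsing library is the heavier-duty option.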
