How do you parse a web page and extract all the href links?

情话喂你 2021-01-01 19:18

I want to parse a web page in Groovy and extract all of the href links along with their associated text.

If the page contained these links:



        
7 answers
  •  别那么骄傲
    2021-01-01 19:26

    I don't know Java, but I think XPath is far better than classic regular expressions for selecting one or more HTML elements.

    It is also easier to write and to read.
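To make the comparison with regular expressions concrete, here is a minimal Java sketch (Java because the question targets Groovy, which can call the same JDK classes directly). The class name, the sample markup and the example.com URLs are hypothetical. A naive pattern that expects href to be the first, double-quoted attribute misses a link written with single quotes and an extra attribute, while the XPath query //a/@href finds both. It assumes the page is well-formed XHTML, because the JDK's XML parser rejects most real-world HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RegexVsXPath {
    // Hypothetical markup: the second link puts another attribute before href
    // and uses single quotes, both of which are perfectly legal.
    static final String PAGE =
        "<html><body>"
        + "<a href=\"http://www.example.com/one\">1</a>"
        + "<a class='nav' href='http://www.example.com/two'>2</a>"
        + "</body></html>";

    // Naive regex: assumes href is the first attribute and is double-quoted.
    static List<String> hrefsByRegex(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = Pattern.compile("<a href=\"([^\"]*)\">").matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    // XPath: the parser deals with quoting and attribute order for us.
    static List<String> hrefsByXPath(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            // Select the href attribute node of every <a>, at any depth.
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//a/@href", doc, XPathConstants.NODESET);
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                hrefs.add(attrs.item(i).getNodeValue());
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("regex: " + hrefsByRegex(PAGE));
        System.out.println("xpath: " + hrefsByXPath(PAGE));
    }
}
```

Run as-is, the regex list contains only the first URL, while the XPath list contains both.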

        <html>
          <body>
            <a href="...">1</a>
            <a href="...">2</a>
            <a href="...">3</a>
          </body>
        </html>

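    Given the HTML above, the XPath expression "/html/body/a" selects every <a> (link) element that is a direct child of <body>; "/html/body/a/@href" selects just their href attributes.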
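A minimal sketch of evaluating that expression with the JDK's built-in javax.xml.xpath API, pulling out each link's href and text as the question asks. The class name XPathLinks, its extractLinks helper, the sample page and the example.com URLs are all hypothetical. It again assumes well-formed XHTML input; ordinary HTML would first need a lenient HTML parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathLinks {
    // Hypothetical sample page. It must be well-formed XML (XHTML), because the
    // JDK's built-in parser rejects most real-world HTML.
    static final String PAGE =
        "<html><body>"
        + "<a href=\"http://www.example.com/one\">1</a>"
        + "<a href=\"http://www.example.com/two\">2</a>"
        + "<a href=\"http://www.example.com/three\">3</a>"
        + "</body></html>";

    // Returns one "href<TAB>text" entry per <a> element directly under <body>.
    static List<String> extractLinks(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList anchors = (NodeList) xpath.evaluate("/html/body/a", doc, XPathConstants.NODESET);

            List<String> links = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                // Pair the href attribute with the link's text content.
                links.add(a.getAttribute("href") + "\t" + a.getTextContent());
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the input", e);
        }
    }

    public static void main(String[] args) {
        extractLinks(PAGE).forEach(System.out::println);
    }
}
```

Running main prints three tab-separated lines, each a URL followed by its link text. Swapping in "//a" would match links at any depth rather than only direct children of <body>.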

    Here's a good step-by-step tutorial: http://www.zvon.org/xxl/XPathTutorial/General/examples.html
