Regular expression for parsing links from a webpage?

南旧 2020-11-27 20:02

I'm looking for a .NET regular expression to extract all the URLs from a webpage, but haven't found one comprehensive enough to cover all the different ways you can spec…

9 answers
  •  挽巷 (OP)
    2020-11-27 20:55

    All HTTP and mailto links:

    (["'])(mailto:|https?:).*?\1
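
    A quick way to try this pattern is Python's re module, which handles the backreference and lazy quantifier the same way .NET does here; Python is used only so the sketch is self-contained. It assumes secure links should count too, so http: is widened to https?:, and the helper find_absolute_links and the sample markup are hypothetical. Because group 1 captures the opening quote, each match includes both quotes, which the helper slices off.

```python
import re

# The answer's first pattern, with "http:" widened to "https?:" (assumption:
# https links should be matched too). Group 1 captures the opening quote and
# \1 requires the same quote character to close the match.
QUOTED_LINK = re.compile(r"""(["'])(mailto:|https?:).*?\1""")


def find_absolute_links(html: str) -> list[str]:
    """Return every quoted mailto:/http(s): string, with the quotes stripped."""
    return [m.group(0)[1:-1] for m in QUOTED_LINK.finditer(html)]


# Hypothetical sample markup mixing quote styles and a relative link.
sample = '''<a href="http://example.com/a">A</a>
<a href='mailto:someone@example.com'>mail</a>
<a href="https://example.com/b">B</a>
<a href="/relative/path">rel</a>'''

print(find_absolute_links(sample))
```

    The relative link /relative/path is skipped, since the pattern requires a scheme immediately after the opening quote.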
    

    All links, including relative ones, that are referenced by an href or src attribute:

    #Matches the value inside single or double quotes, but not the quotes themselves
    (?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)
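
    Running this lookbehind pattern in Python (again only as a self-contained test bed; Python requires each lookbehind to be fixed-width, which each of these is) shows that the match is the bare attribute value, relative paths included. The helper find_attr_links and the sample markup are illustrative assumptions.

```python
import re

# Pattern copied from the answer. (?<=(["'])) captures the quote just before
# the current position, the second group requires href=' / href=" or
# src=' / src=" to end there, and .*?(?=\1) stops before the matching quote.
ATTR_VALUE = re.compile(r"""(?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)""")


def find_attr_links(html: str) -> list[str]:
    # Lookarounds consume no characters, so group(0) is just the value.
    return [m.group(0) for m in ATTR_VALUE.finditer(html)]


# Hypothetical markup: a double-quoted href, a single-quoted src, and an
# unrelated attribute (title) that should be ignored.
sample = '<a href="/docs/index.html">Docs</a> <img src=\'logo.png\'> <a title="x">'
print(find_attr_links(sample))
```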
    
    #Matches the value in double quotes, including the quotes.
    (["'])((?<=href=")|(?<=src=")).*?\1
    

    The second of these two patterns only finds links that use double quotes, however, because its lookbehinds spell out href=" and src=" with a literal double quote.
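
    One possible way around that restriction, sketched here in Python as an assumption rather than something from the answer, is to test for href= or src= before the opening quote is consumed. The captured quote can then be either character, and \1 still forces the same one to close the value. .NET accepts variable-length lookbehind, so there it could be written (?<=href=|src=)(["']).*?\1; Python needs one fixed-width lookbehind per alternative. The names DOUBLE_ONLY and EITHER_QUOTE are hypothetical.

```python
import re

# The answer's double-quote pattern: its lookbehinds hard-code a double quote
# (href=" / src="), so single-quoted values are never matched.
DOUBLE_ONLY = re.compile(r"""(["'])((?<=href=")|(?<=src=")).*?\1""")

# Variant (assumption, not from the answer): look behind for href= / src=
# *before* the quote, so group 1 may be either quote character.
EITHER_QUOTE = re.compile(r"""(?:(?<=href=)|(?<=src=))(["']).*?\1""")

# Hypothetical markup with one single-quoted href.
sample = '<a href="a.html">A</a> <a href=\'b.html\'>B</a> <img src="c.png">'

print([m.group(0) for m in DOUBLE_ONLY.finditer(sample)])
print([m.group(0) for m in EITHER_QUOTE.finditer(sample)])
```

    Like the originals, both patterns assume lowercase attribute names and no whitespace around the =; RegexOptions.IgnoreCase in .NET (re.IGNORECASE in Python) would cover the first of those.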
