How to find URLs in HTML using Java

终归单人心 2021-01-25 20:22

I have the following... I wouldn't call it a problem, more of a situation.

I have some HTML with tags and everything, and I want to search it for every URL. I'm doing it now by

4 Answers
  •  庸人自扰
    2021-01-25 20:57

    The best approach is probably to search for an existing regex. Here is one example:

        /^(https?):\/\/((?:[a-z0-9.\-]|%[0-9A-F]{2}){3,})(?::(\d+))?((?:\/(?:[a-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-F]{2})*)*)(?:\?((?:[a-z0-9\-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9\-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?$/i
    

    I found it in a Hacker News article. As far as I can follow it, it looks good. However, as far as I know, there is no formal regex for this problem, so the best solution is to search for a few and test which one matches most of what you want.
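    Since the question asks about Java, here is a minimal sketch of applying a pattern like this with `java.util.regex`. It rests on two assumptions: the `^` and `$` anchors are dropped, because with them the pattern only accepts a string that is entirely one URL rather than finding URLs inside HTML; and the character classes and repeated groups use `*` as in the commonly circulated version of this regex. The class name `UrlFinder` and the sample HTML are hypothetical.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class, not part of any library.
    public class UrlFinder {

        // The regex from the answer, written as a Java string:
        // - the ^ and $ anchors are removed so matches can occur inside HTML,
        // - "\/" is written as a plain "/" (equivalent in Java regex),
        // - Pattern.CASE_INSENSITIVE stands in for the trailing /i flag.
        private static final Pattern URL = Pattern.compile(
                "(https?)://((?:[a-z0-9.\\-]|%[0-9A-F]{2}){3,})(?::(\\d+))?"
                + "((?:/(?:[a-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-F]{2})*)*)"
                + "(?:\\?((?:[a-z0-9\\-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?"
                + "(?:#((?:[a-z0-9\\-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?",
                Pattern.CASE_INSENSITIVE);

        // Returns every substring of the input that the pattern matches, in order.
        public static List<String> findUrls(String html) {
            List<String> urls = new ArrayList<>();
            Matcher m = URL.matcher(html);
            while (m.find()) {
                urls.add(m.group());
            }
            return urls;
        }

        public static void main(String[] args) {
            String html = "<p>See <a href=\"http://example.com/docs?page=2#top\">docs</a>"
                    + " and <img src=\"https://cdn.example.org:8080/img/logo.png\"></p>";
            for (String url : findUrls(html)) {
                System.out.println(url);
            }
        }
    }
    ```

    Running `main` prints `http://example.com/docs?page=2#top` and `https://cdn.example.org:8080/img/logo.png`. Because `'` and `&` are allowed characters in this pattern, a URL in a single-quoted attribute such as `href='http://a.com/x'` comes back as `http://a.com/x'`, and `&amp;` entities are kept as they appear in the markup, so the results may need some cleanup.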
