Regex to change all relative URLs to absolute

灰色年华 2020-12-02 14:54

I'm currently creating a Node.js web scraper/proxy, but I'm having trouble parsing relative URLs found in the scripting part of the source. I figured regex would do the tri

5 answers
  •  予麋鹿 (original poster)
    2020-12-02 15:24

    If you use a regex to find all non-absolute URLs, you can then just prefix them with the current URL and that should be it.

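    The URLs you need to fix are the ones that start with neither a / nor http(s):// (or another protocol marker, if you care about those). Note that if the page is served from a different host, as with a proxy, URLs starting with / also need the scheme and host prefixed.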

    As an example, suppose you're scraping http://www.example.com/. If you encounter a relative URL such as foo/bar, you would simply prefix it with the URL being scraped, like so: http://www.example.com/foo/bar
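
    Since the question mentions Node.js, the prefixing step can be sketched with the built-in URL class instead of string concatenation. This is only a minimal sketch: toAbsolute is a hypothetical helper name, and the page URLs are example values. The URL constructor also covers cases plain prefixing gets wrong, such as pages that are not at the site root.

```javascript
// Hypothetical helper (not from the original post): resolve a possibly
// relative URL against the URL of the page being scraped.
function toAbsolute(url, pageUrl) {
  try {
    // The WHATWG URL constructor resolves "foo/bar", "/foo" and "../foo"
    // against the base, and leaves already-absolute URLs unchanged.
    return new URL(url, pageUrl).href;
  } catch (err) {
    // Unparseable input: return it untouched rather than guessing.
    return url;
  }
}

console.log(toAbsolute("foo/bar", "http://www.example.com/"));
// prints "http://www.example.com/foo/bar"
console.log(toAbsolute("img.png", "http://www.example.com/a/b.html"));
// prints "http://www.example.com/a/img.png"
```

    The second call shows why resolving against the page URL is safer than prefixing: the relative path replaces the last segment of the page path rather than being appended to it.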

    For a regex to scrape the URLs from the page, there are probably plenty of good ones available if you google a bit so I'm not going to start inventing a poor one here :)
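
    For illustration only, here is one possible sketch of the regex step, assuming the URLs of interest sit in quoted href and src attributes. The helper name absolutizeAttributes and the pattern are assumptions, not a vetted solution; URLs embedded in inline scripts would need additional patterns, and an HTML parser is generally more robust than a regex.

```javascript
// Hypothetical helper: rewrite quoted href/src attribute values in an HTML
// string so that they become absolute relative to pageUrl.
function absolutizeAttributes(html, pageUrl) {
  // Matches href="..." or src='...'; \2 requires the same closing quote.
  const attr = /\b(href|src)\s*=\s*(["'])(.*?)\2/gi;
  return html.replace(attr, (match, name, quote, value) => {
    // Leave fragments and non-navigational schemes alone.
    if (/^(#|data:|mailto:|javascript:)/i.test(value)) return match;
    try {
      return `${name}=${quote}${new URL(value, pageUrl).href}${quote}`;
    } catch {
      // Unparseable value: keep the original attribute text.
      return match;
    }
  });
}

const page = "http://www.example.com/dir/page.html";
console.log(absolutizeAttributes('<a href="foo/bar">x</a>', page));
// prints '<a href="http://www.example.com/dir/foo/bar">x</a>'
```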
