Regex to change all relative URLs to absolute

灰色年华 2020-12-02 14:54

I'm currently creating a Node.js web scraper/proxy, but I'm having trouble parsing relative URLs found in the scripting part of the source. I figured a regex would do the tri

5 answers

予麋鹿 (original poster)

2020-12-02 15:24

If you use a regex to find all non-absolute URLs, you can then just prefix them with the current URL and that should be it.

The URLs you need to fix are the ones that start with neither a / nor http(s):// (or another protocol marker, if you care about those).
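A minimal sketch of that rule in Node.js might look like the following. The names `ABSOLUTE_OR_ROOTED` and `isRelative` are made up for illustration, and the scheme pattern (a letter followed by letters, digits, `+`, `.` or `-`, then a colon) is an assumption about what counts as a "protocol marker".

```javascript
// Matches URLs that are left alone under the rule above:
// a leading "/" or a scheme such as "http:", "https:" or "mailto:".
const ABSOLUTE_OR_ROOTED = /^(?:[a-z][a-z0-9+.-]*:|\/)/i;

// Hypothetical helper: true when the URL would need the page URL prefixed.
function isRelative(url) {
  return !ABSOLUTE_OR_ROOTED.test(url);
}

console.log(isRelative("foo/bar"));                    // true
console.log(isRelative("/foo/bar"));                   // false
console.log(isRelative("http://www.example.com/"));    // false
console.log(isRelative("mailto:someone@example.com")); // false
```

Note that this treats protocol-relative URLs such as `//cdn.example.com/x` as "starts with a /", so they are skipped as well.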

As an example, say you're scraping http://www.example.com/. If you encounter a relative URL such as foo/bar, you would simply prefix it with the URL being scraped, giving http://www.example.com/foo/bar
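The prefixing step could be sketched like this. `prefixWithBase` is a hypothetical helper, not something from the question. For comparison, the sketch also shows Node's built-in `URL` class, which resolves a relative reference against a base and additionally handles cases plain prefixing does not, such as a leading `/` or a page URL that ends in a file name.

```javascript
// Hypothetical helper: prefix the page URL to a relative URL.
function prefixWithBase(baseUrl, relativeUrl) {
  // Make sure exactly one "/" separates the base from the relative part.
  const base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  return base + relativeUrl;
}

console.log(prefixWithBase("http://www.example.com/", "foo/bar"));
// http://www.example.com/foo/bar

// Built-in alternative: new URL(relative, base) resolves the reference.
console.log(new URL("foo/bar", "http://www.example.com/").href);
// http://www.example.com/foo/bar

// A root-relative path is resolved against the origin, not the page path.
console.log(new URL("/foo/bar", "http://www.example.com/a/b.html").href);
// http://www.example.com/foo/bar
```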

For a regex to scrape the URLs from the page, there are probably plenty of good ones available if you google a bit, so I'm not going to start inventing a poor one here :)
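For anyone who wants a starting point anyway, here is a deliberately rough sketch. It assumes the URLs of interest sit in quoted `href` or `src` attributes, which is an assumption and not something the question states, and it swaps plain prefixing for the built-in `URL` class. `ATTR_URL` and `absolutizeAttributes` are hypothetical names. A regex like this will miss unquoted attributes and URLs built inside scripts, and it will also match attributes such as `data-src`.

```javascript
// Assumed pattern: href="..." or src="..." in single or double quotes.
const ATTR_URL = /\b(href|src)\s*=\s*(["'])(.*?)\2/gi;

// Hypothetical helper: rewrite every matched attribute value to an
// absolute URL resolved against the page being scraped.
function absolutizeAttributes(html, pageUrl) {
  return html.replace(ATTR_URL, (match, attr, quote, url) => {
    try {
      return `${attr}=${quote}${new URL(url, pageUrl).href}${quote}`;
    } catch {
      return match; // leave anything the URL parser rejects untouched
    }
  });
}

const html = '<a href="foo/bar">x</a><img src="http://cdn.example.com/i.png">';
console.log(absolutizeAttributes(html, "http://www.example.com/"));
// <a href="http://www.example.com/foo/bar">x</a><img src="http://cdn.example.com/i.png">
```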
