I'm currently creating a Node.js webscraper/proxy, but I'm having trouble parsing relative URLs found in the scripting part of the source. I figured regex would do the tri
If you use a regex to find all non-absolute URLs, you can then just prefix them with the current URL and that should be it.
The URLs you need to fix are the ones which don't start with either a / or http(s):// (or another protocol marker, if you care about them).
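As a rough sketch of that check in Node.js (the function name `needsPrefix` and the exact pattern are just my assumptions for illustration, not a battle-tested regex), something like this would flag the URLs that don't start with a / or a protocol marker:

```javascript
// Hypothetical helper: returns true when a URL starts with neither a "/"
// nor a protocol marker such as "http:", "https:" or "mailto:".
// The pattern is an illustrative assumption, not an exhaustive URL grammar.
const ABSOLUTE_OR_ROOTED = /^(?:[a-z][a-z0-9+.-]*:|\/)/i;

function needsPrefix(url) {
  return !ABSOLUTE_OR_ROOTED.test(url);
}

console.log(needsPrefix('foo/bar'));                 // true
console.log(needsPrefix('/foo/bar'));                // false
console.log(needsPrefix('http://www.example.com/')); // false
```

Anything for which this returns true would then get the prefix treatment.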
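As an example, let's say you're scraping http://www.example.com/. If you encounter a relative URL, say foo/bar, you would simply prefix it with the URL being scraped, like so: http://www.example.com/foo/bar. Note that this only works as-is when the URL being scraped ends in a /; for a page like http://www.example.com/a/b.html you'd prefix everything up to the last slash, giving http://www.example.com/a/foo/bar.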
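If you'd rather not do the string prefixing by hand, Node's built-in `URL` class can resolve a relative URL against a base for you. This is a different technique from plain prefixing, and `resolveUrl` is just a name I made up for the sketch:

```javascript
// Hypothetical helper: resolve a URL found in the page against the URL
// of the page being scraped, using the WHATWG URL class (global in Node.js).
function resolveUrl(found, pageUrl) {
  return new URL(found, pageUrl).href;
}

console.log(resolveUrl('foo/bar', 'http://www.example.com/'));
// http://www.example.com/foo/bar

console.log(resolveUrl('foo/bar', 'http://www.example.com/a/b.html'));
// http://www.example.com/a/foo/bar
```

URLs that are already absolute come back unchanged, so you could run every URL you find through it without checking first.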
For a regex to scrape the URLs from the page, there are probably plenty of good ones available if you google a bit so I'm not going to start inventing a poor one here :)