I'm mirroring some internal websites for backup purposes. Right now I basically use this C# code:
System.Net.WebClient client = new System.Net.WebCli
Uri WebsiteImAt = new Uri(
"http://www.w3schools.com/media/media_mimeref.asp?q=1&s=2,2#a");
string href = new Uri(WebsiteImAt, "/something/somethingelse/filename.asp")
.AbsoluteUri;
string href2 = new Uri(WebsiteImAt, "something.asp").AbsoluteUri;
string href3 = new Uri(WebsiteImAt, "something").AbsoluteUri;
which, with your Regex-based approach, is probably (untested) mappable to:
String value = Regex.Replace(text, "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>", match =>
"<" + match.Groups[1].Value + match.Groups[2].Value + "=\""
+ new Uri(WebsiteImAt, match.Groups[3].Value).AbsoluteUri + "\""
+ match.Groups[4].Value + ">",RegexOptions.IgnoreCase | RegexOptions.Multiline);
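As a quick check of what the Uri constructor does with each kind of link, here is a minimal sketch. The class name UriDemo, the helper Resolve, and the example.com link are made up for illustration; the page URL and the relative links are the ones from the snippet above.

```csharp
using System;

static class UriDemo
{
    // The page the links were found on (same URL as in the snippet above).
    static readonly Uri Page =
        new Uri("http://www.w3schools.com/media/media_mimeref.asp?q=1&s=2,2#a");

    // Hypothetical helper: resolve a link against the page it appeared on.
    public static string Resolve(string link) => new Uri(Page, link).AbsoluteUri;

    static void Main()
    {
        // Root-relative: resolved against the host, the page's folder is ignored.
        Console.WriteLine(Resolve("/something/somethingelse/filename.asp"));
        // http://www.w3schools.com/something/somethingelse/filename.asp

        // Document-relative: resolved against the page's folder (/media/);
        // the page's own query string and fragment are dropped.
        Console.WriteLine(Resolve("something.asp"));
        // http://www.w3schools.com/media/something.asp

        // Already absolute: returned unchanged.
        Console.WriteLine(Resolve("http://example.com/x.png"));
        // http://example.com/x.png
    }
}
```

Because absolute links pass through untouched, the same call can be applied to every link without first checking whether it starts with "http".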
I should also advise not to use Regex here, but to apply the Uri trick to some code using a DOM, perhaps XmlDocument (if XHTML) or the HTML Agility Pack (otherwise), looking at all //@src or //@href attributes.
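A rough sketch of the XmlDocument variant might look like the following. It is untested against real pages and assumes the input is well-formed XHTML without entities that need a DTD (such as &nbsp;); the names LinkRewriter and MakeLinksAbsolute are made up for the example. For ordinary HTML the same loop would go through the HTML Agility Pack instead.

```csharp
using System;
using System.Linq;
using System.Xml;

static class LinkRewriter
{
    // Hypothetical helper: rewrites every src/href attribute in a well-formed
    // XHTML string to an absolute URI, resolved against baseUri.
    public static string MakeLinksAbsolute(string xhtml, Uri baseUri)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // do not fetch external DTDs over the network
        doc.LoadXml(xhtml);

        // Copy the matches to a list first, so the document is not
        // modified while the XPath result is still being enumerated.
        var attributes = doc.SelectNodes("//@src | //@href")
                            .Cast<XmlAttribute>()
                            .ToList();

        foreach (XmlAttribute attr in attributes)
        {
            // Relative values are resolved against baseUri;
            // values that are already absolute come back unchanged.
            attr.Value = new Uri(baseUri, attr.Value).AbsoluteUri;
        }

        return doc.OuterXml;
    }
}
```

Calling LinkRewriter.MakeLinksAbsolute(pageSource, WebsiteImAt) would then take the place of the Regex.Replace call, with pageSource standing in for the downloaded markup.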