How to extract the full URL with HtmlAgilityPack - C#

Submitted by 回眸只為那壹抹淺笑 on 2019-11-26 23:09:48

Question


The code below extracts only the relative URL, exactly as it is written in the href attribute.

The extraction code:

foreach (HtmlNode link in hdDoc.DocumentNode.SelectNodes("//a[@href]"))
{
    lsLinks.Add(link.Attributes["href"].Value.ToString());
}

The link markup:

<a href="Login.aspx">Login</a>

The extracted URL:

Login.aspx

But I want the full link, as a browser would resolve it:

http://www.monstermmorpg.com/Login.aspx

I could check whether the URL contains "http" and prepend the domain if it does not, but that may cause problems in some cases, and I don't think it is a very wise solution.

C# 4.0, HtmlAgilityPack 1.4.0


Answer 1:


Assuming you have the URL of the page you crawled, you can combine it with the extracted href, something like this:

// The address of the page you crawled
var baseUrl = new Uri("http://example.com/path/to-page/here.aspx");

// root relative
var url = new Uri(baseUrl, "/Login.aspx");
Console.WriteLine (url.AbsoluteUri); // prints 'http://example.com/Login.aspx'

// relative
url = new Uri(baseUrl, "../foo.aspx?q=1");
Console.WriteLine (url.AbsoluteUri); // prints 'http://example.com/path/foo.aspx?q=1'

// absolute
url = new Uri(baseUrl, "http://stackoverflow.com/questions/7760286/");
Console.WriteLine (url.AbsoluteUri); // prints 'http://stackoverflow.com/questions/7760286/'

// other...
url = new Uri(baseUrl, "javascript:void(0)");
Console.WriteLine (url.AbsoluteUri); // prints 'javascript:void(0)'

Note the use of AbsoluteUri rather than ToString(): ToString() decodes the URL (to make it more "human-readable"), which is typically not what you want.
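
To make the difference between the two forms concrete, here is a minimal sketch. The class and method names (UriForms, Escaped, Display) and the href "my%20file.aspx" are made up for illustration; only the Uri constructor, AbsoluteUri and ToString() come from the framework.

```csharp
using System;

public static class UriForms
{
    // Hypothetical helper: resolves href against the page address and
    // returns the escaped form, which is the one to request or store.
    public static string Escaped(Uri pageUri, string href)
    {
        return new Uri(pageUri, href).AbsoluteUri;
    }

    // Hypothetical helper: same resolution, but returns the display-oriented
    // form produced by ToString(), where escapes such as %20 are decoded.
    public static string Display(Uri pageUri, string href)
    {
        return new Uri(pageUri, href).ToString();
    }
}
```

Calling both helpers with the base address from the answer and the href "my%20file.aspx" should give two different strings: the first keeps %20, the second shows a literal space.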




Answer 2:


Quoting the question: "I can do it with checking the url whether containing http and if not add the domain value"

That's what you should do. Html Agility Pack has nothing to help you with this:

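// baseUrl is the address of the crawled page, as a string.
// GetLeftPart returns a string, so it is wrapped in a Uri before combining.
var pageUri = new Uri(new Uri(baseUrl).GetLeftPart(UriPartial.Path));
var url = new Uri(pageUri, link.Attributes["href"].Value);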
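
As a rough sketch of how that can be applied to a whole list of extracted href values without manually checking for "http", the helper below uses Uri.TryCreate and then filters by scheme. LinkResolver and ResolveLinks are hypothetical names, not part of Html Agility Pack, and dropping non-http(s) links is an assumption about what a crawler would want.

```csharp
using System;
using System.Collections.Generic;

public static class LinkResolver
{
    // Hypothetical helper: resolves raw href values against the page address
    // and keeps only http/https results, in their escaped AbsoluteUri form.
    public static List<string> ResolveLinks(Uri pageUri, IEnumerable<string> hrefs)
    {
        var result = new List<string>();
        foreach (var href in hrefs)
        {
            Uri resolved;
            // TryCreate returns false instead of throwing when href cannot be parsed.
            if (!Uri.TryCreate(pageUri, href, out resolved))
                continue;

            // Skip javascript:, mailto: and other non-web schemes.
            if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
                continue;

            result.Add(resolved.AbsoluteUri);
        }
        return result;
    }
}
```

For example, with a page address of http://www.monstermmorpg.com/Default.aspx (assumed here for illustration) and the hrefs "Login.aspx" and "javascript:void(0)", only http://www.monstermmorpg.com/Login.aspx is returned.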


Source: https://stackoverflow.com/questions/7760286/how-to-extract-full-url-with-htmlagilitypack-c-sharp
