I have this web page source:
<a href="/StefaniStoikova"><img alt="" class="head" id="face_6306494" src="http://img0.ask.fm/assets/054/771/271/thumb_tiny/sam_7082.jpg" /></a>
<a href="/devos"><img alt="" class="head" id="face_18603180" src="http://img7.ask.fm/assets/043/424/871/thumb_tiny/devos.jpg" /></a>
<a href="/frenop"><img alt="" class="head" id="face_4953081" src="http://img1.ask.fm/assets/029/163/760/thumb_tiny/dsci0744.jpg" /></a>
I want to extract the string right after `<a href="`. My main problem is that these strings are all different, and I can't find a way to do it with either HtmlAgilityPack or WebRequests.
Does anyone have an idea for a regular expression that would do this?
It should be quite simple to get what you need with HtmlAgilityPack. Assuming you have your document loaded into an HtmlDocument object named doc:
HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//a[@href]");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (collection != null)
{
    foreach (HtmlNode node in collection)
    {
        // Do what you want with the href value in here. As an example, this
        // just prints the value to the console.
        Console.WriteLine(node.GetAttributeValue("href", "default"));
    }
}
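Since the question also asks about a regular expression, here is a minimal sketch of that approach in C#. Treat it as a rough alternative rather than a recommendation: regular expressions are brittle for HTML in general, and this pattern assumes double-quoted `href` attributes on `<a>` tags, as in the sample markup above. The class `HrefExtractor` and method `ExtractHrefs` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Matches an opening <a ...> tag and captures its double-quoted href value.
    // Assumption: attributes are double-quoted, as in the sample markup;
    // single-quoted or unquoted values are not handled.
    // [^>]*? cannot cross a '>', so a match never spills into the next tag.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns every href value found on <a> tags, in document order.
    public static List<string> ExtractHrefs(string html)
    {
        var results = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // Group 1 holds the text between the quotes.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Calling `HrefExtractor.ExtractHrefs(html)` on the three sample lines returns `/StefaniStoikova`, `/devos` and `/frenop`; calling `TrimStart('/')` on each gives the bare names. The `<img>` tags are skipped because the pattern requires `<a` followed by whitespace. For anything beyond simple, predictable markup, the HtmlAgilityPack XPath approach is the safer choice.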
Source: https://stackoverflow.com/questions/13002274/extract-all-a-hrefs-from-webpage-with-htmlagilitypack-requests-anything