Question
Possible Duplicate:
What is the best way to parse html in C#?
I am parsing an HTML file. I need to find all the links (<a> tags with an href attribute) in the HTML and replace them with a text-friendly version.
Here is an example.
Original text: <a href="http://foo.bar">click here</a>
Replacement value: click here <http://foo.bar>
How do I achieve this?
Answer 1:
You could use the Html Agility Pack library with code like this:
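using System;
using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile); // load your file (myHtmlFile is the path to your HTML file)
// select recursively all A elements declaring an HREF attribute (SelectNodes returns null, not an empty collection, when nothing matches)
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode node in links)
    {
        string replacement = node.InnerText + " <" + node.GetAttributeValue("href", null) + ">";
        node.ParentNode.ReplaceChild(doc.CreateTextNode(replacement), node);
    }
}
doc.Save(Console.Out); // output the new doc.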
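If you would rather avoid a dependency on the Html Agility Pack, a regular expression can handle the simple case from the question. This is a swapped-in technique, not part of the answer above, and only a minimal sketch: it assumes each anchor has a quoted href and contains no nested <a> element, and the FlattenLinks name is hypothetical. Regular expressions cannot parse arbitrary HTML, so the parser-based approach above is the safer choice for real-world pages.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites <a ... href="URL" ...>TEXT</a> as "TEXT <URL>".
// Assumes simple anchors with a single- or double-quoted href; tags nested
// inside the link text (e.g. <b>...</b>) are stripped from TEXT.
string FlattenLinks(string html)
{
    // Group 1 captures the opening quote so \1 can match the same closing quote.
    var anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*([""'])(?<href>.*?)\1[^>]*>(?<text>.*?)</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    return anchor.Replace(html, m =>
    {
        // Remove any markup inside the link text, keeping only the visible text.
        string text = Regex.Replace(m.Groups["text"].Value, "<[^>]*>", "");
        return text + " <" + m.Groups["href"].Value + ">";
    });
}

Console.WriteLine(FlattenLinks("<p>See <a href=\"http://foo.bar\">click here</a>.</p>"));
```

For the sample in the question, FlattenLinks("<a href=\"http://foo.bar\">click here</a>") returns "click here <http://foo.bar>". Text without any anchors is returned unchanged.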
Source: https://stackoverflow.com/questions/13126238/html-find-and-replace-href-tags