How do you parse an HTML string for image tags to get at the SRC information?

清酒与你 2020-12-08 20:26

Currently I use the .NET WebBrowser.Document.Images() to do this. It requires the WebBrowser control to load the document, which is messy and takes up resources.

4 answers

予麋鹿 (original poster)

2020-12-08 20:35
The big issue with any HTML parsing is the "well formed" part. You've seen the crap HTML out there; how much of it is really well formed? I needed to do something similar: parse out all the links in a document and, in my case, update each one with a rewritten link. I found the Html Agility Pack over on CodePlex. It rocks, and it handles malformed HTML.

Here's a snippet for iterating over links in a document:
```
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(@"C:\Sample.HTM");
// Select every <a> element that carries an href attribute.
HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//a[@href]");

// SelectNodes returns null when nothing matches, so run only if there are links.
if (linkNodes != null)
{
    foreach (HtmlNode linkNode in linkNodes)
    {
        HtmlAttribute attrib = linkNode.Attributes["href"];
        // Do whatever else you need here
    }
}
```
Original Blog Post
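
For the image case in the question, the same approach might look like the sketch below. It assumes the HtmlAgilityPack package is referenced and that the HTML is already in a string; the `ImageSources` class and `Extract` method are names made up for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImageSources
{
    // Hypothetical helper: returns the src value of every <img> that has one.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse straight from the string, no browser control

        var sources = new List<string>();
        HtmlNodeCollection imgNodes = doc.DocumentNode.SelectNodes("//img[@src]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (imgNodes == null)
        {
            return sources;
        }

        foreach (HtmlNode img in imgNodes)
        {
            // The second argument is the fallback used if the attribute is missing.
            sources.Add(img.GetAttributeValue("src", string.Empty));
        }

        return sources;
    }
}
```

Calling `ImageSources.Extract(htmlString)` should give you the list of SRC values without loading the page into a WebBrowser. Relative URLs come back exactly as written in the markup, so you would still need to resolve them against the page's base address yourself.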