Regular Expression to get the SRC of images in C#


I'm looking for a regular expression to isolate the src value of an img. (I know that this is not the best way to do this, but it is what I have to do in this case.)


8 answers

迷失自我

2020-11-29 09:37
This is what I use to get the tags out of strings:
```
</? *img[^>]*>
```
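The pattern above returns whole `img` tags rather than the `src` value, so it is only a first step. A minimal sketch of how it might be applied in C# follows; the class and method names (`ImgTagFinder`, `FindImgTags`) are made up for illustration and are not from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ImgTagFinder
{
    // Pattern taken from the answer above; IgnoreCase is an added assumption
    // so that <IMG ...> is matched as well.
    private static readonly Regex ImgTag =
        new Regex(@"</? *img[^>]*>", RegexOptions.IgnoreCase);

    // Returns the raw text of every matched tag, in document order.
    public static List<string> FindImgTags(string html)
    {
        var tags = new List<string>();
        foreach (Match m in ImgTag.Matches(html))
        {
            tags.Add(m.Value);
        }
        return tags;
    }
}
```

Each returned string is a complete tag such as `<img src="a.png">`, so the `src` value still has to be pulled out of it separately.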

甜味超标

2020-11-29 09:38

    string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
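`Regex.Match` only returns the first match, so the line above yields a single `src`. If the text contains several images, `Regex.Matches` can be used with the same pattern. The following is a rough sketch under that assumption; `ImgSrcExtractor` and `ExtractAll` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ImgSrcExtractor
{
    // Same pattern as in the answer above; group 1 holds the src value.
    private static readonly Regex ImgSrc = new Regex(
        "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase);

    // Collects the src value of every img tag the pattern matches.
    public static List<string> ExtractAll(string html)
    {
        var sources = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            sources.Add(m.Groups[1].Value);
        }
        return sources;
    }
}
```

Because `.` does not match a newline by default, a tag that spans several lines would be missed unless `RegexOptions.Singleline` is added.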


星月不相逢

2020-11-29 09:42
This should capture all img tags and just the src part, no matter where it's located (before or after class etc.), and it supports HTML/XHTML :D
```
<img.+?src="(.+?)".+?/?>
```
感情败类

2020-11-29 09:43
Here is the one I use:
```
<img.*?src\s*?=\s*?(?:(['"])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))[^>]*?>
```
The good part is that it matches any of the following:
```
<img src='test.jpg'>
<img src=test.jpg>
<img src="test.jpg">
```
It can also handle some less tidy input, such as whitespace around the equals sign and extra attributes, e.g.:
```
<img src = "test.jpg" width="300">
```
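In .NET, unnamed groups are numbered before named ones, so `\1` in this pattern refers to the quote character, and the two `(?<src>...)` alternatives share one group name. A small sketch of reading that named group from C# might look like this; `NamedGroupSrc` and `GetSrc` are hypothetical names, and returning an empty string on failure is an arbitrary choice for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class NamedGroupSrc
{
    // Pattern from the answer above, written as a verbatim string
    // (a literal double quote is therefore doubled).
    private static readonly Regex ImgSrc = new Regex(
        @"<img.*?src\s*?=\s*?(?:(['""])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))[^>]*?>",
        RegexOptions.IgnoreCase);

    // Returns the src value of the first img tag, or an empty string
    // when the pattern does not match.
    public static string GetSrc(string html)
    {
        Match m = ImgSrc.Match(html);
        return m.Success ? m.Groups["src"].Value : string.Empty;
    }
}
```

Calling `NamedGroupSrc.GetSrc` on each of the sample tags above should give `test.jpg` in every case, whether the value is single-quoted, double-quoted, or unquoted.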

攒了一身酷

2020-11-29 09:48

I tried what Francisco Noriega suggested, but it looks like the HtmlAgilityPack API has changed. Here is how I solved it:

        List<string> images = new List<string>();
        WebClient client = new WebClient();
        string site = "http://www.mysite.com";
        var htmlText = client.DownloadString(site);

        var htmlDoc = new HtmlDocument()
                    {
                        OptionFixNestedTags = true,
                        OptionAutoCloseOnEnd = true
                    };

        htmlDoc.LoadHtml(htmlText);

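        // SelectNodes returns null (not an empty collection) when nothing matches,
        // and selecting only img[@src] guarantees the attribute exists.
        var imgNodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
        if (imgNodes != null)
        {
            foreach (HtmlNode img in imgNodes)
            {
                HtmlAttribute att = img.Attributes["src"];
                images.Add(att.Value);
            }
        }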


清歌不尽

2020-11-29 09:50
I know you say you have to use regex, but if possible I would really give this open source project a chance: HtmlAgilityPack.

It is really easy to use. I just discovered it and it helped me out a lot, since I was doing some heavier HTML parsing. It basically lets you use XPath to get your elements.

Their example page is a little outdated, but the API is really easy to understand, and if you are a little bit familiar with XPath you will get your head around it in no time.

The code for your query would look something like this: (uncompiled code)
```
 List<string> imgSrcs = new List<string>();
 HtmlDocument doc = new HtmlDocument();
 doc.LoadHtml(htmlText); // or doc.Load(htmlFileStream)
 // SelectNodes returns null when no node matches the XPath
 var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
 if (nodes != null)
 {
     foreach (var img in nodes)
     {
         HtmlAttribute att = img.Attributes["src"];
         imgSrcs.Add(att.Value);
     }
 }
```