Regex HTML Extraction C#

Question

I have searched and searched for information about regex, but I can't find anything that lets me do this.

I need to extract 12.32, 2,300, 4.644 M and 12,444.12 M from the following strings in C#:

<td class="c-ob-j1a" property="c-value">12.32</td>
<td class="c-ob-j1a" property="c-value">2,300</td>
<td class="c-ob-j1a" property="c-value">4.644 M</td>
<td class="c-ob-j1a" property="c-value">12,444.12 M</td>

This is as far as I got:

MatchCollection valueCollection = Regex.Matches(html, @"<td class=""c-ob-j1a"" property=""c-value"">(?<Value>P{</td>})</td>");

Thanks!

Answer 1:

You should not use regular expressions to parse HTML. See the post "What is the best way to parse html in C#?" for how to parse HTML, or use the HtmlAgilityPack: http://www.codeplex.com/htmlagilitypack
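
The Html Agility Pack is the usual recommendation for real-world HTML. To keep the sketch below free of third-party packages, it swaps in LINQ to XML (System.Xml.Linq) instead. That only works under the assumption that the fragment is well-formed XML, which the four sample cells happen to be; the class name XmlCellExtractor is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XmlCellExtractor
{
    // Parses the fragment with LINQ to XML. This relies on the markup being
    // well-formed XML; real-world HTML often is not, which is where a real
    // HTML parser such as the Html Agility Pack is the better tool.
    public static string[] Extract(string html)
    {
        // Wrap the <td> elements in a single root so the fragment forms one XML element.
        XElement root = XElement.Parse("<root>" + html + "</root>");

        // Keep only cells whose property attribute is "c-value" and return their text.
        return root.Descendants("td")
                   .Where(td => (string)td.Attribute("property") == "c-value")
                   .Select(td => td.Value.Trim())
                   .ToArray();
    }

    static void Main()
    {
        string html = @"<td class=""c-ob-j1a"" property=""c-value"">12.32</td>
<td class=""c-ob-j1a"" property=""c-value"">2,300</td>
<td class=""c-ob-j1a"" property=""c-value"">4.644 M</td>
<td class=""c-ob-j1a"" property=""c-value"">12,444.12 M</td>";

        foreach (string value in Extract(html))
            Console.WriteLine(value);
    }
}
```

Selecting on the attribute rather than on the exact text of the opening tag means attribute order and spacing no longer matter.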

But if you really want to use a regex, this should work:

<td[^>]*>(.+?)<\/td>
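
A minimal C# sketch of this approach, assuming the pattern <td[^>]*>(.+?)</td>: the [^>]* part skips all of the opening tag's attributes, so the capture starts at the cell text. The class name AnyCellExtractor is hypothetical. Because this matches every td, it will also pick up cells you may not want.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AnyCellExtractor
{
    // [^>]* consumes the rest of the opening tag (its attributes);
    // (.+?) then lazily captures the cell text up to the first </td>.
    static readonly Regex CellPattern = new Regex(@"<td[^>]*>(.+?)</td>");

    public static List<string> Extract(string html)
    {
        var values = new List<string>();
        foreach (Match match in CellPattern.Matches(html))
            values.Add(match.Groups[1].Value); // group 1 = text between the tags
        return values;
    }

    static void Main()
    {
        string html = @"<td class=""c-ob-j1a"" property=""c-value"">12.32</td><td class=""c-ob-j1a"" property=""c-value"">2,300</td>";
        foreach (string value in Extract(html))
            Console.WriteLine(value);
    }
}
```

One caveat: (.+?) needs at least one character, so an empty <td></td> followed by another cell on the same line makes the capture run on into that next cell; (.*?) avoids this.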

Answer 2:

"c-value">(.*?)<\/td>

should do it for you. The value you need is held in the capturing group denoted by the parentheses (match.Groups[1] in .NET).
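
In C# the capture is read from Groups[1] of each match. The sketch below anchors on "c-value"> rather than "value">, because in the sample markup the opening quote comes before the c- prefix, so "value"> never occurs. AnchorExtractor is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorExtractor
{
    public static string[] Extract(string html)
    {
        // In a verbatim string "" is one quote, so the pattern is "c-value">(.*?)</td>
        MatchCollection matches = Regex.Matches(html, @"""c-value"">(.*?)</td>");

        var values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            values[i] = matches[i].Groups[1].Value; // Groups[0] would be the whole match
        return values;
    }

    static void Main()
    {
        string html = @"<td class=""c-ob-j1a"" property=""c-value"">4.644 M</td>";
        Console.WriteLine(Extract(html)[0]);
    }
}
```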

Answer 3:

Something like this should work:

<td[^>]*>(.+?)</td>

Regarding your code sample, this would probably be more maintainable:

MatchCollection valueCollection = Regex.Matches(html, @"<td[^>]*?>(?<Value>.*?)</td>");

If your HTML contains other td elements that you don't want to extract data from, keep the class and property attributes from your original pattern in front of the capture group so that only these cells match.
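
A sketch combining both suggestions: the strict prefix from the question keeps unrelated cells out, and the named group (?<Value>...) is read back with Groups["Value"]. The extra class="label" cell in Main is made-up input, included only to show that it is skipped; NamedGroupExtractor is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NamedGroupExtractor
{
    // The fixed class/property prefix means other <td> cells never match;
    // the capture is exposed under the name "Value".
    static readonly Regex CellPattern = new Regex(
        @"<td class=""c-ob-j1a"" property=""c-value"">(?<Value>.*?)</td>");

    public static List<string> Extract(string html)
    {
        var values = new List<string>();
        foreach (Match match in CellPattern.Matches(html))
            values.Add(match.Groups["Value"].Value);
        return values;
    }

    static void Main()
    {
        string html = @"<td class=""label"">Price</td><td class=""c-ob-j1a"" property=""c-value"">12.32</td>";
        foreach (string value in Extract(html))
            Console.WriteLine(value); // only the c-value cell is printed
    }
}
```

The trade-off is brittleness: if the site reorders the attributes or adds a second class, the strict prefix stops matching entirely.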

Answer 4:

I'd probably start with a very strict match to avoid accidentally capturing other parts of the document:

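    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // Sample input from the question (verbatim string, so "" is one quote).
            string html = @"<td class=""c-ob-j1a"" property=""c-value"">12.32</td>
    <td class=""c-ob-j1a"" property=""c-value"">2,300</td>
    <td class=""c-ob-j1a"" property=""c-value"">4.644 M</td>
    <td class=""c-ob-j1a"" property=""c-value"">12,444.12 M</td>";

            // Match only cells with this exact class/property; [^<]* captures the
            // cell text up to the closing tag without running into the next element.
            var matches = Regex.Matches(html, @"<td class=""c-ob-j1a"" property=""c-value"">([^<]*)</td>");
            foreach (Match match in matches)
                Console.WriteLine(match.Groups[1].Value);
        }
    }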

(And I would also like to take this opportunity to recommend the Html Agility Pack if you haven't tried it yet.)