Question
I have searched and searched about Regex but I can't seem to find something that will allow me to do this.
I need to get the 12.32, 2,300, 4.644 M and 12,444.12 M from the following strings in C#:
<td class="c-ob-j1a" property="c-value">12.32</td>
<td class="c-ob-j1a" property="c-value">2,300</td>
<td class="c-ob-j1a" property="c-value">4.644 M</td>
<td class="c-ob-j1a" property="c-value">12,444.12 M</td>
I got up to this:
MatchCollection valueCollection = Regex.Matches(html, @"<td class=""c-ob-j1a"" property=""c-value"">(?<Value>P{</td>})</td>");
Thanks!
Answer 1:
You should not use regex to parse HTML. See the post "What is the best way to parse html in C#?" for how to parse HTML, or use HtmlAgilityPack: http://www.codeplex.com/htmlagilitypack
But if you really want to use a regex, this should work:
<td[^>]*>(.+?)<\/td>
Answer 2:
value">(.*?)<\/td>
should do it for you. The value you require is held in the capturing group denoted by the parentheses.
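As a minimal sketch of reading that capturing group from C#: the helper name `CellValues.Extract` is hypothetical, and the pattern starts at `value">` with no leading quote because the attribute in the sample is `c-value`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CellValues
{
    // Hypothetical helper: returns the text of group 1 for every match.
    public static List<string> Extract(string html)
    {
        var values = new List<string>();
        // value"> anchors on the end of the property attribute;
        // (.*?) lazily captures everything up to the nearest </td>.
        foreach (Match m in Regex.Matches(html, @"value"">(.*?)</td>"))
        {
            // Groups[0] is the whole match, Groups[1] the first parenthesised group.
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        string html = "<td class=\"c-ob-j1a\" property=\"c-value\">12.32</td>\n" +
                      "<td class=\"c-ob-j1a\" property=\"c-value\">4.644 M</td>";
        foreach (string v in Extract(html))
            Console.WriteLine(v); // prints 12.32, then 4.644 M
    }
}
```

Because the pattern keys only on the tail of the attribute, it will also match any other element whose attribute value ends in `value"`.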
Answer 3:
Something like this should work:
/<td[^>]*?>(.+?)<\/td>/
Regarding your code sample, this would probably be more maintainable:
MatchCollection valueCollection = Regex.Matches(html, @"<td[^>]*?>(?<Value>.*?)</td>");
If your HTML contains other td elements that you don't want to extract data from, the more specific opening tag in your original regex (with the class and property attributes) should be fine.
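The named group `Value` in that pattern can be read back by name rather than by index. A minimal sketch, assuming the generic `<td[^>]*?>` pattern from this answer; the class name `NamedGroupDemo` is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: collects the contents of the named group "Value".
    public static string[] Extract(string html)
    {
        MatchCollection matches =
            Regex.Matches(html, @"<td[^>]*?>(?<Value>.*?)</td>");
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // Look the group up by the name given in (?<Value>...).
            result[i] = matches[i].Groups["Value"].Value;
        }
        return result;
    }

    public static void Main()
    {
        string html = "<td class=\"c-ob-j1a\" property=\"c-value\">2,300</td>\n" +
                      "<td class=\"c-ob-j1a\" property=\"c-value\">12,444.12 M</td>";
        foreach (string v in Extract(html))
            Console.WriteLine(v); // prints 2,300, then 12,444.12 M
    }
}
```

Named groups keep the calling code stable if more groups are later added to the pattern, since numeric indexes would shift.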
Answer 4:
I'd probably start with a very strict match to avoid accidentally capturing other parts of the document:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string html = @"<td class=""c-ob-j1a"" property=""c-value"">12.32</td>
<td class=""c-ob-j1a"" property=""c-value"">2,300</td>
<td class=""c-ob-j1a"" property=""c-value"">4.644 M</td>
<td class=""c-ob-j1a"" property=""c-value"">12,444.12 M</td>";
        // Match only cells with these exact attributes; capture the text up to the next '<'.
        var matches = Regex.Matches(html, @"<td class=""c-ob-j1a"" property=""c-value"">([^<]*)</td>");
        foreach (Match match in matches)
            Console.WriteLine(match.Groups[1].Value);
    }
}
(And I would also like to take this opportunity to recommend the Html Agility Pack if you haven't tried it yet.)
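For comparison, here is a rough sketch of the same extraction with the Html Agility Pack. It assumes the `HtmlAgilityPack` NuGet package is referenced in the project; the class name `HapExtractor`, the XPath expression, and the `<table>` wrapper around the sample are illustrative choices, not something taken from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // external NuGet package, not part of .NET itself

public static class HapExtractor
{
    // Hypothetical helper: returns the text of every matching cell.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(
            "//td[@class='c-ob-j1a' and @property='c-value']");
        if (cells == null)
            return values;

        foreach (HtmlNode cell in cells)
            values.Add(cell.InnerText.Trim());
        return values;
    }

    public static void Main()
    {
        string html = "<table><tr>" +
                      "<td class=\"c-ob-j1a\" property=\"c-value\">12.32</td>" +
                      "<td class=\"c-ob-j1a\" property=\"c-value\">4.644 M</td>" +
                      "</tr></table>";
        foreach (string v in Extract(html))
            Console.WriteLine(v);
    }
}
```

Selecting by attribute through XPath does not depend on the order in which the attributes are written, which is one of the things a literal regex prefix cannot tolerate.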
Answer 5:
If all you need is to parse the td tag in the formats you presented, you might get away with a regex.
In general, parsing HTML with regex does not work reliably. You can find many questions here on SO explaining why.
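A small illustration of that fragility, using the strict pattern from earlier in this thread. The class name `FragilityDemo` and the two variant inputs are made up for the example; both variants are ordinary ways of writing the same cell in HTML.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragilityDemo
{
    // Strict pattern: exact attributes in exact order, text up to the next '<'.
    public const string Pattern =
        @"<td class=""c-ob-j1a"" property=""c-value"">([^<]*)</td>";

    public static bool Matches(string html) => Regex.IsMatch(html, Pattern);

    public static void Main()
    {
        // The format from the question.
        string original = "<td class=\"c-ob-j1a\" property=\"c-value\">12.32</td>";
        // Same cell with the attributes swapped.
        string reordered = "<td property=\"c-value\" class=\"c-ob-j1a\">12.32</td>";
        // Same cell with the number wrapped in an inline tag.
        string nested = "<td class=\"c-ob-j1a\" property=\"c-value\"><b>12.32</b></td>";

        Console.WriteLine(Matches(original));  // True
        Console.WriteLine(Matches(reordered)); // False
        Console.WriteLine(Matches(nested));    // False
    }
}
```

As long as the page keeps emitting exactly the format shown in the question, the regex holds; any of these harmless markup changes silently drops the value.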
Source: https://stackoverflow.com/questions/1894995/regex-html-extraction-c-sharp