Regular expression to match everything, except HTML tags

Submitted by 懵懂的女人 on 2020-01-07 01:53:28

Question


<tr><td>Di, 12.04.16</td><td>1</td><td>D</td><td>D</td><td>255</td><td>ABC</td></tr>

I want to match only ABC, or whatever else stands between the <td> and </td> tags (the ones directly before and after ABC).

This pattern doesn't work for me:

((?!<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>[1-9][0-2]?</td><td>[A-Z]?[A-Z]?[A-Z]?[A-Z]?[1-5]?</td><td>(---|[A-Z]?[A-Z]?[A-Z]?[A-Z]?[1-5]?)</td><td>).*(?!</td></tr>))

Do you have any ideas? Thanks for the help.


Answer 1:


As Amy said, don't use regex to parse HTML. Instead, install Html Agility Pack from NuGet and query the parsed document with the extension methods from the System.Linq namespace.

For example:

using System.Linq;
using HtmlAgilityPack;

string html = "<html><head></head><body><p class='testclass'>This is a paragraph.</p><table><tr><td>Di, 12.04.16</td><td>1</td><td>D</td><td>D</td><td>255</td><td>ABC</td></tr></table></body></html>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Example of filtering by attribute: every element whose class is "testclass" (not used below)
var programmes = doc.DocumentNode.Descendants().Where(d => d.GetAttributeValue("class", "") == "testclass");
var trs = doc.DocumentNode.Descendants("tr"); // All <tr> elements in the document
foreach (var tr in trs)
{
    var tds = tr.Descendants("td").ToArray(); // All <td> cells in this row
    // Sample: append the text of each cell to a TextBlock named txt
    foreach (var td in tds)
    {
        txt.Text = txt.Text + " " + td.InnerText;
    }
}

The TextBlock then shows the text of every cell, separated by spaces: Di, 12.04.16 1 D D 255 ABC
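If only the last cell of each row is wanted (ABC in the question), the same approach can be narrowed down. The following is a minimal sketch, assuming the Html Agility Pack NuGet package is installed; the class name CellExtractor and the method name LastCells are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of Html Agility Pack.
public static class CellExtractor
{
    // Returns the trimmed text of the last <td> of every <tr> in the given HTML.
    public static List<string> LastCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        foreach (var tr in doc.DocumentNode.Descendants("tr"))
        {
            // LastOrDefault returns null for a row that has no <td> at all.
            var lastTd = tr.Descendants("td").LastOrDefault();
            if (lastTd != null)
            {
                result.Add(lastTd.InnerText.Trim());
            }
        }
        return result;
    }

    public static void Main()
    {
        string html = "<table><tr><td>Di, 12.04.16</td><td>1</td><td>D</td><td>D</td><td>255</td><td>ABC</td></tr></table>";
        foreach (var text in LastCells(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

For the sample row this yields ABC. Rows without any cells are skipped rather than treated as an error, so leftover or malformed markup does not make the lookup fail.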



Source: https://stackoverflow.com/questions/36553222/regular-expression-to-match-everything-except-html-tags
