Regex to parse out html from CDATA with C#

╄→гoц情女王★ submitted on 2019-12-06 04:05:57

Question


I would like to parse out any HTML data that is returned wrapped in CDATA.

As an example <![CDATA[<table><tr><td>Approved</td></tr></table>]]>

Thanks!


Answer 1:


The expression to handle your example would be

\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>

Where the group "text" will contain your HTML.

The C# code you need is:

using System.Text.RegularExpressions;

RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>", options);
string input = @"<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";

// Run the match once and check whether it succeeded
Match match = regex.Match(input);
if (match.Success)
{
    // The named group "text" holds the HTML between the CDATA delimiters
    string htmlText = match.Groups["text"].Value;
}

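The "input" variable is in there just to use the sample input you provided. Note that [^\]]* stops at the first "]", so this expression will not match if the HTML inside the CDATA section itself contains a "]" character.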




Answer 2:


I know this might seem incredibly simple, but have you tried string.Replace()?

string x = "<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";
string y = x.Replace("<![CDATA[", string.Empty).Replace("]]>", string.Empty);

There are probably more efficient ways to handle this, but it might be that you want something that easy...




Answer 3:


Not much detail, but a very simple regex should match it if there isn't complexity that you didn't describe:

/<!\[CDATA\[(.*?)\]\]>/
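
In C# the surrounding slashes are not part of the pattern. A minimal sketch of how this expression might be used, assuming the CDATA content can span several lines (hence RegexOptions.Singleline, which lets "." match newlines). The names CdataExample and TryExtract are made up for illustration:

```csharp
using System.Text.RegularExpressions;

public static class CdataExample
{
    // Lazy pattern from the answer above, without the /.../ delimiters.
    private static readonly Regex CdataPattern =
        new Regex(@"<!\[CDATA\[(.*?)\]\]>", RegexOptions.Singleline);

    // Tries to read the content of the first CDATA section in the input.
    // Returns false (and an empty string) when no CDATA section is found.
    public static bool TryExtract(string input, out string html)
    {
        Match match = CdataPattern.Match(input);
        html = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }
}
```

Calling CdataExample.TryExtract with the sample from the question should hand back the table markup without the CDATA wrapper.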



Answer 4:


The regex to find CDATA sections would be:

(?:<!\[CDATA\[)(.*?)(?:\]\]>)
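
If the input can hold more than one CDATA section, the same expression can be run with Regex.Matches. This is only a sketch: the names CdataSections and ExtractAll are hypothetical, and RegexOptions.Singleline is an assumption so that content spanning several lines is still captured:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CdataSections
{
    // Collects the content of every CDATA section found in the input.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        MatchCollection matches = Regex.Matches(
            input,
            @"(?:<!\[CDATA\[)(.*?)(?:\]\]>)",
            RegexOptions.Singleline);

        foreach (Match m in matches)
        {
            // Group 1 is the only capturing group; the delimiters are non-capturing.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```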



Answer 5:


Regex r = new Regex(@"(?<=<!\[CDATA\[).*?(?=\]\]>)");



Answer 6:


Why do you want to use Regex for such a simple task? Try this one:

str = str.Trim().Substring(9);
str = str.Substring(0, str.Length-3);
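
The two Substring calls assume the string really starts with "<![CDATA[" (9 characters) and ends with "]]>" (3 characters); on anything else they throw or cut off real content. A slightly more defensive sketch of the same idea, with the hypothetical names CdataTrimmer and Strip, might look like this:

```csharp
using System;

public static class CdataTrimmer
{
    private const string Prefix = "<![CDATA[";
    private const string Suffix = "]]>";

    // Removes the CDATA wrapper if present; otherwise returns the input unchanged.
    public static string Strip(string value)
    {
        string trimmed = value.Trim();
        bool wrapped =
            trimmed.Length >= Prefix.Length + Suffix.Length
            && trimmed.StartsWith(Prefix, StringComparison.Ordinal)
            && trimmed.EndsWith(Suffix, StringComparison.Ordinal);

        if (!wrapped)
        {
            return value;
        }

        // Keep only the characters between the prefix and the suffix.
        int innerLength = trimmed.Length - Prefix.Length - Suffix.Length;
        return trimmed.Substring(Prefix.Length, innerLength);
    }
}
```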


Source: https://stackoverflow.com/questions/812303/regex-to-parse-out-html-from-cdata-with-c-sharp
