HtmlAgilityPack close form tag automatically

*爱你&永不变心* posted on 2019-12-08 02:09:46

Question


I am trying to parse an HTML file that contains markup like this:

<div><form>...</div>...</form>

The problem is that HtmlAgilityPack automatically closes the form tag before the div's closing tag, producing <div><form>...</form></div>...</form>, so when I parse the form some of its elements are missing. (I get only the elements before the automatically added closing tag.)

I already tried:

    htmlDoc.OptionFixNestedTags = false;
    htmlDoc.OptionAutoCloseOnEnd = false;
    htmlDoc.OptionCheckSyntax = false;
    HtmlNode.ElementsFlags.Remove("form");
    HtmlNode.ElementsFlags.Add("form", HtmlElementFlag.CanOverlap);
    HtmlNode.ElementsFlags.Add("div", HtmlElementFlag.CanOverlap);

But nothing helps!

Thanks for your help!


Answer 1:


The following seems to work for me:

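    // Remove the parsing flags registered for <form> (a static, process-wide setting)
    // before loading the document.
    HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("form");

    // _document is a field of type HtmlDocument; content holds the HTML string.
    _document = new HtmlDocument();
    _document.OptionAutoCloseOnEnd = true;
    _document.LoadHtml(content);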
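
One way to check whether these settings help for a particular page is to list the fields that end up under the form node after parsing. The following is only a minimal sketch, not part of the original answer: `FormProbe` and `InputNamesUnderForm` are hypothetical names, and it assumes the HtmlAgilityPack NuGet package is referenced. Because `ElementsFlags` is static, removing the entry affects every document parsed afterwards in the same process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class FormProbe
{
    // Hypothetical helper: parses the markup with the settings from the answer
    // and returns the name attribute of every <input> below the first <form>.
    public static List<string> InputNamesUnderForm(string html)
    {
        // Static, process-wide setting: drop the flags registered for <form>.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.OptionAutoCloseOnEnd = true;
        doc.LoadHtml(html);

        // Take the first <form> element in document order, if there is one.
        HtmlNode form = doc.DocumentNode.Descendants("form").FirstOrDefault();
        if (form == null)
        {
            return new List<string>();
        }

        // Collect the names of the inputs that the parser placed inside the form.
        return form.Descendants("input")
                   .Select(input => input.GetAttributeValue("name", ""))
                   .ToList();
    }
}
```

Calling `FormProbe.InputNamesUnderForm(content)` on the real page and comparing the result with the fields you expect shows whether any inputs after the div's closing tag were left outside the form.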



Answer 2:


It depends on what you want to do programmatically after the text has been parsed. If you don't want to do anything special with it, the following code:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml("<div><form>form and div</div>form</form>");

    doc.Save(Console.Out);

will display exactly the same string, that is:

<div><form>form and div</div>form</form>

This is because the library was designed from the ground up to preserve the original HTML as much as possible.

But how this is represented in the DOM, and which errors are reported, is another story. You can't have all three of the following at the same time: 1) overlapping elements, 2) an XML-like DOM (which does not support overlaps), and 3) no errors.

So it depends on what you want to do after parsing.
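
To see which side of that trade-off a given input falls on, the saved output can be compared with the errors the parser recorded. The sketch below is an illustration under assumptions, not the answer author's code: `OverlapReport` and `Inspect` are hypothetical names, and it assumes the HtmlAgilityPack NuGet package is referenced. It uses `HtmlDocument.Save(TextWriter)` and the `ParseErrors` collection.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class OverlapReport
{
    // Hypothetical helper: returns the text written by Save together with
    // one description line per parse error recorded while loading.
    public static (string Saved, List<string> Errors) Inspect(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Serialize into a string the same way doc.Save(Console.Out) writes to the console.
        var writer = new StringWriter();
        doc.Save(writer);

        // ParseErrors lists the problems the parser noted for this document.
        var errors = doc.ParseErrors
            .Select(e => $"{e.Code} at line {e.Line}, column {e.LinePosition}: {e.Reason}")
            .ToList();

        return (writer.ToString(), errors);
    }
}
```

Running `OverlapReport.Inspect("<div><form>form and div</div>form</form>")` and printing both parts shows the round-tripped text next to whatever errors the overlap produced.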



Source: https://stackoverflow.com/questions/7104652/htmlagilitypack-close-form-tag-automatically
