Question
I am using HtmlAgilityPack. I create an HtmlDocument and call LoadHtml with the following string:
<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One</option><option value="2">Two</option></select>
This does some unexpected things. First, it gives two parser errors, EndTagNotRequired. Second, the select node has 4 children - two for the option tags and two more for the inner text of the option tags. Last, the OuterHtml is like this:
<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One<option value="2">Two</select>
So basically it is deciding for me to drop the closing tags on the options. Let's leave aside for a moment whether it is proper and desirable to do that. I am using HtmlAgilityPack to test HTML generation code, so I don't want it to make any decisions for me or report any errors unless the HTML is truly malformed. Is there some way to make it behave how I want? I tried setting some of the options for HtmlDocument, specifically:
doc.OptionAutoCloseOnEnd = false;
doc.OptionCheckSyntax = false;
doc.OptionFixNestedTags = false;
This is not working. If HtmlAgilityPack cannot do what I want, can you recommend something that can?
Answer 1:
The exact same error is reported on the HAP home page's discussion, but it looks like no meaningful fixes have been made to the project in a few years. Not encouraging.
A quick browse of the source suggests the error might be fixable by commenting out line 92 of HtmlNode.cs:
// they sometimes contain, and sometimes they don't...
ElementsFlags.Add("option", HtmlElementFlag.Empty);
(Actually no, they always contain label text, although a blank string would also be valid text. A careless author might omit the end-tag, but then that's true of any element.)
Addendum:
An equivalent solution is to call HtmlNode.ElementsFlags.Remove("option"); before any use of the library. This avoids modifying the library source code.
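A minimal sketch of that workaround, assuming the HtmlAgilityPack package is referenced and that HtmlNode.ElementsFlags is the public static dictionary this answer describes. The class and method names (OptionEndTagDemo, RoundTrip) are hypothetical and exist only for illustration.

```csharp
using HtmlAgilityPack;

public static class OptionEndTagDemo
{
    // Parses the given markup after removing the special-case flag for
    // <option>, then returns the re-serialized HTML.
    public static string RoundTrip(string html)
    {
        // ElementsFlags is static, so this affects every HtmlDocument created
        // afterwards in the process. Remove does nothing if the key is absent.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }
}
```

With the flag removed, passing the select markup from the question to RoundTrip should keep the option end tags in the output. Because the dictionary is shared, the removal would normally be done once at startup, before any document is loaded.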
Answer 2:
There seems to be some reason, related to XHTML compliance, for not parsing the option tag as a "generic" tag. However, this can be a real pain in the neck.
My suggestion is to do a whole-string-replace and change all "option" tags to "my_option" tags, that way you:
- Don't have to modify the source of the library (and can upgrade it later).
- Can parse as you usually would.
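The rename-and-restore idea above could be sketched as follows. This is only an illustration: the class and method names (OptionTagRenamer, Hide, Restore) are hypothetical, and the regular expression is an assumption about what a "whole-string replace" would look like in practice. It uses only System.Text.RegularExpressions and does not touch HtmlAgilityPack itself.

```csharp
using System.Text.RegularExpressions;

public static class OptionTagRenamer
{
    // Placeholder tag name suggested in the answer.
    private const string Placeholder = "my_option";

    // Renames <option ...> and </option> to <my_option ...> and </my_option>.
    // The \b keeps tags such as <optgroup> from matching.
    public static string Hide(string html) =>
        Regex.Replace(html, @"<(/?)option\b", "<${1}" + Placeholder,
                      RegexOptions.IgnoreCase);

    // Reverses Hide on serialized output so the original tag name comes back.
    public static string Restore(string html) =>
        Regex.Replace(html, @"<(/?)" + Placeholder + @"\b", "<${1}option",
                      RegexOptions.IgnoreCase);
}
```

The intended use would be to call Hide before LoadHtml and Restore on any OuterHtml that is read back. Note that a plain text replacement like this also rewrites matches inside scripts or comments, so it fits best when the input is generated markup under your control.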
The original post on HtmlAgilityPack forum can be found at: http://htmlagilitypack.codeplex.com/Thread/View.aspx?ThreadId=14982
Source: https://stackoverflow.com/questions/293342/htmlagilitypack-drops-option-end-tags