问题
I did some research and it seems that is standard Jsoup make this change. I wonder if there is a way to configure this or is there some other Parser I can be converted to a document of Jsoup, or some way to fix this?
回答1:
Unfortunately not, the constructor of Tag class changes the name to lower case:
private Tag(String tagName) {
this.tagName = tagName.toLowerCase();
}
But there are two ways to change this behavour:
- If you want a clean solution, you can clone / download the JSoup Git and change this line.
- If you want a dirty solution, you can use reflection.
Example for #2:
Field tagName = Tag.class.getDeclaredField("tagName"); // Get the field which contains the tagname
tagName.setAccessible(true); // Set accessible to allow changes
for( Element element : doc.select("*") ) // Iterate over all tags
{
Tag tag = element.tag(); // Get the tag of the element
String value = tagName.get(tag).toString(); // Get the value (= name) of the tag
if( !value.startsWith("#") ) // You can ignore all tags starting with a '#'
{
tagName.set(tag, value.toUpperCase()); // Set the tagname to the uppercase
}
}
tagName.setAccessible(false); // Revert to false
回答2:
There is ParseSettings class introduced in version 1.9.3. It comes with options to preserve case for tags and attributes.
回答3:
Here is a code sample (version >= 1.11.x):
Parser parser = Parser.htmlParser();
parser.settings(new ParseSettings(true, true));
Document doc = parser.parseInput(html, baseUrl);
回答4:
I am using 1.11.1-SNAPSHOT version which does not have this piece of code.
private Tag(String tagName) {
this.tagName = tagName.toLowerCase();
}
So I checked ParseSettings as suggested above and changed this piece of code from:
static {
htmlDefault = new ParseSettings(false, false);
preserveCase = new ParseSettings(true, true);
}
to:
static {
htmlDefault = new ParseSettings(true, true);
preserveCase = new ParseSettings(true, true);
}
and skipped test cases while building JAR.
来源:https://stackoverflow.com/questions/19666246/parser-jsoup-change-the-tags-to-lower-case-letter