W3C breaks XHTML 1.1 parsing by removing modules from web site

Submitted by 戏子无情 on 2020-04-17 22:12:52

Question


The W3C recommended list of doctype declarations indicates the following doctype for XHTML 1.1:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

This is the same system ID recommended by A List Apart and the Wiley Dummies site, among many others. It was one of the standard system IDs for the modular XHTML 1.1 DTD.

Unfortunately this modular DTD refers to other XML entities, some of which the W3C has removed from its site, completely breaking parsing.

You can test this in Java 11. Start with the following XHTML 1.1 file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <title>XHTML 1.1 Skeleton</title>
</head>
<body>
</body>
</html>

Try to parse it using a standard, built-in Java parser:

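import java.io.BufferedInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Inside an instance method; the default resolver fetches the DTD and its modules over HTTP.
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
final Document document;
try (InputStream inputStream = new BufferedInputStream(getClass().getResourceAsStream("xhtml-1.1-test.xhtml"))) {
  document = documentBuilder.parse(inputStream);
}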

Parsing will fail, throwing a java.io.FileNotFoundException for http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod. Apparently the W3C has removed this entity from its web site altogether.

If instead http://www.w3.org/MarkUp/DTD/xhtml11.dtd is used (which appears as a comment in the XHTML 1.1 specification DTD), parsing completes normally (albeit after about 10 minutes).

Why does the W3C make insufficient entities available at the http://www.w3.org/TR/xhtml11/DTD/ collection, breaking XHTML 1.1 parsing with a standard system ID? Why aren't all the modules available that are available at http://www.w3.org/MarkUp/DTD/? Who at the W3C should I contact to get this fixed? (And why does HTTP access take so long for these entities?)


Answer 1:


The URL you mentioned as an alternative, http://www.w3.org/MarkUp/DTD/xhtml11.dtd, seems to be used consistently in the XHTML 1.1 specs/DTDs/modules and appears to be the one endorsed by the W3C, rather than http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd. My guess is that access to these declaration sets is deliberately throttled, since the W3C doesn't want to serve them to the general public; you're supposed to store them locally and use an SGML/XML catalog file that maps identifiers to your local entity/declaration sets.
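To illustrate the idea of serving local copies instead of hitting the W3C site, here is a minimal Java sketch. It uses a plain SAX EntityResolver rather than a catalog file, and keeps the "local copies" in an in-memory map keyed by public ID so the example stays self-contained. The class name, method name, the `-//EXAMPLE//DTD Note//EN` identifier, and the tiny DTD are all made up for illustration; a real setup would map the XHTML 1.1 identifiers to the downloaded .dtd, .mod and .ent files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class LocalEntitySketch {

  /**
   * Parses xml, serving external entities from localCopies (public ID -> entity text)
   * instead of fetching them over HTTP. Hypothetical helper, not part of any library.
   */
  static Document parseWithLocalCopies(String xml, Map<String, String> localCopies) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setEntityResolver((publicId, systemId) -> {
        String text = publicId == null ? null : localCopies.get(publicId);
        if (text == null) {
          // Refuse rather than fall back to a slow or missing remote copy.
          throw new SAXException("No local copy for " + publicId + " / " + systemId);
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setSystemId(systemId);
        return source;
      });
      return builder.parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException e) {
      throw new IllegalStateException(e);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Illustrative stand-ins for a locally stored DTD and a document that references it.
    String dtd = "<!ELEMENT note (#PCDATA)>\n<!ENTITY greeting \"hello\">";
    String xml =
        "<!DOCTYPE note PUBLIC \"-//EXAMPLE//DTD Note//EN\" \"http://example.invalid/note.dtd\">"
            + "<note>&greeting;</note>";
    Document doc = parseWithLocalCopies(xml, Map.of("-//EXAMPLE//DTD Note//EN", dtd));
    System.out.println(doc.getDocumentElement().getTextContent());
  }
}
```

Throwing on an unmapped identifier is a design choice in this sketch: it makes a missing local module fail immediately instead of triggering a network request.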

I had success in validating an XHTML 1.1 file using libxml2's xmllint command-line tool by invoking

 SGML_CATALOG_FILES=./catalog xmllint --catalogs --dtdvalid xhtml11.dtd testdoc.xhtml

with a catalog file having the following content (and the referenced .dtd, .mod and .ent files in place in that directory, of course):

OVERRIDE YES

SGMLDECL "xml1.dcl"
PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Common Attributes 1.0//EN" "xhtml-attribs-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod" "xhtml-attribs-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Base Element 1.0//EN" "xhtml-base-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod" "xhtml-base-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML BDO Element 1.0//EN" "xhtml-bdo-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod" "xhtml-bdo-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN" "xhtml-blkphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod" "xhtml-blkphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN" "xhtml-blkpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod" "xhtml-blkpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Structural 1.0//EN" "xhtml-blkstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod" "xhtml-blkstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Character Entities 1.0//EN" "xhtml-charent-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod" "xhtml-charent-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN" "xhtml-csismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod" "xhtml-csismap-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Datatypes 1.0//EN" "xhtml-datatypes-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod" "xhtml-datatypes-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Editing Markup 1.0//EN" "xhtml-edit-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod" "xhtml-edit-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN" "xhtml-events-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod" "xhtml-events-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Forms 1.0//EN" "xhtml-form-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod" "xhtml-form-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Modular Framework 1.0//EN" "xhtml-framework-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod" "xhtml-framework-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Hypertext 1.0//EN" "xhtml-hypertext-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod" "xhtml-hypertext-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Images 1.0//EN" "xhtml-image-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod" "xhtml-image-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN" "xhtml-inlphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod" "xhtml-inlphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN" "xhtml-inlpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod" "xhtml-inlpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN" "xhtml-inlstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod" "xhtml-inlstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Inline Style 1.0//EN" "xhtml-inlstyle-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod" "xhtml-inlstyle-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Legacy Markup 1.0//EN" "xhtml-legacy-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-legacy-1.mod" "xhtml-legacy-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Link Element 1.0//EN" "xhtml-link-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod" "xhtml-link-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Lists 1.0//EN" "xhtml-list-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod" "xhtml-list-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Metainformation 1.0//EN" "xhtml-meta-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod" "xhtml-meta-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN" "xhtml-object-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod" "xhtml-object-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Param Element 1.0//EN" "xhtml-param-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod" "xhtml-param-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Presentation 1.0//EN" "xhtml-pres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod" "xhtml-pres-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Qualified Names 1.0//EN" "xhtml-qname-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod" "xhtml-qname-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Ruby 1.0//EN" "xhtml-ruby-1.mod"
SYSTEM "http://www.w3.org/TR/ruby/xhtml-ruby-1.mod" "xhtml-ruby-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Scripting 1.0//EN" "xhtml-script-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod" "xhtml-script-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN" "xhtml-ssismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod" "xhtml-ssismap-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Document Structure 1.0//EN" "xhtml-struct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod" "xhtml-struct-1.mod"
PUBLIC "-//W3C//DTD XHTML Style Sheets 1.0//EN" "xhtml-style-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod" "xhtml-style-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Tables 1.0//EN" "xhtml-table-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod" "xhtml-table-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Text 1.0//EN" "xhtml-text-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod" "xhtml-text-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent" "xhtml-lat1.ent"
PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-special.ent" "xhtml-special.ent"
PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-symbol.ent" "xhtml-symbol.ent"

Note this is SGML/traditional/plain catalog syntax. If you want to use it with Java/JAXP, you'll have to convert it into a catalog file in XML syntax.
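As a rough sketch of that conversion, the following Java 11 program turns the PUBLIC and SYSTEM lines of the plain catalog into OASIS XML catalog syntax and then plugs the result into a DocumentBuilder through the javax.xml.catalog API (available since Java 9). The class name, the method names and the catalog.xml output file name are assumptions made for this example. The OVERRIDE and SGMLDECL lines are simply skipped, with prefer="public" intended as the counterpart of OVERRIDE YES.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class XhtmlCatalogSketch {

  // Matches plain-catalog entries of the form: PUBLIC "identifier" "local-file"
  private static final Pattern ENTRY =
      Pattern.compile("(PUBLIC|SYSTEM)\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"");

  /** Converts the PUBLIC/SYSTEM lines of a plain-text catalog to XML catalog syntax. */
  static String toXmlCatalog(List<String> plainLines) {
    StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    // prefer="public" is meant to mirror OVERRIDE YES in the plain catalog.
    xml.append("<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\" prefer=\"public\">\n");
    for (String line : plainLines) {
      Matcher m = ENTRY.matcher(line.trim());
      if (!m.matches()) {
        continue; // OVERRIDE, SGMLDECL and blank lines are skipped in this sketch
      }
      boolean isPublic = m.group(1).equals("PUBLIC");
      xml.append(isPublic ? "  <public publicId=\"" : "  <system systemId=\"")
          .append(escape(m.group(2)))
          .append("\" uri=\"")
          .append(escape(m.group(3)))
          .append("\"/>\n");
    }
    return xml.append("</catalog>\n").toString();
  }

  // Minimal attribute escaping; quotes cannot occur because the pattern excludes them.
  private static String escape(String value) {
    return value.replace("&", "&amp;").replace("<", "&lt;");
  }

  /** Builds a namespace-aware DocumentBuilder that resolves entities through the catalog. */
  static DocumentBuilder newCatalogAwareBuilder(URI catalogUri) throws ParserConfigurationException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    // CatalogResolver implements org.xml.sax.EntityResolver, so it can be set directly.
    builder.setEntityResolver(CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri));
    return builder;
  }

  public static void main(String[] args) throws Exception {
    if (args.length < 2) {
      System.err.println("usage: XhtmlCatalogSketch <plain-catalog> <xhtml-file>");
      return;
    }
    Path plainCatalog = Path.of(args[0]).toAbsolutePath();
    Path xmlCatalog = plainCatalog.resolveSibling("catalog.xml"); // assumed output name
    Files.writeString(xmlCatalog, toXmlCatalog(Files.readAllLines(plainCatalog)));
    newCatalogAwareBuilder(xmlCatalog.toUri()).parse(Path.of(args[1]).toFile());
    System.out.println("Parsed using " + xmlCatalog);
  }
}
```

The generated catalog.xml is written next to the plain catalog because the relative uri values are resolved against the catalog file's own location, so the .dtd, .mod and .ent files need to sit in that same directory.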



Source: https://stackoverflow.com/questions/60655704/w3c-breaks-xhtml-1-1-parsing-by-removing-modules-from-web-site
