HXT ignoring HTML DTD, replacing it with XML DTD

不羁岁月 提交于 2019-12-10 15:04:53

问题


I'm having a bit of trouble figuring out why HXT is replacing my DTD's. Firstly, here is my input file to be parsed:

<!DOCTYPE html>
<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <h1>foo</h1>
  </body>
</html>

and this is the output that I get:

<?xml version="1.0" encoding="US-ASCII"?>
<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <h1>foo</h1>
  </body>
</html>

Finally, here is a simplified version of the arrows I'm using:

start (App src dest) = runX $
                         readDocument [ withValidate no
                                      , withSubstDTDEntities no
                                      , withParseHTML yes
                                      --, withTagSoup
                                      ]
                                      src
                         >>>
                         this
                         >>>
                         writeDocument [ withIndent yes
                                       , withSubstDTDEntities no
                                       , withOutputHTML
                                       --, withOutputEncoding "UTF-8"
                                       ]
                                       dest

I apologize for the comments - I've been toying with different combinations of configs. I just can't seem to get HXT to not mess with DTDs, even with withSubstDTDEntities no, withValidate no, etc. I am getting a warning saying that HXT is ignoring my doctype declaration, but that's the only bit of insight I have. Can anyone please lend me a hand? Thank you in advance!


回答1:


You have two problems

HXT only accepts one of the following three html doctypes

<!DOCTYPE html 
 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "DTD/xhtml1-strict.dtd">

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "DTD/xhtml1-transitional.dtd">

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
 "DTD/xhtml1-frameset.dtd">

Using one of these will get rid of the warning about ignoring the dtd.

Second, add the following option to writeDocument

withAddDefaultDTD yes


来源:https://stackoverflow.com/questions/26763855/hxt-ignoring-html-dtd-replacing-it-with-xml-dtd

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!