Differentiating between XHTML and HTML with PHP DOMDocument

Submitted by 隐身守侯 on 2019-12-10 21:29:36

Question


I want to manipulate HTML and XHTML documents with the PHP DOM implementation. I use the DOMDocument->loadHTML() method to load the content.

I want to know whether the loaded content is XHTML or HTML. DOMDocument has a doctype object which contains the DOCTYPE declaration from the document itself. So far I have thought about comparing $dom->doctype->publicId, which contains strings like "-//W3C//DTD HTML 4.01//EN".

Is there any better way anyone can think of?

Edit:

Sorry if my question was a bit unclear. I updated the question since it might have been confusing. But to make it clear now: This question is not about handling HTML with PHP DOM in general or whether XHTML is good or bad.


Answer 1:


If you're loading from an external source, you can check the MIME type the document is served with and see if it's application/xhtml+xml; if it is, it's most definitely XHTML (of course, the server can lie and serve horribly malformed markup with that type). Otherwise, if it's text/html, it will be parsed as HTML tag soup. Validity of the actual markup aside, the doctype declaration is your next best way of telling whether the content is (or claims to be) HTML or XHTML.
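As a rough illustration of the MIME type check, here is a minimal sketch. It assumes you already have the value of the Content-Type response header as a string; the function name classifyByContentType() and the return values 'xhtml', 'html' and 'unknown' are made up for this example and are not part of any PHP API.

```php
<?php
// Hypothetical helper (not a built-in): classify a document by the
// Content-Type header value it was served with.
function classifyByContentType(string $contentType): string
{
    // Drop parameters such as "; charset=utf-8" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));

    if ($mime === 'application/xhtml+xml') {
        return 'xhtml';   // served as XHTML (the markup may still be broken)
    }
    if ($mime === 'text/html') {
        return 'html';    // will be parsed as HTML tag soup
    }
    return 'unknown';     // anything else: fall back to the doctype
}

echo classifyByContentType('application/xhtml+xml; charset=utf-8'), "\n";
```

How you obtain the header (cURL, stream wrappers, an HTTP client library) is up to your setup; the sketch only covers the comparison itself.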

As you say, you can check the public identifier and/or the system identifier (the DTD URI) and determine the type from there.
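A minimal sketch of that doctype check might look like the following. The function name classifyByDoctype() and its return values are assumptions made for this example. It relies only on the publicId, systemId and name properties of DOMDocumentType, and it treats any identifier containing "xhtml" as a claim to be XHTML, which is a heuristic rather than a guarantee.

```php
<?php
// Hypothetical helper (not a built-in): guess HTML vs. XHTML from the
// doctype of an already loaded DOMDocument.
function classifyByDoctype(DOMDocument $dom): string
{
    $doctype = $dom->doctype;
    if ($doctype === null) {
        return 'unknown'; // no DOCTYPE declaration available
    }

    $publicId = (string) $doctype->publicId;
    $systemId = (string) $doctype->systemId;

    // XHTML 1.x public identifiers and DTD URIs all contain "xhtml".
    if (stripos($publicId, 'xhtml') !== false || stripos($systemId, 'xhtml') !== false) {
        return 'xhtml';
    }
    // Any other doctype whose root element name is "html" is treated as HTML.
    if (strcasecmp((string) $doctype->name, 'html') === 0) {
        return 'html';
    }
    return 'unknown';
}

libxml_use_internal_errors(true); // keep parser warnings quiet

$dom = new DOMDocument();
$dom->loadHTML(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    . '<html><head><title>t</title></head><body><p>x</p></body></html>'
);
echo classifyByDoctype($dom), "\n";
```

Two caveats apply. Depending on the libxml version, loadHTML() may add a default HTML 4.0 doctype when the input has none; passing the LIBXML_HTML_NODEFDTD option should prevent that, so a missing doctype is reported as 'unknown'. Also, the HTML5 doctype (<!DOCTYPE html>) carries no public identifier, so a document using it will be reported as 'html' even if it was written as XHTML5.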



Source: https://stackoverflow.com/questions/4610143/differentiating-between-xhtml-and-html-with-php-domdocument
