In an interview I was asked a question that I'd never thought about, which was "We already have HTML which fulfills all the requirements of writing a web page, so what's
If I want to crawl your site and parse its contents with standard XML tools, I can only do it if it's XML.
Parsing HTML is a nightmare.
Because it is valid XML. That helps a lot since you can use a lot of tools originally designed for XML, such as XML parsers, XSLT, XPath, XQuery, ...
Normal HTML is an SGML dialect, and it cannot be parsed without knowledge of the schema. For example:
<ul>
<li>one
<li>two
<li>three
</ul>
is correct HTML but not correct XML. If you want to parse that, you have to know that <ul> elements have to be closed but <li> elements don't.
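As a rough illustration of the point above, here is a minimal Python sketch that feeds both variants of the list to the standard library's XML parser (xml.etree.ElementTree). The helper name parse_items and the two sample strings are made up for this example; the only assumption is that a generic XML parser with no knowledge of the HTML schema is being used.

```python
import xml.etree.ElementTree as ET

# The list from the answer: valid HTML, but the <li> elements are never closed.
html_list = "<ul>\n<li>one\n<li>two\n<li>three\n</ul>"

# The same list written as well-formed XML (what XHTML requires).
xhtml_list = "<ul>\n<li>one</li>\n<li>two</li>\n<li>three</li>\n</ul>"


def parse_items(markup):
    """Return the text of each <li>, or None if the markup is not well-formed XML."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # A generic XML parser does not know that </li> is optional in HTML.
        return None
    return [li.text for li in root.findall("li")]


html_result = parse_items(html_list)
xhtml_result = parse_items(xhtml_list)

print(html_result)
print(xhtml_result)
```

The unclosed version is rejected outright, while the closed version can be queried with the same path-style lookups that work on any other XML document.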
I think it helps browsers display the HTML correctly without making assumptions about where tags should be closed. Any time a browser assumes something, you know what happens.
Why was XHTML created?
How well has it succeeded?
What is the need for XHTML?
XHTML had laudable goals and maybe it will be able to deliver in the future. I can't recommend XHTML for the possible future advantages it might provide, when HTML is much easier now. You should only really use XHTML if previous code or your tools force you to.
In a nutshell: XHTML is mainly beneficial, and preferable to HTML, when you want to use an XML-based tool to manipulate, transform, or generate HTML pages on the server side.
Lots of examples can be found in component-based MVC frameworks like JSF (originally from Sun, now Oracle), which uses Facelets as an XHTML-based view technology. The server-side components are defined in XSDs and the pages are parsed using a SAX parser. You can even add a <!DOCTYPE html> to the top of the page to let Facelets generate "pure" valid and strict HTML5. Microsoft ASP.NET MVC has a similar view technology.
When you're hand-writing HTML, XHTML doesn't add much benefit, unless the point is to show off the "coolness" of using an (over)hyped technology.
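To make the server-side generation case a bit more concrete, here is a small sketch using Python's standard xml.etree.ElementTree rather than Facelets or any real framework. The function name render_list and the sample items are hypothetical; the point is only that an XML tool builds the tree, so closing tags and escaping are handled by the serializer.

```python
import xml.etree.ElementTree as ET


def render_list(items):
    """Build a <ul> element tree from plain strings and serialize it as markup."""
    ul = ET.Element("ul")
    for item in items:
        li = ET.SubElement(ul, "li")
        li.text = item  # special characters are escaped on serialization
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(ul, encoding="unicode")


rendered = render_list(["one", "two & three"])
print(rendered)
```

Because the output comes from a tree rather than string concatenation, it is always well formed, which is the property XML-based view technologies rely on.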
XHTML is simply about communication between systems. HTML is very difficult to parse because of the number of variations in what counts as well formed. Since XML is strict in its interpretation, this problem is removed.
Think about a RESTful architecture. If a URL is a permanent location for an item, then systems that want to access this item should be able to consume the information returned from accessing the URL. XHTML doesn't make this possible per se, because a system could already parse the HTML and retrieve the necessary information. XML just makes this easier. There is no limiting predefined set of tags that makes it difficult to classify data in a document (although technically you can do this in HTML too, because browsers will ignore tags they don't recognize). You can use whatever tags you want to classify the data that is retrieved.
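A minimal sketch of what that consumption could look like, assuming the resource is served as XHTML with one extra element in a custom namespace. The shop namespace URI, the shop:price element, and its currency attribute are all invented for illustration; only the XHTML namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body for a permanent item URL.
page = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:shop="http://example.com/ns/shop">
  <body>
    <h1>Blue widget</h1>
    <p>Price: <shop:price currency="USD">19.99</shop:price></p>
  </body>
</html>"""

# Prefix-to-namespace map used by the path lookups below.
ns = {
    "x": "http://www.w3.org/1999/xhtml",
    "shop": "http://example.com/ns/shop",  # made-up namespace
}

root = ET.fromstring(page)
title = root.find(".//x:h1", ns).text
price = root.find(".//shop:price", ns)
amount = float(price.text)
currency = price.get("currency")

print(title, amount, currency)
```

A browser would still show the price text as part of the paragraph, while a consuming system can pick out the classified value directly instead of scraping it from the surrounding prose.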