This page describes the role of XML as it relates to HTML. There is a whole world of other things going on with XML that has nothing to do with document formatting or the Web, but that's a subject for another page.
See ExtensibleMarkupLanguage
----
XML will replace HTML, eventually. XML has a sister specification called XSL which allows you to define rules for processing the data and formatting it for a user. This solves one of the key problems with HTML: little or no reuse. Web pages have data, but it is embedded in the formatting, and that keeps you from readily changing the look and feel. XSL and XML are the key to reusable data. Already there are databases which allow you to perform queries with XML and return XML data. I am sure that you can see how nice that will be for developers.
''XML will replace HTML, eventually.'' Maybe. Maybe not. In our shop we think of XML as doing for data what the various IDLs do for behaviour, as discussed elsewhere on this page. We absolutely do not see XML as a client-side technology, since that would involve pushing our clients' data dictionaries and schemas over the wire. Since our clients consider their data models to be very valuable corporate secrets, that idea really isn't going to go down too well. We see XML->HTML via XSL as the way forward. --KeithBraithwaite
If anything replaces HTML, then it's XHTML, not XML. -- JuergenHermann
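A rough sketch of the data/presentation split being argued over here, done server-side as Keith describes. Python's standard library has no XSLT processor, so a small hand-written transform stands in for an XSL stylesheet; the `<catalog>`/`<product>` vocabulary and the `to_html` function are made up for illustration, and `max_price` is a toy stand-in for "querying" the data.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Optional

# Hypothetical data document: products only, no formatting hints at all.
DATA = """<catalog>
  <product sku="A1"><name>Widget</name><price>9.99</price></product>
  <product sku="B2"><name>Gadget</name><price>24.50</price></product>
</catalog>"""


def to_html(xml_text: str, max_price: Optional[float] = None) -> str:
    """Stand-in for an XSL stylesheet: render catalog data as an HTML table.

    max_price acts like a simple query; only matching products are rendered.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for product in root.findall("product"):
        name = product.findtext("name", default="")
        price = float(product.findtext("price", default="0"))
        if max_price is not None and price > max_price:
            continue
        rows.append(f"<tr><td>{escape(name)}</td><td>{price:.2f}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(to_html(DATA, max_price=10))
```

With `max_price=10` only the Widget row survives. Changing the look and feel means editing `to_html` (or, with a real XSLT processor, the stylesheet); `DATA` itself never changes, and it never has to leave the server.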
----
Here's how I am thinking about it, until such time as I ever become expert in the matter. Perhaps someone can derail this thinking if it is incorrect.
HTML and SGML actually define objects in UniCode text. You write or generate:
<TAGNAME attr=Val attr=Val ...> Body of text, and possibly nested tags. </TAGNAME>
The TAGNAME corresponds to an object type, and the Vals to attributes or instance variables. Cool enough. We used that idea to export our sequence charts and make them trivial to generate, parse, debug and send around the net. HTML and SGML work because everyone is told in advance what the tags are. But people other than me want to invent their own object types and tags.
My understanding is that XML is the meta-level thing. They predefine some tags so that your document starts off by naming the tags it will use: the TAGs and their VALues. Then the next reader can check which parts it is going to understand, and at least parse the parts it won't.
Anyone confirm or break that view? --Alistair
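Alistair's "tags as objects" picture can be sketched in a few lines: each known tag name maps to a class, its attributes become instance variables, and a tag the reader doesn't know is still parsed (XML's well-formedness rules guarantee that much) and simply skipped. The `Lifeline`/`Message` classes and the chart vocabulary are hypothetical, not the sequence-chart format mentioned above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Lifeline:
    name: str


@dataclass
class Message:
    sender: str
    receiver: str
    label: str


# Hypothetical vocabulary: tag name -> object type; attributes -> fields.
KNOWN = {"lifeline": Lifeline, "message": Message}

CHART = """<chart>
  <lifeline name="Client"/>
  <lifeline name="Server"/>
  <message sender="Client" receiver="Server" label="request"/>
  <annotation color="red">added by some newer tool</annotation>
</chart>"""


def load(xml_text):
    """Build objects for known tags; report unknown tags instead of failing."""
    objects, skipped = [], []
    for elem in ET.fromstring(xml_text):
        cls = KNOWN.get(elem.tag)
        if cls is None:
            # The parser read this element fine; we just don't understand it.
            skipped.append(elem.tag)
        else:
            objects.append(cls(**elem.attrib))
    return objects, skipped


objects, skipped = load(CHART)
print(objects)
print(skipped)  # ['annotation']
```

Note that nothing in `CHART` itself told the reader which tags to expect; the `KNOWN` table was agreed out of band, which is the gap a DTD is meant to fill.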
----
XML is quite tied to UniCode, which doesn't matter much to us English speakers, but is quite important for everyone else.
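A small illustration of that tie, using Python's `xml.etree.ElementTree`: whatever bytes arrive, the parser hands back Unicode text. With no encoding declaration a document is read as UTF-8, a declared encoding such as ISO-8859-1 is honoured, and a numeric character reference names a Unicode code point directly. The sample strings are made up.

```python
import xml.etree.ElementTree as ET

# No encoding declaration: the bytes are taken to be UTF-8.
utf8_doc = "<name>José</name>".encode("utf-8")

# Declared Latin-1: the same character, different bytes on the wire.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>José</name>'.encode("latin-1")

# Character reference: U+00E9 written in plain ASCII.
ascii_doc = b"<name>Jos&#xE9;</name>"

texts = [ET.fromstring(doc).text for doc in (utf8_doc, latin1_doc, ascii_doc)]
print(texts)  # ['José', 'José', 'José']
```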
-----
Pretty close. SGML is the root, a heavy duty markup language for enterprise-scale publishing. It's very configurable, but things have to be done properly to get through the parser. You can buy thick books about it written by people from IBM.
HTML is the cheap and cheerful version with just a few tags pre-defined by TimBernersLee, then all the browser vendors, then the W3C. The browsers are very forgiving, which is why it's easy to write HTML but hard to write browsers now.
XML is an attempt to rationalise all this, so the syntax has been tidied up a bit, and meta-definitions (DTD) have come back -- although your parsers don't have to validate. The push now is to provide interesting extensions, such as search and incremental update, and to move it away from its document roots to be more generic.
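Both halves of that point can be seen with Python's `xml.etree.ElementTree`, which sits on the non-validating expat parser. A document that breaks its own DTD still parses, because nothing checks it against the declarations; HTML-style sloppiness that a forgiving browser would accept is rejected, because well-formedness is not optional. Both sample documents are invented.

```python
import xml.etree.ElementTree as ET

# Invented example: the internal DTD says <note> holds <to> then <body>, nothing else.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><body>Hi</body><extra>not declared</extra></note>"""

# Unclosed <p> tags: a browser copes, but this is not well-formed XML.
NOT_WELL_FORMED = "<div><p>one<p>two</div>"


def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text. No DTD validation is attempted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(INVALID_BUT_WELL_FORMED))  # True: the DTD is not enforced
print(is_well_formed(NOT_WELL_FORMED))          # False: mismatched tag
```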
----
XML does not come with a predefined set of tags; that is what allows you to define your own.
It does come with the syntax you need to declare which DocumentTypeDefinition you are writing to, and hence what the valid tags will be.
The DTD is not itself written in XML, though it's obvious now that inventing a separate syntax is a bad idea when you are trying to make a standard syntax for everything. So there will probably be a standard DTD for DTDs in XML, but not yet AFAIK. --JohnFarrell
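That declaration is the DOCTYPE line at the top of a document: it names the root element and points at the DTD by a public and/or system identifier. A sketch using Python's low-level `xml.parsers.expat` interface to read those three values back; the parser is not asked to fetch or apply the external DTD, so nothing is downloaded. The XHTML 1.0 Strict identifiers are real ones; `read_doctype` and the sample page are illustrative.

```python
from xml.parsers import expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body/></html>"""


def read_doctype(xml_text: str) -> dict:
    """Return the root element name and DTD identifiers the document declares."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.update(name=name, system_id=system_id, public_id=public_id)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # external DTD is not loaded by default
    return found


info = read_doctype(DOC)
print(info["name"])       # html
print(info["public_id"])  # -//W3C//DTD XHTML 1.0 Strict//EN
print(info["system_id"])  # http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
```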
----
HTML, SGML, and XML. Why three different standards?
SGML is semantic markup for storage (define your own tags).
XML is semantic markup for interoperability (define your own tags).
''These are WAY too simplistic. Yes, you can use SGML for data storage, but people use it for all sorts of bizarre things like multimedia presentations and adding link information to CIA spy satellite data. No, I'm not kidding. And XML is good for a lot more than interoperability; for one thing, you can use it as a human-readable (and human-editable) serialization mechanism. I use XML constantly in places where it has nothing to do with interoperability. -- JamesWilson''
HTML is visual markup for presentation (predefined tags).
XML will not replace SGML or HTML. It addresses a new market - integration. XML is the currency of the information economy. (Someone else said this first)
A proposal for web sites such as CNN.com:
Save stories in SGML with full semantic information. Create an engine which reads SGML and produces XML in various formats.
Define a standard XML news story format. Let search engines read this format. This would enable searches that understand authors, dates, and so on.
Create an engine that reads XML news stories and produces HTML. Let the web browser choose which HTML format it wants: IE, Pilot, 3.2, 4.0.
-- EricUlevik
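A toy version of the last two steps of that proposal, assuming a made-up news-story format (`<story>` with `<headline>`, `<author>`, `<date>` and `<body>`; no real standard is implied). Because author and date are marked up as themselves, a search can match on them directly, and HTML is produced only at the end, so the flavour can be picked per client. The `FLAVOURS` table is a hypothetical stand-in for "IE, Pilot, 3.2, 4.0".

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stories in a made-up "standard" news format.
STORIES = [
    """<story><headline>Markup wars</headline><author>A. Writer</author>
       <date>1999-03-01</date><body>XML arrives.</body></story>""",
    """<story><headline>Browser news</headline><author>B. Reporter</author>
       <date>1999-04-15</date><body>Version 5 ships.</body></story>""",
]

# Hypothetical per-client templates: a rich one and a bare one for small screens.
FLAVOURS = {
    "desktop": "<h1>{headline}</h1><p><i>{author}, {date}</i></p><p>{body}</p>",
    "handheld": "<b>{headline}</b><br>{body}",
}


def search(stories, author=None, since=None):
    """Semantic search: match on marked-up fields, not on words in a page."""
    for text in stories:
        story = ET.fromstring(text)
        if author is not None and story.findtext("author") != author:
            continue
        # ISO dates compare correctly as plain strings.
        if since is not None and story.findtext("date") < since:
            continue
        yield story


def render(story, flavour):
    """Produce HTML in the requested flavour from one story element."""
    fields = {child.tag: escape(child.text or "") for child in story}
    return FLAVOURS[flavour].format(**fields)


for story in search(STORIES, since="1999-04-01"):
    print(render(story, "handheld"))  # <b>Browser news</b><br>Version 5 ships.
```

The first step (SGML to XML) is left out, since Python ships no SGML parser.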
----
HTML started out as a subset of SGML, intended to be a simple document description language. Its tags represented document structure attributes -- the viewer (there was no such thing as a browser then) was expected to interpret and display the tagged text in a way that suited the characteristics of the particular user interface and that user's preferences.
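That original division of labour can be sketched as two viewers given the same structural markup, each deciding for itself how a heading or an emphasis should come across on its own kind of display. The tag names (`h1`, `p`, `em`) are borrowed from HTML; the two viewer styles are invented for illustration, and the input is kept well-formed so Python's XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Structural markup only: it says what things are, not how they look.
DOC = "<doc><h1>Results</h1><p>This is <em>important</em> text.</p></doc>"


def inline_text(block, emphasise):
    """Flatten a block's text, letting the viewer decide how <em> is shown."""
    parts = [block.text or ""]
    for child in block:
        inner = child.text or ""
        parts.append(emphasise(inner) if child.tag == "em" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def render(root, style):
    """Interpret each structural tag according to one viewer's conventions."""
    lines = []
    for block in root:
        text = inline_text(block, style["em"])
        lines.append(style["h1"](text) if block.tag == "h1" else text)
    return "\n".join(lines)


# Two hypothetical viewers with different display capabilities.
TERMINAL = {"h1": str.upper, "em": lambda s: f"*{s}*"}
SPEECH = {"h1": lambda s: f"Heading: {s}.", "em": lambda s: f"(stressed) {s}"}

root = ET.fromstring(DOC)
print(render(root, TERMINAL))
print(render(root, SPEECH))
```

The terminal viewer prints `RESULTS` and `This is *important* text.`; the speech viewer prints `Heading: Results.` and `This is (stressed) important text.` The markup is identical in both cases; only the viewer's interpretation differs.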
At some point the print-world graphic designers got involved and started deciding that it was a page description language, or should be. The semantics of the tags were overloaded to include visual display hints. This trend led to extension tags to HTML, including the much hated