Grass Valley iControl Services Gateway User Manual

Page 24



Appendix A XML Primer (W3C's technical reports page)

What is XML

XML is a method for putting structured data in a text file. For "structured data" think of such things as
spreadsheets, address books, configuration parameters, financial transactions, technical drawings, etc.
Programs that produce such data often also store it on disk, for which they can use either a binary format
or a text format. The latter allows you, if necessary, to look at the data without the program that produced
it. XML is a set of rules, guidelines, conventions, whatever you want to call them, for designing text
formats for such data, in a way that produces files that are easy to generate and read (by a computer),
that are unambiguous, and that avoid common pitfalls, such as lack of extensibility, lack of support for
internationalization/localization, and platform-dependency.
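
As an illustration of the idea above, the following minimal sketch writes a small address book as XML using Python's standard xml.etree.ElementTree module. The element and attribute names (addressbook, contact, city) and the sample entries are made up for this example; they are not part of any iControl schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical address-book data; names and fields are illustrative only.
contacts = [
    {"name": "Ada Lovelace", "city": "London"},
    {"name": "Grace Hopper", "city": "Arlington"},
]

def build_address_book(entries):
    """Return an XML string holding one <contact> element per entry."""
    root = ET.Element("addressbook")
    for entry in entries:
        # The city is stored as an attribute, the name as element text.
        contact = ET.SubElement(root, "contact", {"city": entry["city"]})
        contact.text = entry["name"]
    return ET.tostring(root, encoding="unicode")

address_book_xml = build_address_book(contacts)
print(address_book_xml)
```

The result is ordinary text, so it can be inspected in any text editor without the program that produced it.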

XML History

Development of XML started in 1996, and it has been a W3C standard since February 1998, which may make
you suspect that this is rather immature technology. But in fact the technology isn't very new. Before
XML there was SGML, developed in the early '80s, an ISO standard since 1986, and widely used for
large documentation projects. And of course HTML, whose development started in 1990. The designers
of XML simply took the best parts of SGML, guided by the experience with HTML, and produced
something that is no less powerful than SGML, but vastly more regular and simpler to use. Some
evolutions, however, are hard to distinguish from revolutions... And it must be said that while SGML is
mostly used for technical documentation and much less for other kinds of data, with XML it is exactly the
opposite.

XML vs HTML

Like HTML, XML makes use of tags (words bracketed by '<' and '>') and attributes (of the form
name="value"), but while HTML specifies what each tag and attribute means (and often how the text
between them will look in a browser), XML uses the tags only to delimit pieces of data, and leaves the
interpretation of the data completely to the application that reads it. In other words, if you see "<p>" in an
XML file, don't assume it is a paragraph. Depending on the context, it may be a price, a parameter, a
person…
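
To make the point concrete, here is a short sketch in which two applications read a "<p>" element and each gives it its own meaning. Both documents and both reader functions are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that both happen to use a <p> element.
price_list = '<item name="cable"><p>12.50</p></item>'
staff_list = '<team name="support"><p>J. Smith</p></team>'

def read_price(xml_text):
    """This application chooses to interpret <p> as a price."""
    return float(ET.fromstring(xml_text).find("p").text)

def read_person(xml_text):
    """This application chooses to interpret <p> as a person's name."""
    return ET.fromstring(xml_text).find("p").text

print(read_price(price_list))
print(read_person(staff_list))
```

Nothing in the XML itself says which reading is correct; the parser only delivers the delimited text, and the application decides what it means.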

XML files are text files, as I said above, but even less than HTML are they meant to be read by humans.
They are text files, because that allows experts (such as programmers) to more easily debug
applications, and in emergencies, they can use a simple text editor to fix a broken XML file. But the rules
for XML files are much stricter than for HTML. A forgotten tag or an attribute without quotes makes the
file unusable, while in HTML such practice is often explicitly allowed, or at least tolerated. It is written in
the official XML specification: applications are not allowed to try to second-guess the creator of a broken
XML file; if the file is broken, an application has to stop right there and issue an error.
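
The strictness described above can be observed with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError on input that is not well-formed; the param element and the helper name is_well_formed are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser stops at the first error instead of guessing a repair.
        return False

good = '<param name="gain">3</param>'
missing_end_tag = '<param name="gain">3'
unquoted_attribute = '<param name=gain>3</param>'

for sample in (good, missing_end_tag, unquoted_attribute):
    print(is_well_formed(sample), sample)
```

An HTML browser would typically render something for all three inputs; an XML parser accepts only the first.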

Since XML is a text format and uses tags to delimit the data, XML files are nearly always larger than
comparable binary formats. That was a conscious decision by the XML developers. The advantages of a
text format are evident (see above), and the disadvantages can usually be compensated for at a different
level. Disk space is no longer as expensive as it used to be, and programs like zip and gzip can
compress files very well and very fast. Those programs are available for nearly all platforms (and are
usually free). In addition, communication protocols such as modem protocols and HTTP/1.1 (the core
protocol of the Web) can compress data on the fly, thus saving bandwidth as effectively as a binary
format.
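
The effect of compression on tag-heavy text can be checked with a few lines of Python and the standard gzip module. The XML payload below is invented for this sketch (a list of hypothetical reading elements); actual savings depend on the data.

```python
import gzip

# Hypothetical, repetitive XML payload built only for this demonstration.
records = "".join(
    f'<reading channel="{i}"><value>{i * 10}</value></reading>'
    for i in range(200)
)
xml_bytes = f"<readings>{records}</readings>".encode("utf-8")

# Compress in memory, as gzip would do for a file on disk.
compressed = gzip.compress(xml_bytes)

print("plain bytes:     ", len(xml_bytes))
print("compressed bytes:", len(compressed))

# Decompression restores the original text exactly.
restored = gzip.decompress(compressed)
```

Because the same tag names repeat throughout the file, the compressed form is considerably smaller than the plain text, and nothing is lost on the way back.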
