A Brief Introduction to XML

XML is a text format in which information is enclosed in tags that describe its structure. Much of XML's usefulness comes from the wide range of tools available for accessing and manipulating this structured data, from libraries in all the major programming languages to built-in functions in major web browsers.

Grammars

A related set of XML tags, together with their possible structural permutations and data contents, constitutes an XML grammar. XML grammars can be defined by anyone to meet their own needs, or can be agreed upon by communities to enable a standardized exchange of like information between disconnected and even competing products. Grammars can be described in a DTD (Document Type Definition) or XML Schema file, enabling standard processing tools to determine whether an XML document is a valid instance of a particular grammar.
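
As a minimal sketch of what a grammar looks like, the document below carries its DTD inline and is a valid instance of it. The element names (faculty, name, office) are invented for illustration and do not come from HyperContent.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE faculty [
  <!-- Hypothetical grammar: a faculty element holds exactly one name and one office -->
  <!ELEMENT faculty (name, office)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT office (#PCDATA)>
]>
<faculty>
  <name>Jane Doe</name>
  <office>Room 101</office>
</faculty>
```

A validating parser would reject this document if, for example, the office element were missing or appeared before name.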

Transformations

XSL is a powerful companion technology to XML: it provides a way to transform XML from one grammar to another using templates. This is what makes XML well suited to managing web site content: data can be stored in XML according to custom grammars, and XSL makes it possible to output that data as XHTML for viewing in a web browser, or as Formatting Objects which can be used to generate PDF files. XSL allows you to implement look-and-feel and formatting without any modification to the XML source.
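
The following stylesheet is a small sketch of such a transformation. It assumes a hypothetical source grammar whose root element is faculty, with name and office children, and outputs a simple XHTML page. None of these element names come from HyperContent.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <!-- Template for the hypothetical root element "faculty" -->
  <xsl:template match="/faculty">
    <html>
      <head>
        <title><xsl:value-of select="name"/></title>
      </head>
      <body>
        <!-- Formatting lives here; the XML source is never modified -->
        <h1><xsl:value-of select="name"/></h1>
        <p>Office: <xsl:value-of select="office"/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Changing the look of every page built from this grammar then only requires editing the stylesheet.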


Working with XML in HyperContent

XML DTDs

DTDs define the possible structure of an XML file. DTDs are normally referred to within the prolog of an XML file in a DOCTYPE declaration. In HyperContent, this declaration may point to the URL of a DTD on the network, a URL which is mapped to a local resource by an XML catalog, or to the path of a file in the same repository as the XML file. HyperContent has a built-in XML editor that allows users to author XML files in accordance with the grammars you have specified in your DTDs, directly from their web browser. HyperContent also provides a framework for plugging in custom web-authoring interfaces to provide more tailored tools for managing files according to specific XML grammars. For an introduction to DTDs, visit w3schools.
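
For illustration, a DOCTYPE declaration that refers to a DTD by its path in the repository might look like the sketch below. The DTD path and element names are assumptions, and a network URL could be given in place of the path.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical DTD location; a URL such as http://www.example.org/dtds/faculty.dtd
     could be used here instead of a repository path -->
<!DOCTYPE faculty SYSTEM "/dtds/faculty.dtd">
<faculty>
  <name>Jane Doe</name>
  <office>Room 101</office>
</faculty>
```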

XHTML

HyperContent's built-in XML editor also supports WYSIWYG XHTML editing. This is automatically enabled for any XML element named "html". You can either specify "html" as the root element of a document, or as a child element so that you can store XHTML along with custom structured data. HyperContent will support any number of "html" elements in a document.
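
A sketch of the second case, where "html" appears as a child element alongside custom structured data, is shown below. The faculty and name elements are hypothetical, and the exact XHTML content that HyperContent expects inside the html element is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<faculty>
  <!-- Custom structured data (hypothetical element) -->
  <name>Jane Doe</name>
  <!-- Free-form XHTML, edited with the WYSIWYG editor because the element is named "html" -->
  <html>
    <p>Jane Doe teaches <em>introductory</em> courses.</p>
  </html>
</faculty>
```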

XML Catalog

OASIS, an Internet standards body, has defined a standard format for a file which can be used to map URLs of Internet resources to other URLs or local copies of those resources. This is useful to decrease load time of DTDs, to protect against network outages, to allow for offline development, or to quietly "convert" XML files conforming to a certain DTD to an updated or modified version of the DTD, without removing the original. HyperContent provides an XML catalog file called "Catalog.xml" in the hypercontent properties directory. Please visit the OASIS page for more information.
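
As a rough sketch, a catalog in the OASIS XML Catalogs format could contain entries like the following. The URLs and local paths are placeholders, not the contents of the Catalog.xml shipped with HyperContent.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <!-- Resolve a DTD's network URL to a local copy (placeholder URL and path) -->
  <system systemId="http://www.example.org/dtds/faculty.dtd"
          uri="dtds/faculty.dtd"/>

  <!-- Redirect every DTD under an old URL prefix to an updated location -->
  <rewriteSystem systemIdStartString="http://www.example.org/dtds/v1/"
                 rewritePrefix="http://www.example.org/dtds/v2/"/>

</catalog>
```

The first entry covers the offline and load-time cases; the second shows how files that still name an old DTD can be resolved against a newer one without editing them.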

Managing structure across many XML files

As a text file format, XML holds out the promise of being legible to both computers and humans. However, with large amounts of data this promise becomes hard to fulfill: to maintain human legibility it helps to break the data up into smaller files, while for machine processing one giant file is required to maintain coherent data structures. HyperContent makes it possible to achieve both goals by allowing you to define structure across multiple XML files, which can be managed individually by humans but processed together by the computer.

HyperContent allows you to use path patterns to identify particular data structures according to where the files are stored in the filesystem, so that you can use XML Includes to bring related data together at build time. In your XSL transformations you can work with included data based on its path pattern in order to apply common formatting, just as you would if each file were an element in one giant XML file.

There are two elements that can be used in a HyperContent project definition to define XML content:

<xml-doc>

An xml-doc tag is used to specify a single XML file that must exist in the project repository. Common uses for this are specifying a home page, XSL files, and configuration files. An xml-doc tag has three required attributes:

definition =

The location of the DTD which specifies the grammar this file conforms to. This will be used in constructing the DOCTYPE portion of the file's prolog. This can be a URL, the path of a DTD in the project repository (e.g. '/dtds/common.dtd'), or blank if this will be an XSL file. If you want to specify a document to hold a custom dictionary for the project, to be used by the spell checker, you should specify the URL http://hypercontent.sourceforge.net/dtd/dictionary.dtd.
path =
The path of this file in the project repository, e.g.

/index.xml
/xsl/common.xsl
/dictionary.xml
root =
The name of the root element of this document, e.g.

xsl:stylesheet
myelement
html
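
Putting the three attributes together, xml-doc entries might look like the sketch below. It is a fragment only: the enclosing element of the project definition is omitted, and the particular combinations of values are assumptions assembled from the examples above.

```xml
<!-- A home page whose grammar is defined by a DTD in the repository -->
<xml-doc definition="/dtds/common.dtd" path="/index.xml" root="myelement"/>

<!-- An XSL file: the definition is left blank -->
<xml-doc definition="" path="/xsl/common.xsl" root="xsl:stylesheet"/>
```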

<xml-doctype>

The xml-doctype tag is used to describe a class of XML documents: it acts as a template for 0 or more XML files that share the same structure and function within the site.

definition = (see above)
path =
The path of an xml-doctype differs from that of an xml-doc in that it describes a possible rather than a literal location. Possibilities are expressed using wildcards: e.g. '/*/index.xml' indicates the index page of a directory beneath the repository root; '/*/*' indicates a file one level down in the repository. Wildcards are replaced with actual names by authors at the time they create a new instance of this doctype. For a more complete explanation and examples, see the chapter on path patterns.
root = (see above)
label =
The label of a doctype should be a short, descriptive text describing a file which is an instance of this doctype. For example, if you were specifying a doctype for a faculty member page, you could label it "Faculty" or "Faculty Member Biography".
template =
This allows you to specify the path of a file in the repository which will act as a template for new instances of this doctype. When an author creates a new instance of this doctype, they will be provided with a copy of the data and metadata from the template file. If you do not specify a template, the new file will consist of an empty root element which can be populated from scratch. You may want to specify the template file as an xml-doc, to guarantee that it cannot be removed from the repository.
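
As an illustrative sketch, a doctype for faculty pages could be declared as follows. The DTD path, root element name and template path are hypothetical, and the enclosing element of the project definition is omitted.

```xml
<!-- One index page per directory beneath the repository root -->
<xml-doctype definition="/dtds/faculty.dtd"
             path="/*/index.xml"
             root="faculty"
             label="Faculty Member Biography"
             template="/templates/faculty.xml"/>

<!-- Declaring the template as an xml-doc as well, so that it must exist in the repository -->
<xml-doc definition="/dtds/faculty.dtd"
         path="/templates/faculty.xml"
         root="faculty"/>
```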
 

Forms of Output

Both xml-docs and xml-doctypes can be configured with 0 or more forms of output. For example, you might configure a file to be output as standard HTML, accessible HTML, RSS and PDF. Other files may not have any forms of output, such as XSL and configuration files - these are intended to be used in the maintenance and building of the site, rather than published.

<output>
The output tag has two required attributes:

content-type =
This attribute is used to specify the content type of the output file; this will be used to determine the appropriate file extension, and may trigger specific behavior in transformation or filters. For example, specifying a content-type of "application/pdf" triggers the XSL transformation filter to treat its output as Formatting Objects and to render these into a PDF file. "text/html" is the content type for HTML files. Note that you can only have one output form of a given content-type per base directory.
basedir =
By default, the base directory of a form of output is set to "/", which indicates that the output file will live at the same path in the output filesystem as in the repository, with only the file extension changing according to the content type. If you need to have several forms of output with the same content type, for example standard and accessible HTML, you could specify a basedir of "/" for the standard form and "/acc/" for the accessible form. The accessible version of the home page would then be output to "/acc/index.html".
The output element can be configured with 0 or more include elements, 0 or more filter elements and 0 or 1 transform elements. Follow the links to the appropriate chapters for more information.
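
The standard and accessible HTML case described above could be sketched like this. Placing output elements inside the xml-doc is an assumption about the nesting, the attribute values on xml-doc are illustrative, and the include, filter and transform children are left out.

```xml
<xml-doc definition="/dtds/common.dtd" path="/index.xml" root="myelement">
  <!-- Standard HTML, output to /index.html -->
  <output content-type="text/html" basedir="/"/>
  <!-- Accessible HTML, output to /acc/index.html; a distinct basedir is needed
       because only one output of a given content-type is allowed per base directory -->
  <output content-type="text/html" basedir="/acc/"/>
  <!-- PDF, rendered from Formatting Objects by the XSL transformation filter -->
  <output content-type="application/pdf" basedir="/"/>
</xml-doc>
```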

Overriding Default Editors

By default, XML files will be configured to use the default editors as configured in the Content Types file. You can override the defaults for an xml-doc or xml-doctype by adding editor elements inside it. You must specify all the editors that you want to be used; once you add an editor element, the defaults are ignored. The editors appear in the order you specify them: thus, the first editor element will be the opening screen for an author who chooses to edit a file corresponding to this xml-doc or xml-doctype.

<editor>

The editor element has a single attribute:

key =
(see the chapter on Editors)
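
A sketch of overriding the default editors is shown below. The key values are placeholders rather than real editor keys (those are listed in the chapter on Editors), and the attribute values on xml-doctype are illustrative.

```xml
<xml-doctype definition="/dtds/faculty.dtd" path="/*/index.xml"
             root="faculty" label="Faculty">
  <!-- Listed first, so this editor is the opening screen for authors -->
  <editor key="FIRST-EDITOR-KEY"/>
  <!-- Any further editors must be listed explicitly; the defaults are now ignored -->
  <editor key="SECOND-EDITOR-KEY"/>
</xml-doctype>
```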