SCDJWS Study Guide: XML Schema
Introduction to XML Schema
An XML schema describes an XML markup language. Specifically, it defines which elements and attributes are used in the language, how they are ordered and nested, and what their data types are. The XML specification includes the Document Type Definition (DTD), which can be used to describe XML markup languages and to validate XML documents. While DTDs have proven very useful over the years, they are limited. To address those limits, the W3C created a new way to describe markup languages, called XML schema.
XML schema is an XML-based alternative to DTD. An XML schema not only describes the structure of an XML document but also addresses data typing.
What is an XML Schema?
An XML schema is a predefined set of elements, attributes, and values for defining "types", which are the legal building blocks of an XML document. An XML schema does the following:
- defines elements that can appear in a document
- defines attributes that can appear in a document
- defines which elements are child elements
- defines the sequence in which the child elements can appear
- defines the number of child elements
- defines whether an element is empty or can include text
- defines default values for attributes
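The capabilities listed above can be sketched in one small schema. This is an illustrative example only: the note, title, paragraph, and separator elements and the lang attribute are hypothetical names chosen for this guide, not part of any standard vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical element used only for illustration -->
  <xsd:element name="note">
    <xsd:complexType>
      <!-- xsd:sequence fixes which children may appear and in what order -->
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <!-- minOccurs/maxOccurs control the number of children: one to three here -->
        <xsd:element name="paragraph" type="xsd:string" minOccurs="1" maxOccurs="3"/>
        <!-- an optional empty element: a complex type with no content -->
        <xsd:element name="separator" minOccurs="0">
          <xsd:complexType/>
        </xsd:element>
      </xsd:sequence>
      <!-- an attribute with a default value -->
      <xsd:attribute name="lang" type="xsd:language" default="en"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```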
XML schema is an alternative and update to DTD. The main objectives of XML schema include:
- XML format (and therefore accessibility to XML parsing tools)
- extensibility, i.e., by allowing other schemas to be imported
- finer control over data typing
- support for XML namespaces
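As a rough sketch of the extensibility and namespace objectives, one schema can import declarations from another namespace with xsd:import. Everything named here is an assumption made for illustration: the two example.com namespace URIs, the review element, and the file comments-ns.xsd, which is assumed to declare a global comment element in the imported namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:c="http://example.com/comments"
            targetNamespace="http://example.com/reviews"
            elementFormDefault="qualified">
  <!-- pull in declarations from a second (hypothetical) schema and namespace -->
  <xsd:import namespace="http://example.com/comments"
              schemaLocation="comments-ns.xsd"/>
  <xsd:element name="review">
    <xsd:complexType>
      <xsd:sequence>
        <!-- reuse the imported global element by reference -->
        <xsd:element ref="c:comment" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```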
The following is an example of an element declaration for a comment element, taken from comment.xsd:
<xsd:element name="comment" type="xsd:string"/>
The following is an example of an instance, comment.xml:
<comment>This is a great comment!</comment>
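The declaration above is only a fragment. To be usable by a validator it has to sit inside a schema element that binds the xsd prefix to the XML Schema namespace. A minimal complete version of comment.xsd might look like the following; the XML declaration and encoding are assumptions, since the original shows only the single declaration line.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- the xsd prefix is bound to the XML Schema namespace on the root element -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- a global element whose content is a plain string -->
  <xsd:element name="comment" type="xsd:string"/>
</xsd:schema>
```

The instance comment.xml shown above is valid against this schema without changes, because the schema declares no target namespace.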
An XML schema document can play the same role for an XML document as an external DTD: the XML document refers to it and is validated against it. To reference an XML schema document from an XML document, add two attributes to the document element of the XML document:
- The first attribute, xmlns:xsi, declares the namespace for XMLSchema-instance. The prefix is normally xsi. The value of the attribute is fixed: http://www.w3.org/2001/XMLSchema-instance.
- The second attribute, xsi:noNamespaceSchemaLocation, specifies the location of the XML schema document in the simplest case, where no namespace is associated with the schema. Its value is the location of the schema file, typically given relative to the current XML file.
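Applied to the comment example from earlier, the two attributes might be written as follows. This is a sketch that assumes comment.xsd sits in the same directory as the instance and declares no target namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xmlns:xsi declares the XMLSchema-instance namespace (fixed value);
     xsi:noNamespaceSchemaLocation points at the schema file, relative to this document -->
<comment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="comment.xsd">This is a great comment!</comment>
```

A schema-aware parser treats the location as a hint for finding the schema; an application may also supply the schema itself and ignore the hint.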
At the start of your schema you need an opening schema element, loosely known as the prolog, whose namespace declaration (xmlns:xsd="http://www.w3.org/2001/XMLSchema") identifies the markup that follows as a schema and says which schema language it uses. This opening tag comes immediately after the XML declaration, which, when present, must be the very first line of the document; not even a comment may precede it.
The Xerces project provides an example of an XML document that refers to an XML schema document. The document element of the XML document is personnel:
<?xml version="1.0" encoding="UTF-8"?>
What is in an XML Schema file?
The contents of an XML schema file are described by a DTD for schemas, which appears as an appendix to Part 1 (Structures) of the two-part XML schema standard, http://www.w3.org/TR/xmlschema-1/.
Like any XML document, an XML schema document contains one document element. Its name is <schema>. Like a DTD, a schema can contain many different kinds of statements, but the most common ones are:
- element declarations (similar to <!ELEMENT ...> declarations in a DTD)
- type definitions (akin to parameter entities in a DTD)
The attribute list declarations, which are a common part of DTDs, can be in one of two places in an XML schema document: embedded within an element declaration, or as a standalone group, which will be referenced within one or more element declarations.
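Both placements can be sketched in one schema. The rating attribute is embedded in the declaration of comment, while author and created live in a standalone attributeGroup that any element declaration can reference. The attribute names and the group name auditAttributes are hypothetical, chosen only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- standalone group: declared once, referenced from element declarations -->
  <xsd:attributeGroup name="auditAttributes">
    <xsd:attribute name="author" type="xsd:string"/>
    <xsd:attribute name="created" type="xsd:date"/>
  </xsd:attributeGroup>

  <xsd:element name="comment">
    <xsd:complexType>
      <!-- string content extended with attributes -->
      <xsd:simpleContent>
        <xsd:extension base="xsd:string">
          <!-- embedded: declared directly inside this element's type -->
          <xsd:attribute name="rating" type="xsd:integer"/>
          <!-- referenced: pulls in every attribute of the group -->
          <xsd:attributeGroup ref="auditAttributes"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```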
XML schemas are the Successors of DTDs
A significant difference between schemas and DTDs is that schemas define many basic data types, including string, boolean, float, double, decimal, duration, dateTime, hexBinary, base64Binary, and anyURI. (Early drafts of the standard used other names for some of these, such as timeDuration, recurringDuration, binary, and uri.) Each of these so-called primitive data types has a distinctive lexical representation and other characteristics that delimit its possible values. By typing the data enclosed in an XML document, a schema makes the document computable in ways not possible with the simple (mostly string-based) data types that are present in DTDs. XML schema was originally proposed by Microsoft and became a W3C Recommendation in May 2001.
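To make the contrast with DTDs concrete, here is a small sketch in which each child element carries a built-in type. The order vocabulary (order, item, price, weight, giftWrapped) is hypothetical. A DTD could declare every one of these children only as #PCDATA, so a value such as "cheap" in price would pass DTD validation but fail schema validation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="item" type="xsd:string"/>
        <!-- must be a decimal number such as 12.50 -->
        <xsd:element name="price" type="xsd:decimal"/>
        <!-- double-precision floating point, for example 1.5E0 -->
        <xsd:element name="weight" type="xsd:double"/>
        <!-- only true, false, 1, or 0 are accepted -->
        <xsd:element name="giftWrapped" type="xsd:boolean"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A conforming instance could then look like this:

```xml
<order>
  <item>Notebook</item>
  <price>12.50</price>
  <weight>0.35</weight>
  <giftWrapped>false</giftWrapped>
</order>
```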
We think that very soon XML schemas will be used in Web applications as a replacement for DTDs. Here are the reasons why:
- XML schemas are easier to learn than DTDs
- XML schemas are extensible to future additions
- XML schemas are richer and more useful than DTDs
- XML schemas are written in XML
- XML schemas support data types
- XML schemas support namespaces
XML Schema Instance Document
An XML schema instance document is an XML document that conforms to a particular schema.
Neither instances nor schemas need to exist as documents. They could be any of:
- Streams of bytes sent between applications
- Fields in a database record
- Collections of XML Infoset "Information Items"
Schema Elements and Subelements
- Each schema has a schema element and a variety of subelements
- Subelements determine appearance of elements and their content in instance documents
- Types of subelements are element, complexType, and simpleType
- Elements begin with xsd: to associate them with the XML schema namespace through the declaration xmlns:xsd="http://www.w3.org/2001/XMLSchema", where the xsd prefix identifies elements as part of the XML schema language
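The three kinds of subelements can be seen together in a short sketch. The names ratingType, reviewType, review, and rating are hypothetical and serve only to show how a simpleType, a complexType, and an element declaration fit under the schema element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- the xmlns:xsd declaration ties every xsd:-prefixed element to the schema language -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- simpleType: restricts a built-in type, here to integers from 1 to 5 -->
  <xsd:simpleType name="ratingType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="5"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- complexType: describes an element that has child elements -->
  <xsd:complexType name="reviewType">
    <xsd:sequence>
      <xsd:element name="comment" type="xsd:string"/>
      <xsd:element name="rating" type="ratingType"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- element: what may actually appear in an instance document -->
  <xsd:element name="review" type="reviewType"/>
</xsd:schema>
```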