SCDJWS Study Guide: XML Basic



XML CDATA

Some characters, such as "<" and "&", have special meaning in XML. If element content contains them unescaped, the parser interprets them as markup rather than text. A CDATA section escapes a block of text containing such characters: it marks the text as literal, so the parser treats it as a plain string of characters instead of parsing it. CDATA sections can appear inside element content wherever these special characters need to appear literally.

A CDATA section begins with the character sequence "<![CDATA[" and ends with the character sequence "]]>". The text between "<![CDATA[" and "]]>" is escaped.

The following example uses CDATA to include HTML content inside an XML element without changing the HTML "<" and ">" to &lt; and &gt;. Since the HTML is enclosed in a CDATA section, the XML parser does not parse it.

<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>Little Sister, Big Sister</title>
<author>Diana Cain Blutherthal</author>
<category>chapter book</category>
<TOC>
  <![CDATA[
    <html>
      <body>
        <table>
          <tr><td>Chapter One</td><td>Queen</td><td>....3</td></tr>
          <tr><td>Chapter Two</td><td>Mermaids</td><td>....20</td></tr>
          <tr><td>Chapter Three</td><td>The Chocolate Bar</td><td>....31</td></tr>
          <tr><td>Chapter Four</td><td>Thunder Cookies</td><td>....40</td></tr>
        </table>
      </body>
    </html>
  ]]>
</TOC>
</book>
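
As a rough check of what a parser does with a document like this, the following Python sketch parses a shortened copy of the example using the standard library's xml.etree.ElementTree. The shortened document and the variable names are illustrative assumptions, not part of the original example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the book example; the CDATA section holds HTML markup.
doc = """<book>
<title>Little Sister, Big Sister</title>
<TOC><![CDATA[<table><tr><td>Chapter One</td><td>Queen</td></tr></table>]]></TOC>
</book>"""

root = ET.fromstring(doc)
toc = root.find("TOC")

# The HTML arrives as plain character data, not as child elements of TOC.
toc_text = toc.text
toc_children = len(toc)

print(toc_text)
print(toc_children)
```

The TOC element ends up with no child elements; the table markup is available only as its text.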


All tags and entity references inside the CDATA markers are ignored by the XML parser, which treats them like any other character data. In other words, the parser does not interpret markup characters such as <, >, and & between the markers. The only markup an XML parser recognizes inside a CDATA section is the closing character sequence "]]>". The string "]]>" therefore must not appear inside a CDATA block, as it would signal the end of the section. Because entity references are not expanded inside CDATA, writing &gt; there does not help; instead, split the text across two CDATA sections so that "]]" ends the first and ">" begins the second (that is, write "]]]]><![CDATA[>" in place of "]]>"). For the same reason, CDATA sections cannot be nested.
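
One common way to handle text that itself contains "]]>" is to split it across two adjacent CDATA sections. The sketch below assumes a hypothetical helper named to_cdata (not part of any library) and uses Python's xml.etree.ElementTree to confirm that the text survives a round trip.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections.

    Hypothetical helper for illustration only.
    """
    # "]]" ends the first section; ">" starts the next one.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"

# Sample text containing both "]]>" and a bare "&".
payload = "if (a[b[0]]> 1) { x = 1 & 2; }"
xml_doc = "<code>" + to_cdata(payload) + "</code>"

# The parser joins the adjacent CDATA sections into one text value.
recovered = ET.fromstring(xml_doc).text
print(recovered == payload)
```

The parser reports the two sections as one continuous run of character data, so the consumer sees the original string.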

Keep in mind that nothing inside a CDATA section is parsed, so entity references placed there are not expanded. For example, &lt;I&gt; remains the literal text &lt;I&gt; when it appears inside a CDATA section.
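
The difference can be seen by parsing the same characters with and without a CDATA section. This is a minimal sketch using Python's xml.etree.ElementTree; the element name p is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# In ordinary element content, entity references are expanded.
plain = ET.fromstring("<p>&lt;I&gt;</p>").text

# Inside a CDATA section, the same characters are kept verbatim.
literal = ET.fromstring("<p><![CDATA[&lt;I&gt;]]></p>").text

print(plain)
print(literal)
```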

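A CDATA section can be used anywhere PCDATA occurs, that is, in element content. Attribute values, however, are always parsed, so you cannot include a CDATA section in an attribute value. (The CDATA attribute type in a DTD is unrelated to CDATA sections: it only declares that the attribute holds character data, and entity references in its value are still expanded.)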
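
A short experiment illustrates the attribute restriction. The sketch below, again using Python's xml.etree.ElementTree with made-up element and attribute names, shows that a CDATA section in an attribute value is rejected, while the entity-escaped form is accepted.

```python
import xml.etree.ElementTree as ET

# "<" may not appear in an attribute value, so this is not well-formed.
bad = '<item note="<![CDATA[a < b]]>"/>'
try:
    ET.fromstring(bad)
    attribute_cdata_ok = True
except ET.ParseError:
    attribute_cdata_ok = False

# Escaping with an entity reference is the way to do it in attributes.
good = ET.fromstring('<item note="a &lt; b"/>')
note = good.get("note")

print(attribute_cdata_ok)
print(note)
```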

Comments are not recognized in a CDATA section. The XML parser treats any "<!-- comment -->" in the CDATA block as literal text without parsing it. CDATA sections do not work in HTML.
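
To see the comment behavior, the following sketch compares a real comment with comment-like text inside CDATA. It relies on the default behavior of Python's xml.etree.ElementTree, which discards real comments while parsing; the element name note is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# A real comment is dropped by the default parser, leaving no text.
with_comment = ET.fromstring("<note><!-- a real comment --></note>").text

# Inside CDATA, the same characters are ordinary text.
in_cdata = ET.fromstring(
    "<note><![CDATA[<!-- not a comment -->]]></note>"
).text

print(with_comment)
print(in_cdata)
```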


