SCDJWS Study Guide: JAXP





Simple API for XML (SAX)

The Simple API for XML (SAX) is available with JAXP; SAX is one of two common ways to write software that accesses XML data. SAX is an event-driven approach to XML processing built around callbacks. Using SAX with JAXP lets developers traverse XML data sequentially, one element at a time, using a delegation event model. Each time an element of the XML structure is encountered, an event is triggered. Developers write event handlers to define custom processing for the events they deem important. Each element is parsed down to its leaf nodes before the parser moves on to the next sibling of that element, so at no point does the parser itself tell you what level of the tree you are at; a handler that needs this information must track it itself.
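To make the event model concrete, here is a minimal sketch of a handler that keeps its own depth counter while the parser fires start and end events. The class name DepthTrace and the sample document are made up for illustration; only standard JAXP and SAX calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP or SAX
class DepthTrace {
    /** Parses the XML text and returns one "depth:name" entry per start tag. */
    static List<String> trace(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // SAX does not report the tree level, so the handler counts it
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(depth + ":" + qName);
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Elements are reported in document order, children before siblings
        System.out.println(trace("<a><b><c/></b><d/></a>"));
    }
}
```

Note that the events arrive strictly in document order: the c element is reported before d, because the parser finishes everything under b before moving on to b's sibling.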


SAX is very useful for processing very large XML documents or streams, because the XML data does not all need to be kept in memory at run time. SAX is also very useful for retrieving a specific value from an XML document and for creating a subset of an XML document. It offers no random access to the XML data and no way to modify it; in such cases, the Document Object Model (DOM) should be used.

The SAX API has provided the following handler interfaces:

  • org.xml.sax.ContentHandler
  • org.xml.sax.ErrorHandler
  • org.xml.sax.DTDHandler
  • org.xml.sax.EntityResolver

The SAX programmer implements one of the SAX interfaces that define event processing callbacks. SAX also provides a class called DefaultHandler (in the org.xml.sax.helpers package) that implements all of these callbacks and provides default, empty implementations of all the callback methods. The SAX developer needs only extend this class, then implement methods that require insertion of specific logic. So the key in SAX is to provide code for these various callbacks, then let a parser trigger each of them when appropriate. Here's the typical SAX routine:

  1. Create a SAXParser instance using a specific vendor's parser implementation.
  2. Create an event handler object and register the event handler object to the parser (the event handler implementation extends DefaultHandler, for example).
  3. Start parsing and sending each event to the handler.
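The three steps above can be sketched as follows. This is only an illustration: the class FirstValueFinder and its find() method are hypothetical names, and the handler simply collects the text of the first element with a given name (it does not try to cope with nested elements of the same name).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of the first matching element
class FirstValueFinder extends DefaultHandler {
    private final String wanted;
    private final StringBuilder text = new StringBuilder();
    private boolean inside = false;
    private boolean done = false;

    FirstValueFinder(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (!done && qName.equals(wanted)) {
            inside = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append
        if (inside) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inside && qName.equals(wanted)) {
            inside = false;
            done = true;
        }
    }

    static String find(String xml, String elementName) {
        try {
            // Step 1: create a SAXParser through the JAXP factory
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Step 2: create the event handler (it extends DefaultHandler)
            FirstValueFinder handler = new FirstValueFinder(elementName);
            // Step 3: start parsing; the parser sends each event to the handler
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><title>SAX</title><title>DOM</title></book>";
        System.out.println(find(xml, "title"));
    }
}
```

With JAXP, registering the handler (the second half of step 2) happens implicitly when the handler is passed to parse().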

JAXP's SAX component provides a simple means for doing all of this. Without JAXP, a SAX parser instance either must be instantiated directly from a vendor class (such as org.apache.xerces.parsers.SAXParser), or it must use a SAX helper class called XMLReaderFactory (also in the org.xml.sax.helpers package). The problem with the first methodology is obvious: It isn't vendor neutral. The problem with the second is that the factory's createXMLReader(String) method requires, as an argument, the String name of the parser class to use (that Apache class, org.apache.xerces.parsers.SAXParser, again). You can change the parser by passing in a different parser class as a String. With this approach, if you change the parser name, you won't need to change any import statements, but you will still need to recompile the class. This is obviously not a best-case solution. It would be much easier to be able to change parsers without recompiling the class.

JAXP offers that better alternative: It lets you provide a parser as a Java system property. Of course, when you download a distribution from Sun, you get a JAXP implementation that uses Sun's version of Xerces.  The developer can move from one parser implementation to another through a system property rather than having to refer to it in the actual code. This means that the code does not need to be recompiled each time the parser implementation is changed.
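A quick way to see which implementation JAXP has selected is to print the class of the factory that newInstance() returns. The sketch below assumes nothing beyond the standard API; the class name WhichFactory and the com.example.MyFactory name in the comment are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class for inspecting the JAXP lookup result
class WhichFactory {
    /** Returns the class name of the factory implementation JAXP selected. */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The property is normally supplied on the command line, for example:
        //   java -Djavax.xml.parsers.SAXParserFactory=com.example.MyFactory WhichFactory
        // (com.example.MyFactory is a made-up class name.)
        System.out.println("Property value: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("Factory in use: " + factoryClassName());
    }
}
```

Run without the -D option, the property value prints as null and the factory is whatever default your JDK or application server ships with.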

An Abstract Factory of SAXParserFactory

The JAXP SAXParserFactory class is an abstract factory for building and configuring SAXParser objects in an implementation-independent fashion. The concrete subclass to load is read from the javax.xml.parsers.SAXParserFactory Java system property. (Do not confuse it with the SAX1 class org.xml.sax.helpers.ParserFactory; that is the obsolete class that was replaced by org.xml.sax.helpers.XMLReaderFactory in SAX2.) To obtain a SAXParser, you must first create a new instance of SAXParserFactory through its static SAXParserFactory.newInstance() method, and then create the SAXParser itself with the factory's newSAXParser() method.

Every factory's newInstance() method uses a specific algorithm for finding the JAXP implementation. Since JAXP 1.1.3 (also part of JDK 1.4), the factory find algorithm is the following:

  • Search for a system property named after the appropriate factory:

      • javax.xml.parsers.DocumentBuilderFactory
      • javax.xml.parsers.SAXParserFactory
      • javax.xml.transform.TransformerFactory

  • Use the properties file "lib/jaxp.properties" in the JRE directory, which contains the fully qualified name of the implementation class, keyed by the system property name from above.
  • If this file is not found, or it does not contain a property for the factory being searched for, the actual search logic follows (used also in the J2EE Engine):

      • Get a class loader by invoking Thread.currentThread().getContextClassLoader().
      • Use this class loader to load a resource using ClassLoader.getResourceAsStream(factoryResource), where factoryResource is a file named after the properties above and located in META-INF/services/, for example META-INF/services/javax.xml.parsers.DocumentBuilderFactory.
      • If this resource is found, it is loaded; its content is a string specifying the class of the JAXP factory implementation.
      • This factory class is then loaded using the same class loader.

  • If no such file is found, a fallback value is loaded. This fallback value is specific to the JDK or the JAXP interfaces provider (the platform default implementation of, for example, SAXParserFactory).
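The META-INF/services step of this algorithm can be imitated by hand, which is sometimes handy when debugging which parser a deployment picks up. The following is a rough sketch of that one step only, not the code JAXP itself runs; the class name ServiceFileLookup is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mimics the services-file step of the lookup
class ServiceFileLookup {
    /**
     * Returns the first class name listed in META-INF/services/<factoryId>,
     * or null if no such resource is visible to the context class loader.
     */
    static String lookup(String factoryId) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        String resource = "META-INF/services/" + factoryId;
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                return null; // no provider file on the class path
            }
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                // Provider files may contain '#' comments and blank lines
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash);
                }
                line = line.trim();
                if (!line.isEmpty()) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("javax.xml.parsers.SAXParserFactory"));
    }
}
```

If this prints null, no parser jar on the class path advertises a factory, and JAXP falls through to the platform default described in the last step.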

In addition to the basic job of creating instances of SAX parsers, the factory lets you set configuration options. These options affect all parser instances obtained through the factory. The two most commonly used options available in JAXP 1.3 are to set namespace awareness with setNamespaceAware(boolean awareness), and to turn on DTD validation with setValidating(boolean validating). Remember that once these options are set, they affect all instances obtained from the factory after the method invocation.
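To see what setNamespaceAware() changes in practice, the sketch below parses the same document twice and records the names passed to startElement() for the root element. The class NamespaceDemo and the urn:x namespace are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class
class NamespaceDemo {
    /** Returns "uri|localName|qName" as reported for the root element. */
    static String rootNames(String xml, boolean namespaceAware) {
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (result[0] == null) {
                    result[0] = uri + "|" + localName + "|" + qName;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Affects every parser created from this factory afterwards
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xml = "<p:a xmlns:p=\"urn:x\"/>";
        System.out.println("aware:     " + rootNames(xml, true));
        System.out.println("not aware: " + rootNames(xml, false));
    }
}
```

With namespace awareness on, the handler receives the namespace URI and the local name; with it off, only the qualified name (p:a) is reliable, so handlers written for one mode should not assume the other.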

Once you have set up the factory, invoking newSAXParser() returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an instance of the SAX class org.xml.sax.XMLReader). It also protects you from using any vendor-specific additions to the parser class. This class allows actual parsing behavior to be kicked off.

The example shows how you can create, configure, and use a SAX factory:

import java.io.File;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;

// SAX
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestSAXParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println ("Usage: java TestSAXParsing [filename]");
                System.exit (1);
            }

            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File(args[0]), new MyHandler());
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not support " +
                               "the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining SAX Parser Factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class MyHandler extends DefaultHandler {
    // SAX callback implementations from ContentHandler, ErrorHandler, etc.
}

In the example, you can see that two JAXP-specific problems can occur in using the factory: the inability to obtain or configure a SAX factory, and the inability to configure a SAX parser. The first of these problems, represented by a FactoryConfigurationError, usually occurs when the parser specified in a JAXP implementation or system property cannot be obtained. The second problem, represented by a ParserConfigurationException, occurs when a requested feature is not available in the parser being used. Both are easy to deal with and shouldn't pose any difficulty when using JAXP. In fact, you might want to write code that attempts to set several features and gracefully handles situations where a certain feature isn't available.
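One way to handle unavailable features gracefully is to wrap each request in a small helper that reports success or failure instead of aborting. This is a sketch under the assumption that your JAXP version provides SAXParserFactory.setFeature(); the class FeatureProbe and the example.org feature URI are hypothetical.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper for optional feature configuration
class FeatureProbe {
    /** Tries to set a feature; returns false instead of failing if it cannot. */
    static boolean trySetFeature(SAXParserFactory factory, String uri, boolean value) {
        try {
            factory.setFeature(uri, value);
            return true;
        } catch (ParserConfigurationException
                 | SAXNotRecognizedException
                 | SAXNotSupportedException e) {
            System.err.println("Skipping " + uri + ": " + e.getMessage());
            return false;
        }
    }

    /** Convenience wrapper that probes a fresh factory. */
    static boolean probe(String uri, boolean value) {
        return trySetFeature(SAXParserFactory.newInstance(), uri, value);
    }

    public static void main(String[] args) {
        // A standard SAX feature, and a made-up one that no parser should know
        System.out.println(probe("http://xml.org/sax/features/namespaces", true));
        System.out.println(probe("http://example.org/no-such-feature", true));
    }
}
```

The calling code can then decide, feature by feature, whether a false result is fatal or merely worth a log message.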

A SAXParser instance is obtained once you get the factory, turn off namespace support, and turn on validation; then parsing begins. The SAX parser's parse() method takes an instance of the SAX DefaultHandler helper class mentioned earlier, which your custom handler class extends. You also pass in the File to parse. However, the SAXParser class contains much more than this single method.

SAXParser Class

Once you have an instance of the SAXParser class, you can do a lot more than just pass it a File to parse. Because of the way components in large applications communicate, it's not always safe to assume that the creator of an object instance is its user. One component might create the SAXParser instance, while another component (perhaps coded by another developer) might need to use that same instance. For this reason, JAXP provides methods to determine the parser's settings. For example, you can use isValidating() to determine if the parser will -- or will not -- perform validation, and isNamespaceAware() to see if the parser can process namespaces in an XML document. These methods can give you information about what the parser can do, but users with just a SAXParser instance -- and not the SAXParserFactory itself -- do not have the means to change these features. You must do this at the parser factory level.
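As a small illustration, a component that is handed a SAXParser by someone else can check the settings before relying on them. The class ParserSettings and its method names are hypothetical; isValidating() and isNamespaceAware() are the JAXP methods described above.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class
class ParserSettings {
    /** Creates a parser; the settings can only be chosen here, on the factory. */
    static SAXParser create(boolean validating, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            factory.setNamespaceAware(namespaceAware);
            return factory.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create parser", e);
        }
    }

    /** Reads back the settings from a parser received from elsewhere. */
    static String describe(SAXParser parser) {
        return "validating=" + parser.isValidating()
                + ", namespaceAware=" + parser.isNamespaceAware();
    }

    public static void main(String[] args) {
        System.out.println(describe(create(true, false)));
    }
}
```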

You also have a variety of ways to request parsing of a document. Instead of just accepting a File and a SAX DefaultHandler instance, the SAXParser's parse() method can also accept a SAX InputSource, a Java InputStream, or a URL in String form, all with a DefaultHandler instance. So you can still parse documents wrapped in various forms.

Finally, you can obtain the underlying SAX parser (an instance of org.xml.sax.XMLReader) and use it directly through the SAXParser's getXMLReader() method. Once you get this underlying instance, the usual SAX methods are available. The following listing shows examples of the various uses of the SAXParser class, the core class in JAXP for SAX parsing:

// Get a SAX Parser instance
SAXParser saxParser = saxFactory.newSAXParser();
// Find out if validation is supported
boolean isValidating = saxParser.isValidating();
// Find out if namespaces are supported
boolean isNamespaceAware = saxParser.isNamespaceAware();
// Parse, in a variety of ways
// Use a file and a SAX DefaultHandler instance
saxParser.parse(new File(args[0]), myDefaultHandlerInstance);
// Use a SAX InputSource and a SAX DefaultHandler instance
saxParser.parse(mySaxInputSource, myDefaultHandlerInstance);
// Use an InputStream and a SAX DefaultHandler instance
saxParser.parse(myInputStream, myDefaultHandlerInstance);
// Use a URI and a SAX DefaultHandler instance
saxParser.parse("http://www.newInstance.com/xml/doc.xml",
                myDefaultHandlerInstance);
// Get the underlying (wrapped) SAX parser
org.xml.sax.XMLReader parser = saxParser.getXMLReader();
// Use the underlying parser
parser.setContentHandler(myContentHandlerInstance);
parser.setErrorHandler(myErrorHandlerInstance);
parser.parse(new org.xml.sax.InputSource(args[0]));

JAXP's added functionality is fairly minor, especially where SAX is involved. This minimal functionality makes your code more portable and lets other developers use it, either freely or commercially, with any SAX-compliant XML parser. That's it. There's nothing more to using SAX with JAXP. If you already know SAX, you're about 98 percent of the way there. You just need to learn two new classes and a couple of Java exceptions, and you're ready to roll. If you've never used SAX, it's easy enough to start now.

SAX Plugability

The SAX Plugability classes allow an application programmer to provide an implementation of the org.xml.sax.helpers.DefaultHandler API to a SAXParser implementation and parse XML documents. As the parser processes the XML document, it will call methods on the provided DefaultHandler.

Once you have obtained the underlying parser, which is an XMLReader, you plug in the event handlers you need:

  • parser.setContentHandler()
  • parser.setErrorHandler()
  • parser.setDTDHandler()
  • parser.setEntityResolver()

The class org.xml.sax.helpers.DefaultHandler is a convenience class that implements the org.xml.sax.ContentHandler interface (plus the org.xml.sax.DTDHandler, org.xml.sax.ErrorHandler, and org.xml.sax.EntityResolver interfaces) with empty methods.
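Because DefaultHandler implements all four interfaces, a single subclass can be plugged into every setter on the XMLReader. The sketch below does that and overrides only two callbacks; the class PluggedHandler is a hypothetical name, and the malformed sample document is there only to trigger the error handler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one object serving as all four SAX handlers
class PluggedHandler extends DefaultHandler {
    int elements = 0;
    boolean fatal = false;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        elements++; // ContentHandler callback
    }

    @Override
    public void fatalError(SAXParseException e) {
        fatal = true; // ErrorHandler callback: record instead of rethrowing
    }

    static PluggedHandler run(String xml) {
        PluggedHandler handler = new PluggedHandler();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.setDTDHandler(handler);
            reader.setEntityResolver(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // The parser may still stop after a fatal error; the handler
            // has already recorded it, so there is nothing more to do here
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        System.out.println(run("<a><b/></a>").elements);
        System.out.println(run("<a><b></a>").fatal);
    }
}
```

All the callbacks that are not overridden keep the empty (or, for fatalError, rethrowing) behavior inherited from DefaultHandler.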

Features

(See the SAX-standardized features and properties list at http://www.saxproject.org/?selected=get-set)

Once you have an instance of your parser, you need to configure it. Note that this isn't the same as setting up the parser to deal with errors, content, or structures in XML; instead, configuration is the process of actually telling the parser how to behave. You may turn on validation, turn off namespace checking, and expand entities. These behaviors are totally independent of a specific XML document, and therefore involve interaction with your new parser instance.

Note: For those of you who are overly anxious (I know you're out there), I will indeed be dealing with content, error handling, and the like. However, those subjects will be addressed in future tips, so you'll have to check back. For now, just concentrate on configuration, features, and properties.

You can configure parsers in two ways: features and properties. Features involve turning on or off a specific piece of functionality, like validation. Properties involve setting the value of a specific item that the parser uses, like the location of a schema to validate all documents against. I'll deal with features first, and then look at properties in the next section.

Features are set, not surprisingly, through a method on your parser called setFeature(). The syntax looks like this:

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();

String featureName = "some feature URI";
boolean featureOn = true;

try {
  parser.setFeature(featureName, featureOn);
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting feature: " + e.getMessage());
}

This is pretty self-explanatory; the key is knowing the common features available to SAX parsers. Each feature is identified by a specific URI. A complete list of these URIs is available online at the SAX Web site (linked at the top of this section). Some of the most common features are validation and namespace processing. The next listing shows an example of setting both of these features.

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();

try {
  // Turn on validation
  parser.setFeature("http://xml.org/sax/features/validation", true);
  // Ensure namespace processing is on (the default)
  parser.setFeature("http://xml.org/sax/features/namespaces", true);
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting feature: " + e.getMessage());
}

Note that while parsers have several standard SAX features, they are free to add their own vendor-specific features. For example, Apache Xerces-J adds features that allow for dynamic validation and the continuance of processing after encountering a fatal error. Consult your parser vendor's documentation for the relevant feature URIs.
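To see a feature actually change parser behavior, the sketch below toggles the standard validation feature and counts the validity errors reported for a document that contradicts its own internal DTD. The class ValidationToggle is a hypothetical name, and the XMLReader is obtained through JAXP rather than XMLReaderFactory purely as a choice for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class
class ValidationToggle extends DefaultHandler {
    int errors = 0;

    @Override
    public void error(SAXParseException e) {
        errors++; // validity errors are recoverable and arrive here
    }

    /** Parses the document and returns how many validity errors were reported. */
    static int countErrors(String xml, boolean validate) {
        ValidationToggle handler = new ValidationToggle();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", validate);
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.errors;
    }

    public static void main(String[] args) {
        // The DTD says <a> must be empty, but the document gives it a child
        String xml = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>";
        System.out.println("validation on:  " + countErrors(xml, true));
        System.out.println("validation off: " + countErrors(xml, false));
    }
}
```

The document is well formed in both runs; only with the feature switched on does the parser compare it against the DTD and call error().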

Properties

Once you understand features, making sense of properties is easy. They behave in exactly the same manner, except that properties take an object as an argument where features take a boolean value. You use the setProperty() method for this purpose, as shown in the following listing.

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();

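String propertyName = "some property URI";
// Placeholder: the required object type depends on the property being set
Object propertyValue = new Object();

try {
  parser.setProperty(propertyName, propertyValue);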
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting property: " + e.getMessage());
}

The same error-handling framework is in play here, so you can easily duplicate code between the two types of configuration options. As with features, SAX provides a standard set of properties, and vendors can add their own extensions. Common SAX-standard properties allow for setting a Lexical Handler and a Declaration Handler (two handlers I'll discuss in later tips). Parsers like Apache Xerces extend these with, for example, the ability to set the input buffer size and the location of an external schema to use in validation. The following listing shows a few properties in action.

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();

try {
  // Set the chunk to read in by SAX
  parser.setProperty("http://apache.org/xml/properties/input-buffer-size",
      Integer.valueOf(2048));
  // Set a LexicalHandler
  parser.setProperty("http://xml.org/sax/properties/lexical-handler",
      new MyLexicalHandler());
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting property: " + e.getMessage());
}
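As a concrete use of the lexical-handler property, the sketch below collects the comments in a document, something the ordinary ContentHandler callbacks never report. It extends org.xml.sax.ext.DefaultHandler2, which provides empty LexicalHandler methods; the class CommentCollector is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: DefaultHandler2 also implements LexicalHandler
class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    static List<String> collect(String xml) {
        CommentCollector handler = new CommentCollector();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Without this property the comment() callback is never invoked
            reader.setProperty(
                    "http://xml.org/sax/properties/lexical-handler", handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.comments;
    }

    public static void main(String[] args) {
        System.out.println(collect("<a><!-- note --><b/></a>"));
    }
}
```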

With an understanding of features and properties, you can make your parser do almost anything. Once you understand setting up your parser in this fashion, you're ready for my next tip, which will discuss building a basic content handler.

