XML Parser
Parsing XML
• Goal: read XML files into data structures in
programming languages
• Possible strategies
– Parse by hand with some reusable libraries
– Parse into generic tree structure
– Parse as sequence of events
– Automagically parse to language-specific objects
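The "sequence of events" strategy can be sketched with the SAX API that ships with the JDK. This is only an illustration, not part of the course examples: the class name SaxEventSketch and the sample XML are assumptions. The parser calls back into a handler for each start tag instead of building a tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
public class SaxEventSketch {

    /** Returns element names in the order their start tags are reported. */
    public static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        // The handler receives one callback per parsing event;
        // here only start-element events are recorded.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(qName);
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students><student><name>Ann</name></student></students>";
        System.out.println(elementNames(xml));
    }
}
```

No tree is kept in memory here; the program sees each element once, as it is read.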
Parsing by-hand
• Advantages
– Complete control
– Good for simple needs; can build on a regex package
• Disadvantages
– Must write the initial code yourself, even if it becomes
generalized
– Pretty tedious and error prone.
– Gets very hard when using schema or DTD to validate
– No one does this anymore
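To make the trade-off concrete, here is a minimal by-hand sketch built on java.util.regex. The class RegexByHand and the method textOf are hypothetical names. It only handles a simple element with no attributes and no nested markup, which is exactly the limitation the disadvantages above point at.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, used only for this sketch.
public class RegexByHand {

    /**
     * Returns the text of the first <tag>text</tag> occurrence, or null
     * if none is found. Attributes, nesting, comments, CDATA and entities
     * are NOT handled.
     */
    public static String textOf(String xml, String tag) {
        String quoted = Pattern.quote(tag);
        Pattern p = Pattern.compile("<" + quoted + ">([^<]*)</" + quoted + ">");
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<student><name>Ann</name></student>";
        System.out.println(textOf(xml, "name"));
    }
}
```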
Parsing into generic tree structure
• Advantages
– Industry-wide, language neutral W3C standard exists called DOM
(Document Object Model)
– Learning DOM for one language makes it easy to learn for any
other
– As of JAXP 1.2, support for Schema
– Have to write much less code to get XML to something you want
to manipulate in your program
• Disadvantages
– Non-intuitive API, doesn’t take full advantage of Java
– Still quite a bit of work
What is JAXP?
• JAXP: Java API for XML Processing
– In the Java language, the definitions of these standard
APIs (together with the XSLT API) make up a set of
interfaces known as JAXP
– Java also provides standard implementations together
with a vendor pluggability layer
– Some of these come standard with the J2SDK; others are
only available with the Web Services Developer Pack
– We will study these shortly
Another alternative
• JDOM: Native Java published API for
representing XML as tree
• Like DOM but much more Java-specific,
object oriented
• However, not supported by other languages
• Also, no support for schema
• dom4j is another alternative
JAXB
• JAXB: Java Architecture for XML Binding
org.w3c.dom.Document
Sample Code
// A factory instance is the parser implementation. It can be changed
// with a runtime System property. The JDK has a default; Xerces is much better.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
/* set some factory options here */
// From the factory one obtains an instance of the parser.
DocumentBuilder builder = factory.newDocumentBuilder();
// xmlFile can be a java.io.File, an InputStream, etc.
Document doc = builder.parse(xmlFile);
For reference. Notice that the Document class comes from the
w3c-specified bindings.
javax.xml.parsers.DocumentBuilderFactory
javax.xml.parsers.DocumentBuilder
org.w3c.dom.Document
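The three steps (factory, builder, parse) can be put together in a runnable sketch. The class name DomParseSketch and the inline XML string are assumptions made for illustration; a real program would more likely pass a java.io.File.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this sketch.
public class DomParseSketch {

    /** Parses an XML string into a DOM Document using the default JAXP parser. */
    public static Document parse(String xml) throws Exception {
        // Step 1: obtain a factory (the pluggable parser implementation).
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Factory options go here, for example namespace awareness.
        factory.setNamespaceAware(true);

        // Step 2: obtain a parser instance from the factory.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Step 3: parse; an InputStream is used here instead of a File.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<books><book/></books>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```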
Validation
• Note that by default the parser will not
validate against a schema or DTD
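One way to turn validation on is sketched below, here for a DTD declared inside the document itself. The class ValidationSketch and the sample documents are assumptions. Note that a validating parser reports validity problems to an ErrorHandler, so the sketch installs one that rethrows them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name, used only for this sketch.
public class ValidationSketch {

    /** Returns true if the document parses and is valid against its DTD. */
    public static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // DTD validation is off by default

        DocumentBuilder builder = factory.newDocumentBuilder();
        // Turn validity errors into exceptions so the caller can detect them.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored */ }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>";
        System.out.println(isValid(dtd + "<note>hi</note>"));
        System.out.println(isValid(dtd + "<note><x/></note>"));
    }
}
```

Validation against an XML Schema follows the same pattern but is configured differently on the factory; that variant is not shown here.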
Each kind of DOM node has a special and non-obvious associated type, value, and name.
// Write the DOM tree to standard output with an identity transform
TransformerFactory tFactory =
TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
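The transformer fragment above can be wrapped into a small self-contained sketch that serializes a DOM tree to a String rather than to System.out. The class DomToStringSketch and the choice to omit the XML declaration are assumptions made for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
public class DomToStringSketch {

    /** Serializes a DOM Document with an identity transform (no stylesheet). */
    public static String toXml(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header to keep the output short.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element greeting = doc.createElement("greeting");
        greeting.setTextContent("hi");
        doc.appendChild(greeting);
        System.out.println(toXml(doc));
    }
}
```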
Creating a DOM
• Sometimes you may want to create a DOM
tree directly in memory. This is done with:
DocumentBuilderFactory factory
= DocumentBuilderFactory.newInstance();
DocumentBuilder builder
= factory.newDocumentBuilder();
Document document = builder.newDocument();
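An empty Document is not useful until it has a root element. The sketch below continues from newDocument() and adds a root and one child; the class name DomCreateSketch and the element names are assumptions chosen to match the students example used elsewhere in these slides.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
public class DomCreateSketch {

    /** Builds <students><student>Ann</student></students> in memory. */
    public static Document build() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.newDocument();

        // Elements are created by the Document, then attached explicitly.
        Element root = document.createElement("students");
        document.appendChild(root);

        Element student = document.createElement("student");
        student.setTextContent("Ann");
        root.appendChild(student);

        return document;
    }

    public static void main(String[] args) throws Exception {
        Document document = build();
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```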
Manipulating Nodes
• Once the root node is obtained, typical tree
methods exist to manipulate other elements:
boolean node.hasChildNodes()
NodeList node.getChildNodes()
Node node.getNextSibling()
Node node.getParentNode()
String node.getNodeValue();
String node.getNodeName();
String node.getTextContent();
void node.setNodeValue(String nodeValue);
Node node.insertBefore(Node newChild, Node refChild);
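A short sketch of how these methods combine in practice: walk the children of a node and keep only the element children. The class NodeWalkSketch and its helper parse are hypothetical names. The node-type check matters because text between tags is also a child node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
public class NodeWalkSketch {

    /** Helper: parses an XML string into a DOM Document. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the names of the direct element children of the given node. */
    public static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        // Sibling traversal: first child, then next sibling until null.
        for (Node child = parent.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            // Skip text, comment and other non-element nodes.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/>some text<c/></a>");
        System.out.println(childElementNames(doc.getDocumentElement()));
    }
}
```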
JDOM
JDOM Motivation
In JDOM:
Element element = new Element("fibonacci");
In DOM:
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
DOMImplementation impl = builder.getDOMImplementation();
Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);
Element element = doc.createElement("fibonacci");
In JDOM:
Element element = new Element("fibonacci");
element.setText("8");
element.setAttribute("index", "6");
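For comparison, the same element with its text and attribute can be built with the plain DOM API from the JDK. This is a sketch under stated assumptions: the class name DomFibonacciSketch is hypothetical, and setTextContent is the DOM counterpart used here for JDOM's setText.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
public class DomFibonacciSketch {

    /** Builds <Fibonacci_Numbers><fibonacci index="6">8</fibonacci></Fibonacci_Numbers>. */
    public static Document build() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        DOMImplementation impl = builder.getDOMImplementation();

        // createDocument(namespaceURI, qualifiedName, doctype) also creates the root.
        Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);

        // In DOM an element must come from its owning Document.
        Element element = doc.createElement("fibonacci");
        element.setTextContent("8");
        element.setAttribute("index", "6");
        doc.getDocumentElement().appendChild(element);

        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element first = (Element) build().getDocumentElement().getFirstChild();
        System.out.println(first.getAttribute("index") + " -> " + first.getTextContent());
    }
}
```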
• Note that this creates a huge number of files that together represent the
content of the books.xsd schema as a set of Java classes
• It is not necessary to know all of these classes. We’ll study them only
at a high level so we can understand how to use them
Example: students.xsd
Generated interfaces
• xjc.sh -p test.lottery students.xsd
• Summary of examples:
– student/
• Use JAXB to read an xml document composed of a single student
complex type
– student/
• Same, but for an xml document composed of a sequence of such
student types of indefinite length
– purchaseOrder/
• Another read example, but for a more complex schema
Sample programs, cont
• Course examples, cont
– create-marshal
• Purchase-order example modified to create in memory and
write to XML
– modify-marshal
• Purchase-order example modified to read XML, change it and
write back to XML
• Bind model group with a repeating occurrence and complex type definitions with mixed {content type} to:
– A general content property; a List content-property that holds Java instances representing element information items and character
data items.
End