Introduction To XML: The Two Problems
Introduction
XML is a language developed for the web that differs from the scripting and programming languages that came before it.
Instead of being concerned with the processing and display of data, XML's primary purpose is to tell the
computer what the data entered actually means. It was designed to address two problems:
1. Computers do not understand the information placed in them. For example, there is no way for a search engine, or any other
computer, to know that this page contains the introduction to an XML tutorial. All it sees is a collection of letters and numbers with
HTML formatting around it. The computer cannot even tell which parts of this page are headings, which are text and which are adverts.
This is the main problem which XML was designed to overcome. If a page or document is written in XML, a computer can
work out what each piece of data in it represents. This has major implications for search engine technology.
If a search engine knew what was on a page, it could return far more precise results, with fewer inaccurate
matches and half-relevant pages. This is just the revolution the over-bloated web needs.
2. Web pages are not compatible across different devices. One of the major difficulties that web designers have today is that
people now access pages from a variety of devices: PCs, Macs, mobile phones, palmtop computers and even
televisions. Because of this, web designers must either produce their pages in several different formats, or
cut back on the design so that one page is compatible across the different formats. Because XML is used to
define what data means and not how it is displayed, it makes it much easier to use the same data on several different platforms.
What Is XML?
So what actually is XML? The thing people find most difficult to understand about it is that XML does not actually do anything.
XML is not a way to design your home page and it won't change the way in which you build sites. This has made many people believe
that XML is useless, as they can't see a way that it will benefit them. XML has a wide variety of benefits though, two of which were
outlined above.
The real use of XML, though, is to describe data. It is used in a similar way to HTML, except for one major
difference between the two: HTML describes how data is displayed, while XML describes what the data is.
The Language
As mentioned above, XML looks like HTML and is structured very similarly. Both use tags to enclose the
data they refer to. Both can use nested tags and both can have attributes added to their tags.
The most revolutionary thing about XML, though, is that you are not restricted to the normal, pre-defined tags like font and br.
Instead you are responsible for making up the tags yourself. You can name them anything you like and can use them to represent
anything you like. This is a feature that HTML does not offer.
Is It Difficult To Learn?
The answer to this, in short, is no. The only thing you have to learn about XML is how to structure your tags, and they are in fact almost
identical to HTML tags. Most of it is just logical thinking. Before learning XML it is important that you already know HTML. It is also useful
if you know a web scripting language such as PHP, ASP or JavaScript. If you do not yet know these, try some of the tutorials on the site. If
you are looking to format a web page rather than describe data, you will be better off learning XHTML, the XML-based reformulation of
HTML.
What is XML
o XML (eXtensible Markup Language) is a markup language.
o XML is designed to store and transport data.
o XML was released in the late 1990s. It was created to provide an easy way to use and store self-describing data.
o XML became a W3C Recommendation on February 10, 1998.
o XML is not a replacement for HTML.
o XML is designed to be self-descriptive.
o XML is designed to carry data, not to display data.
o XML tags are not predefined. You must define your own tags.
o XML is platform independent and language independent.
XML - Overview
XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard Generalized Markup Language
(SGML).
XML tags identify the data and are used to store and organize the data, rather than specifying how to display it like HTML tags, which are
used to display the data. XML is not going to replace HTML in the near future, but it introduces new possibilities by adopting many
successful features of HTML.
There are three important characteristics of XML that make it useful in a variety of systems and solutions:
XML is extensible: XML allows you to create your own self-descriptive tags, or language, that suits your application.
XML carries the data, does not present it: XML allows you to store the data irrespective of how it will be presented.
XML is a public standard: XML was developed by an organization called the World Wide Web Consortium (W3C) and is available
as an open standard.
What is Markup?
XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and
machine-readable. So what exactly is a markup language? Markup is information added to a document that enhances its meaning in certain ways,
in that it identifies the parts and how they relate to each other. More specifically, a markup language is a set of symbols that can be placed
in the text of a document to demarcate and label the parts of that document.
The following example shows how XML markup looks when embedded in a piece of text:
<message>
<text>Hello, world!</text>
</message>
This snippet includes the markup symbols, or the tags such as <message>...</message> and <text>... </text>. The tags <message> and
</message> mark the start and the end of the XML code fragment. The tags <text> and </text> surround the text Hello, world!.
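To see the markup and the text come apart, the fragment can be fed to a parser. This is a minimal sketch that assumes Python's standard xml.etree.ElementTree module; the text itself does not name any parser.

```python
import xml.etree.ElementTree as ET

# The fragment from the text above, held in a Python string.
fragment = "<message><text>Hello, world!</text></message>"

# Parse the markup into a tree of elements.
message = ET.fromstring(fragment)

# The tags label the parts; the character data sits between them.
text = message.find("text")
print(message.tag)  # name of the outer element
print(text.tag)     # name of the inner element
print(text.text)    # the character data enclosed by <text>...</text>
```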
Why XML
Platform Independent and Language Independent: The main benefit of XML is that you can take data from a program such as Microsoft
SQL Server, convert it into XML, and then share that XML with other programs and platforms. This lets two platforms communicate that would
otherwise find it very difficult. What makes XML truly powerful is its international acceptance. Many corporations use XML
interfaces for databases, programming, office applications, mobile phones and more. This is due to its platform independence.
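The convert-and-share idea can be sketched in a few lines. The example below is only an illustration: the row dictionary is hypothetical and stands in for a record read from a database, and Python's standard xml.etree.ElementTree module is an assumed tool, not something the text prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, standing in for a row read from a database.
row = {"name": "Tanmay Patil", "company": "TutorialsPoint", "phone": "(011) 123-4567"}

def row_to_xml(record):
    """Build a <contact-info> element with one child element per field."""
    root = ET.Element("contact-info")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = value
    return ET.tostring(root, encoding="unicode")

xml_text = row_to_xml(row)
print(xml_text)

# Any other program can read the text back, whatever produced it.
restored = {child.tag: child.text for child in ET.fromstring(xml_text)}
```

Because the result is plain text, the receiving side needs only an XML parser, not the program that produced the data.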
6) XSD (XML Schema Definition): It is an XML-based alternative to DTD. It is used to describe the structure of an XML document.
7) XLink (XML Linking Language): XLink stands for XML Linking Language. This is a language for creating hyperlinks (external and internal links) in XML documents.
8) XPointer (XML Pointer Language): It is a system for addressing components of XML-based internet media. It allows XLink hyperlinks to point to more specific parts in the XML document.
9) SOAP (Simple Object Access Protocol): SOAP stands for Simple Object Access Protocol. It is an XML-based protocol that lets applications exchange information over HTTP. In simple words, it is a protocol used for accessing web services.
10) WSDL (Web Services Description Language): It is an XML-based language to describe web services. It also describes the functionality offered by a web service.
11) RDF (Resource Description Framework): RDF is an XML-based language to describe web resources. It is a standard model for data interchange on the web. It is used to describe the title, author, content and copyright information of a web page.
12) SVG (Scalable Vector Graphics): It is an XML-based vector image format for two-dimensional images. It defines graphics in XML format. It also supports animation.
13) RSS (Really Simple Syndication): RSS is an XML-based format to handle web content syndication. It is used for fast browsing of news and updates. It is generally used for news-like sites.
XML - Syntax
This chapter takes you through the simple syntax rules to write an XML document. Following is a complete XML document:
<?xml version="1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
You can notice there are two kinds of information in the above example:
markup, like <contact-info>, and
the text, or the character data, like TutorialsPoint and (011) 123-4567.
The following diagram depicts the syntax rules to write different types of markup and text in an XML document.
XML Declaration
The XML document can optionally have an XML declaration. It is written as <?xml version="1.0" encoding="UTF-8"?>.
In XML, all elements must be properly nested within each other: an element that is opened inside another element must be closed before
the outer element is closed.
Entity References: An entity reference contains a name between the start and the end delimiters. For
example, &amp; where amp is the name. The name refers to a predefined string of text and/or markup.
Character References: These contain a hash mark (“#”) followed by a number, as in &#65;. The number always
refers to the Unicode code point of a character. In this case, 65 refers to the letter "A".
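Both kinds of reference can be watched in action. The sketch below assumes Python's standard library (xml.etree.ElementTree for reading, xml.sax.saxutils.escape for writing); the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: the parser replaces references with the characters they stand for.
# &amp; is an entity reference; &#65; is a character reference (decimal 65).
parsed = ET.fromstring("<text>Q&amp;&#65; &lt;tag&gt;</text>").text
print(parsed)

# Writing: escape() turns reserved characters back into entity references.
# The extra mapping also covers the two quote characters.
raw = 'a < b & "c"'
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# The escaped text can be embedded in markup and parses back unchanged.
round_trip = ET.fromstring("<text>" + escaped + "</text>").text
```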
XML Text
The names of XML-elements and XML-attributes are case-sensitive, which means the name of start and end elements need to
be written in the same case.
To avoid character encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
Whitespace characters like blanks, tabs and line-breaks between XML-elements and between the XML-attributes will be
ignored.
Some characters are reserved by the XML syntax itself. Hence, they cannot be used directly. To use them, some replacement-
entities are used, which are listed below:
not allowed character    replacement-entity    character description
<    &lt;    less than
>    &gt;    greater than
&    &amp;    ampersand
'    &apos;    apostrophe
"    &quot;    quotation mark
XML - Documents
An XML document is a basic unit of XML information composed of elements and other markup in an orderly package. An
XML document can contain a wide variety of data, for example a database of numbers, numbers representing a molecular structure, or a
mathematical equation.
XML Document example
A simple document is given in the following example:
<?xml version="1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
The following image depicts the parts of XML document.
XML - Declaration
This chapter covers the XML declaration in detail. The XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when used, it must appear on the first line of the XML document.
Syntax
Following syntax shows XML declaration:
<?xml
version="version_number"
encoding="encoding_declaration"
standalone="standalone_status"
?>
Each parameter consists of a parameter name, an equals sign (=), and a parameter value inside quotes. The following table shows the above
syntax in detail:
Parameter    Parameter_value    Parameter_description
version    1.0    Specifies the version of the XML standard used.
encoding    UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP    It defines the character encoding used in the document. UTF-8 is the default encoding used.
standalone    yes or no    It informs the parser whether the document relies on information from an external source, such as an external document type definition (DTD), for its content. The default value is set to no. Setting it to yes tells the processor there are no external declarations required for parsing the document.
Rules
An XML declaration should abide by the following rules:
If the XML declaration is present in the XML, it must be placed as the first line in the XML document.
If the XML declaration is included, it must contain the version number attribute.
The parameter names and values are case-sensitive.
The names are always in lower case.
The order of placing the parameters is important. The correct order is: version, encoding and standalone.
Either single or double quotes may be used.
The XML declaration has no closing tag i.e. </?xml>
XML Declaration Examples
Following are a few examples of XML declarations:
XML declaration with no parameters (not well-formed, because the version parameter is required):
<?xml ?>
XML declaration with version definition:
<?xml version="1.0"?>
XML declaration with all parameters defined:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
XML declaration with all parameters defined in single quotes:
<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
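The rules for the declaration can be checked against a real parser. This is a small sketch, assuming Python's standard xml.etree.ElementTree module (which is built on the Expat parser); the helper name is_well_formed and the empty <contact-info/> body are made up for the test.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# All three parameters, in the required order, with double quotes.
full = b'<?xml version="1.0" encoding="UTF-8" standalone="no" ?><contact-info/>'

# Single quotes are equally acceptable.
single_quotes = b"<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?><contact-info/>"

# The version parameter is missing, so the declaration is rejected.
no_version = b'<?xml encoding="UTF-8"?><contact-info/>'

# The declaration is not at the very start of the document.
not_first = b'\n<?xml version="1.0"?><contact-info/>'

for label, doc in [("full", full), ("single_quotes", single_quotes),
                   ("no_version", no_version), ("not_first", not_first)]:
    print(label, is_well_formed(doc))
```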
An XML document contains XML Elements.
XML Tree
XML documents form a tree structure that starts at "the root" and branches to "the leaves".
Self-Describing Syntax
XML uses a largely self-describing syntax.
A prolog defines the XML version and the character encoding:
<?xml version="1.0" encoding="UTF-8"?>
The <book> element has 4 child elements: <title>, <author>, <year>, <price>.
<book>
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
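The tree shape becomes visible once the document is parsed. In this sketch the <bookstore> root element is an assumption added to give the <book> element a parent; the parser is Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# <bookstore> is a hypothetical root; the <book> content comes from the text.
document = """<bookstore>
  <book>
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>"""

root = ET.fromstring(document)   # the root of the tree
book = root.find("book")         # a branch below the root

# The children of <book> are the leaves: each holds only text.
leaves = [(child.tag, child.text) for child in book]
for tag, text in leaves:
    print(tag, "=", text)

# Attributes hang off an element rather than forming a child of their own.
language = book.find("title").get("lang")
```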
XML Element
XML elements can be defined as the building blocks of an XML document. Elements can behave as containers to hold text, elements, attributes, media
objects or all of these.
Each XML document contains one or more elements, the scope of which is either delimited by start and end tags or, for empty elements,
by an empty-element tag.
Syntax
Following is the syntax to write an XML element:
<element-name attribute1 attribute2>
....content
</element-name>
where
element-name is the name of the element. The name and its case must match in the start and end tags.
attribute1, attribute2 are attributes of the element separated by white spaces. An attribute defines a property of the element.
It associates a name with a value, which is a string of characters. An attribute is written as:
name = "value"
name is followed by an = sign and a string value inside double(" ") or single(' ') quotes.
Empty Element
An empty element (element with no content) has following syntax:
<name attribute1 attribute2.../>
Example of an XML document using various XML elements:
<?xml version="1.0"?>
<contact-info>
<address category="residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
An element with no content is said to be empty.
Naming Styles
There are no naming styles defined for XML elements. But here are some commonly used:
Style Example Description
Lower case <firstname> All letters lower case
Upper case <FIRSTNAME> All letters upper case
Underscore <first_name> Underscore separates words
Pascal case <FirstName> Uppercase first letter in each word
Camel case <firstName> Uppercase first letter in each word except the first
If you choose a naming style, it is good to be consistent!
XML documents often have a corresponding database. A common practice is to use the naming rules of the database for the XML
elements.
Camel case is a common naming rule in JavaScript.
XML - Attributes
This chapter describes XML attributes. Attributes are part of XML elements. An element can have multiple unique attributes.
An attribute gives more information about an XML element. To be more precise, attributes define properties of elements. An XML attribute is always
a name-value pair.
Syntax
An XML attribute has following syntax:
<element-name attribute1 attribute2 >
....content..
</element-name>
where attribute1 and attribute2 have the following form:
name = "value"
value has to be in double (" ") or single (' ') quotes. Here, attribute1 and attribute2 are unique attribute labels.
Attributes are used to add a unique label to an element, place the label in a category, add a Boolean flag, or otherwise associate it with
some string of data. Following example demonstrates the use of attributes:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category="flowers" />
<plants category="shrubs">
</plants>
</garden>
Attributes are used to distinguish among elements of the same name when you do not want to create a new element for every situation.
Hence, use of an attribute can add a little more detail in differentiating two or more similar elements.
In the above example, we have categorized the plants by including the attribute category and assigning different values to each of the
elements. Hence we have two categories of plants, one flowers and the other shrubs, and two plants elements with different
attribute values.
You can also observe that we have declared this attribute at the beginning of the XML, in the ATTLIST declaration inside the DOCTYPE.
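A program reading the garden document can use the attribute to tell the two plants elements apart. This is a sketch with Python's standard xml.etree.ElementTree module (an assumed tool); the DOCTYPE is left out here because this parser does not validate against it.

```python
import xml.etree.ElementTree as ET

# The <garden> content from the example above, without the DOCTYPE.
garden = ET.fromstring(
    "<garden>"
    '<plants category="flowers" />'
    '<plants category="shrubs"></plants>'
    "</garden>"
)

# Same element name twice; only the attribute value differs.
categories = [plants.get("category") for plants in garden.findall("plants")]
print(categories)

# An attribute predicate selects one element by its attribute value.
shrubs = garden.find("plants[@category='shrubs']")
print(shrubs.attrib)
```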
Attribute Types
Following table lists the types of attributes:
Attribute Type Description
StringType It takes any literal string as a value. CDATA is a StringType. CDATA is character data. This means,
any string of non-markup characters is a legal part of the attribute.
TokenizedType This is a more constrained type. The validity constraints noted in the grammar are applied after
the attribute value is normalized. The TokenizedType attributes are given as:
ID : It is used to specify the element as unique.
IDREF : It is used to reference an ID that has been named for another element.
IDREFS : It is used to reference a list of IDs that have been named for other elements.
ENTITY : It indicates that the attribute will represent an external entity in the document.
ENTITIES : It indicates that the attribute will represent a list of external entities in the document.
NMTOKEN : It is similar to CDATA, with restrictions on what data can be part of the attribute.
NMTOKENS : It is a list of NMTOKEN values, with the same restrictions on each one.
EnumeratedType This has a list of predefined values in its declaration, out of which the attribute must be assigned one value.
There are two types of enumerated attribute:
NotationType : It declares that an element will be referenced to a NOTATION declared somewhere else in the XML document.
Enumeration : Enumeration allows you to define a specific list of values that the attribute value must match.
Rules always have exceptions. My rule about not using attributes has one too:
Sometimes I assign ID references to elements in my XML documents. These ID references can be used to access XML elements in much
the same way as the NAME or ID attributes in HTML. This example demonstrates this:
<?xml version="1.0"?>
<messages>
<note ID="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note ID="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>
The ID in these examples is just a counter, or a unique identifier, to identify the different notes in the XML file.
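Using the ID to reach a particular note might look like the sketch below. It assumes Python's standard xml.etree.ElementTree module; the notes_by_id dictionary is a helper made up for the illustration.

```python
import xml.etree.ElementTree as ET

document = """<messages>
  <note ID="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note ID="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>"""

messages = ET.fromstring(document)

# Index every <note> by its ID attribute so it can be looked up directly.
notes_by_id = {note.get("ID"): note for note in messages.iter("note")}

reply = notes_by_id["502"]
print(reply.find("heading").text)
print(reply.find("body").text)
```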
XML - Comments
This chapter explains how comments work in XML documents. XML comments are similar to HTML comments. The comments are added
as notes or lines for understanding the purpose of an XML code.
Comments can be used to include related links, information and terms. They are visible only in the source code, not in the data that an application reads from the XML.
Comments may appear anywhere in XML code outside other markup, except before the XML declaration.
Syntax
XML comment has following syntax:
<!-- Your comment -->
A comment starts with <!-- and ends with -->. You can add textual notes as comments between the characters. The string -- must not appear inside a comment, and you must not nest one
comment inside the other.
Example
Following example demonstrates the use of comments in XML document:
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Students grades are uploaded by months -->
<class_list>
<student>
<name>Tanmay</name>
<grade>A</grade>
</student>
</class_list>
Any text between <!-- and --> characters is considered as a comment.
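How a parser treats comments can be tried out directly. This sketch assumes Python's standard xml.etree.ElementTree module, which skips comments by default; the parses helper and the short <a> documents are made up for the test.

```python
import xml.etree.ElementTree as ET

document = """<class_list>
  <!-- Students grades are uploaded by months -->
  <student>
    <name>Tanmay</name>
    <grade>A</grade>
  </student>
</class_list>"""

root = ET.fromstring(document)

# The comment is not part of the data: only <student> shows up as a child.
child_tags = [child.tag for child in root]
print(child_tags)

def parses(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A double hyphen inside the comment body is rejected by the parser.
print(parses("<a><!-- fine --></a>"))
print(parses("<a><!-- one -- two --></a>"))
```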
CDATA Rules
The given rules are required to be followed for XML CDATA:
A CDATA section cannot contain the string "]]>", because that string marks the end of the section.
Nesting is not allowed in CDATA section.
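For context, a CDATA section is written as <![CDATA[ ... ]]> and lets reserved characters appear without escaping. The sketch below assumes Python's standard xml.etree.ElementTree module; the <code> element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, < and & are taken literally instead of as markup.
with_cdata = "<code><![CDATA[if (a < b && b < c) { run(); }]]></code>"
element = ET.fromstring(with_cdata)
print(element.text)

# Without the CDATA section the same text is not well-formed.
try:
    ET.fromstring("<code>if (a < b && b < c) { run(); }</code>")
    accepted_without_cdata = True
except ET.ParseError:
    accepted_without_cdata = False
print(accepted_without_cdata)
```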
XML - White Spaces
This chapter discusses white space handling in XML documents. Whitespace is a collection of spaces, tabs, and newlines. They are
generally used to make a document more readable.
An XML document contains two types of white space: (a) Significant Whitespace and (b) Insignificant Whitespace. Both are explained below
with examples.
Significant Whitespace
Significant whitespace occurs within an element that contains text and markup together. For example:
<name>TanmayPatil</name>
and
<name>Tanmay Patil</name>
The above two elements are different because of the space between Tanmay and Patil. Any program reading this element in an XML file
is obliged to maintain the distinction.
Insignificant Whitespace
Insignificant whitespace is whitespace in places where only markup is allowed, so it carries no data. For example:
<address.category="residence">
or
<address....category="residence">
The above two examples are the same. Here, the space is represented by dots (.). In the above example, the space
between address and category is insignificant.
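The difference between the two kinds of whitespace shows up when the examples are parsed. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the elements are written as empty-element tags so that each snippet is a complete document.

```python
import xml.etree.ElementTree as ET

# Significant: whitespace inside the text content is part of the data.
joined = ET.fromstring("<name>TanmayPatil</name>")
spaced = ET.fromstring("<name>Tanmay Patil</name>")
print(joined.text == spaced.text)

# Insignificant: extra spaces between the element name and the attribute
# do not change what the parser reports.
tight = ET.fromstring('<address category="residence"/>')
loose = ET.fromstring('<address    category="residence"/>')
print(tight.attrib == loose.attrib)
```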
A special attribute named xml:space may be attached to an element. This indicates that whitespace should not be removed for that
element by the application. You can set this attribute to default or preserve as shown in the example below:
<!ATTLIST address xml:space (default|preserve) 'preserve'>
Where:
The value default signals that the default whitespace processing modes of an application are acceptable for this element;
The value preserve tells the application to preserve all the whitespace.
XML – Encoding
Encoding is the process of converting Unicode characters into their equivalent binary representation. When the XML processor reads an
XML document, it decodes the document according to the type of encoding. Hence, we need to specify the type of encoding in the XML
declaration.
Encoding Types
There are mainly two types of encoding:
UTF-8
UTF-16
UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set. The number 8 or 16 refers to the number of
bits in one code unit of the encoding. UTF-8 uses one to four 8-bit units (bytes) per character, and UTF-16 uses one or two 16-bit units
per character. For documents without encoding information, UTF-8 is assumed by default.
Syntax
Encoding type is included in the prolog section of the XML document. The syntax for UTF-8 encoding is as below:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
Syntax for UTF-16 encoding
<?xml version="1.0" encoding="UTF-16" standalone="no" ?>
Example
Following example shows declaration of encoding:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
In the above example, encoding="UTF-8" specifies that the document is stored as 8-bit code units. To store it as 16-bit code units,
UTF-16 encoding can be used.
XML files encoded with UTF-8 tend to be smaller in size than those encoded with UTF-16 when the text is mostly ASCII, as in the example above.
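The size difference between the two encodings can be measured. This sketch uses only Python's built-in str.encode; the sample characters are chosen for illustration, and utf-16-le is used so that no byte order mark is counted.

```python
# Sample characters: Latin A, e with acute accent, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Bytes needed per character in each encoding.
sizes = {ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))) for ch in samples}
for ch, (utf8_bytes, utf16_bytes) in sizes.items():
    print(repr(ch), "UTF-8:", utf8_bytes, "UTF-16:", utf16_bytes)

# For mostly-ASCII markup, UTF-8 needs about half the space of UTF-16.
element = "<name>Tanmay Patil</name>"
utf8_size = len(element.encode("utf-8"))
utf16_size = len(element.encode("utf-16-le"))
print(utf8_size, utf16_size)
```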
XML – Namespaces
A Namespace is a set of unique names. A Namespace is a mechanism by which element and attribute names can be assigned to a group. The
Namespace is identified by a URI (Uniform Resource Identifier).
Namespace Declaration
A Namespace is declared using reserved attributes. Such an attribute name must either be xmlns or begin with xmlns: as shown below:
<element xmlns:name="URL">
Syntax
The Namespace starts with the keyword xmlns.
The word name is the Namespace prefix.
The URL is the Namespace identifier.
Name Conflicts
In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different
XML applications.
This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both contain a <table> element, but the elements have
different content and meaning.
A user or an XML application will not know how to handle these differences.
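Name conflicts in XML can be avoided by giving each element name a prefix bound to a namespace:
<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">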
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
In the example above:
The xmlns attribute in the first <table> element gives the h: prefix a qualified namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified namespace.
When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.
Namespaces can also be declared in the XML root element:
<root
xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
Note: The namespace URI is not used by the parser to look up information.
The purpose of using an URI is to give the namespace a unique name.
However, companies often use the namespace as a pointer to a web page containing namespace information.
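A program reading the combined document can keep the two kinds of table apart by namespace. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports each element name as {namespace-URI}local-name; the ns dictionary is a local helper for the lookups.

```python
import xml.etree.ElementTree as ET

document = """<root
  xmlns:h="http://www.w3.org/TR/html4/"
  xmlns:f="http://www.w3schools.com/furniture">
<h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
<f:table><f:name>African Coffee Table</f:name><f:width>80</f:width><f:length>120</f:length></f:table>
</root>"""

root = ET.fromstring(document)

# Prefix-to-URI map used for lookups; the prefixes here are local to this code.
ns = {"h": "http://www.w3.org/TR/html4/", "f": "http://www.w3schools.com/furniture"}

html_table = root.find("h:table", ns)
furniture = root.find("f:table", ns)

# The parser identifies elements by URI plus local name, not by the prefix.
print(html_table.tag)
print(furniture.tag)

cells = [td.text for td in html_table.iter("{http://www.w3.org/TR/html4/}td")]
furniture_name = furniture.find("f:name", ns).text
```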
Default Namespaces
Defining a default namespace for an element saves us from using prefixes in all the child elements. It has the following syntax:
xmlns="namespaceURI"
This XML carries HTML table information:
<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a piece of furniture:
<table xmlns="http://www.w3schools.com/furniture">
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
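With a default namespace, the child elements carry the namespace even though they have no prefix. A brief sketch, again assuming Python's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

table = ET.fromstring(
    '<table xmlns="http://www.w3schools.com/furniture">'
    "<name>African Coffee Table</name>"
    "<width>80</width>"
    "<length>120</length>"
    "</table>"
)

# Every unprefixed child inherits the default namespace of its parent.
child_tags = [child.tag for child in table]
print(child_tags)

# A lookup by bare name finds nothing; the namespace is part of the name.
bare = table.find("name")
qualified = table.find("{http://www.w3schools.com/furniture}name")
print(bare, qualified.text)
```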
XML Validation
XML DTD
A DTD defines the legal elements of an XML document. In simple words we can say that a DTD defines the document structure with a list
of legal elements and attributes. XML Schema is an XML-based alternative to DTD. Both DTD and XML Schema are used to check that an XML
document is valid, that is, well-formed and also conforming to the declared structure. We should avoid errors in XML documents because they will stop the XML programs.
XML schema
It is defined as an XML language. It uses namespaces to allow for reuse of existing definitions. It supports a large number of built-in data
types and the definition of derived data types.
XML – DTDs
The XML Document Type Definition, commonly known as DTD, is a way to describe an XML language precisely. DTDs check the vocabulary and
validity of the structure of XML documents against the grammatical rules of the appropriate XML language.
An XML DTD can be either specified inside the document, or it can be kept in a separate document and then linked separately.
Syntax
Basic syntax of a DTD is as follows:
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
The DTD starts with the <!DOCTYPE delimiter.
An element tells the parser to parse the document from the specified root element.
DTD identifier is an identifier for the document type definition, which may be the path to a file on the system or URL to a file on
the internet. If the DTD is pointing to an external path, it is called the External Subset.
The square brackets [ ] enclose an optional list of declarations called the Internal Subset.
Internal DTD
A DTD is referred to as an internal DTD if elements are declared within the XML file. In that case the standalone attribute in the
XML declaration can be set to yes, meaning the declaration works independently of any external source.
Syntax
The syntax of internal DTD is as shown:
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and element-declarations is where you declare the elements.
Example
Following is a simple example of internal DTD:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Let us go through the above code:
Start Declaration - Begin the XML declaration with the following statement:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
DTD - Immediately after the XML header, the document type declaration follows, commonly referred to as the DOCTYPE:
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element name. The DOCTYPE informs the parser that a DTD is
associated with this XML document.
DTD Body- The DOCTYPE declaration is followed by body of the DTD, where you declare elements, attributes, entities, and notations:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document. <!ELEMENT name (#PCDATA)> defines the
element name to be of type "#PCDATA". Here #PCDATA means parsed character data, that is, text that the parser will parse.
End Declaration - Finally, the declaration section of the DTD is closed using a closing bracket and a closing angle bracket (]>). This
effectively ends the definition, and thereafter, the XML document follows immediately.
Rules
The document type declaration must appear at the start of the document (preceded only by the XML header) — it is not permitted
anywhere else within the document.
Similar to the DOCTYPE declaration, the element declarations must start with an exclamation mark.
The Name in the document type declaration must match the element type of the root element.
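What the content model (name,company,phone) demands can be spelled out in code. The sketch below is a hand-written check, not DTD validation: Python's standard xml.etree.ElementTree module (an assumed tool) reads past the DOCTYPE without enforcing it, so the expected sequence is compared manually. The names EXPECTED_CHILDREN and matches_content_model are made up for the illustration.

```python
import xml.etree.ElementTree as ET

document = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>"""

# The sequence required by <!ELEMENT address (name,company,phone)>.
EXPECTED_CHILDREN = ["name", "company", "phone"]

def matches_content_model(xml_bytes):
    """Check by hand that the root is <address> with the children in order."""
    root = ET.fromstring(xml_bytes)
    children_in_order = [child.tag for child in root] == EXPECTED_CHILDREN
    text_only = all(len(child) == 0 for child in root)  # (#PCDATA): no sub-elements
    return root.tag == "address" and children_in_order and text_only

# Same elements, wrong order: well-formed, but not what the DTD describes.
wrong_order = (b"<address><company>TutorialsPoint</company>"
               b"<name>Tanmay Patil</name><phone>(011) 123-4567</phone></address>")

print(matches_content_model(document))
print(matches_content_model(wrong_order))
```

A validating parser would perform this kind of check from the DTD itself, for every declared element.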
External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by specifying a system identifier, which may be either
the path to a .dtd file or a valid URL. For an external DTD, the standalone attribute in the XML declaration must be set to no. This means the
declaration includes information from the external source.
Syntax
Following is the syntax for external DTD:
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.
Example
The following example shows external DTD usage:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The contents of the DTD file address.dtd are as shown:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Types
You can refer to an external DTD by using either system identifiers or public identifiers.
SYSTEM IDENTIFIERS
A system identifier enables you to specify the location of an external file containing DTD declarations. Syntax is as follows:
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains keyword SYSTEM and a URI reference pointing to the location of the document.
PUBLIC IDENTIFIERS
Public identifiers provide a mechanism to locate DTD resources and are written as below:
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
As you can see, it begins with the keyword PUBLIC, followed by a specialized identifier and then a system identifier that the parser can fall back on. Public identifiers are used to identify an entry in a
catalog. Public identifiers can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.
DTD – Elements
Declaring an Element
In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax:
<!ELEMENT element-name (element-content)>
Empty elements
Empty elements are declared with the keyword EMPTY, written without parentheses:
<!ELEMENT element-name EMPTY>
example:
<!ELEMENT img EMPTY>
CDATA means character data that is not supposed to be parsed by a parser. In a DTD it is used as an attribute type, not as element content.
#PCDATA means that the element contains data that IS going to be parsed by a parser.
The keyword ANY declares an element with any content.
If a #PCDATA section contains elements, these elements must also be declared.
When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a
full declaration, the children must also be declared, and the children can also have children. The full declaration of the note document
will be:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Wrapping
If the DTD is to be included in your XML source file, it should be wrapped in a DOCTYPE definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>
example:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
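Python's standard-library parsers read an internal DTD like the one above but do not validate against it, so a rule such as the child order required by the content model (to,from,heading,body) has to be checked separately. The following is a minimal sketch of that single rule, not a validator; the helper name children_in_declared_order is hypothetical.

```python
import xml.etree.ElementTree as ET

NOTE = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>"""

# Sequence copied by hand from the content model (to,from,heading,body).
DECLARED_ORDER = ["to", "from", "heading", "body"]


def children_in_declared_order(xml_text, expected=DECLARED_ORDER):
    """Return True if the root's children match the declared sequence exactly."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == expected


print(children_in_declared_order(NOTE))
```

A document whose children appear in a different order would be well-formed, and would parse without error, yet fail this check.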
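DTD - Attributes
In the DTD, attributes are declared with an ATTLIST declaration, which has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
As you can see from the syntax above, the ATTLIST declaration defines the element which can have the attribute, the name of the
attribute, the type of the attribute, and the default attribute value.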
The attribute-type can have the following values:
Value            Explanation
CDATA            The value is character data
(eval|eval|..)   The value must be one of an enumerated list
ID               The value is a unique id
IDREF            The value is the id of another element
IDREFS           The value is a list of other ids
NMTOKEN          The value is a valid XML name token
NMTOKENS         The value is a list of valid XML name tokens
ENTITY           The value is an entity
ENTITIES         The value is a list of entities
NOTATION         The value is the name of a notation
xml:             The value is a predefined xml: value
The attribute-default-value can have the following values:
Value            Explanation
"value"          The attribute has a default value
#REQUIRED        The attribute value must be included in the element
#IMPLIED         The attribute does not have to be included
#FIXED "value"   The attribute value is fixed
Implied attribute
Syntax:
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
DTD example:
<!ATTLIST contact fax CDATA #IMPLIED>
XML example:
<contact fax="555-667788" />
Use an implied attribute if you don't want to force the author to include an attribute and you don't have an option for a default value
either.
Required attribute
Syntax:
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
DTD example:
<!ATTLIST person number CDATA #REQUIRED>
XML example:
<person number="5677" />
Use a required attribute if you don't have an option for a default value, but still want to force the attribute to be present.
Fixed attribute value
Syntax:
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
DTD example:
<!ATTLIST sender company CDATA #FIXED "Microsoft">
XML example:
<sender company="Microsoft" />
Use a fixed attribute value when you want an attribute to have a fixed value without allowing the author to change it. If an author
includes another value, the XML parser will return an error.
Enumerated attribute values
Syntax:
<!ATTLIST element-name attribute-name (eval|eval|..) default-value>
DTD example:
<!ATTLIST payment type (check|cash) "cash">
XML example:
<payment type="check" />
or
<payment type="cash" />
Use enumerated attribute values when you want the attribute values to be one of a fixed set of legal values.
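The effect of a default value can be observed with Python's standard xml.parsers.expat module, which reports attributes defaulted from an internal DTD alongside those written in the document. This is a minimal sketch; the helper name attributes_seen is hypothetical, and because expat is a non-validating parser it will not reject a value outside the enumerated list.

```python
import xml.parsers.expat


def attributes_seen(xml_text):
    """Map each element name to the attributes expat reports for it."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # attrs includes values supplied by ATTLIST defaults.
        seen[name] = dict(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return seen


# Internal DTD with the enumerated attribute from the example above.
DTD = '<!DOCTYPE payment [<!ATTLIST payment type (check|cash) "cash">]>'

defaulted = attributes_seen(DTD + "<payment/>")
explicit = attributes_seen(DTD + '<payment type="check"/>')
print(defaulted)
print(explicit)
```

When the author omits the attribute, the application still sees the declared default; when the author supplies a value, that value is used.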
DTD - Entities
Entities
Entities are variables used to define shortcuts to common text.
Entity references are references to entities.
Entities can be declared internally.
Entities can be declared externally.
Internal Entity Declaration
Syntax:
<!ENTITY entity-name "entity-value">
DTD Example:
<!ENTITY writer "Jan Egil Refsnes.">
<!ENTITY copyright "Copyright XML101.">
XML example:
<author>&writer;&copyright;</author>
External Entity Declaration
Syntax:
<!ENTITY entity-name SYSTEM "URI/URL">
DTD Example:
<!ENTITY writer SYSTEM "http://www.xml101.com/entities/entities.xml">
<!ENTITY copyright SYSTEM "http://www.xml101.com/entities/entities.dtd">
XML example:
<author>&writer;&copyright;</author>
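Entity expansion can be observed directly with a parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module and the internal entity declarations from this section, placed in an internal DTD; it covers internal entities only and does not attempt to fetch external ones.

```python
import xml.etree.ElementTree as ET

# Internal DTD declaring the two entities, followed by a document that uses them.
DOCUMENT = """<!DOCTYPE author [
<!ENTITY writer "Jan Egil Refsnes.">
<!ENTITY copyright "Copyright XML101.">
]>
<author>&writer;&copyright;</author>"""

author = ET.fromstring(DOCUMENT)

# The application sees the replacement text, not the entity references.
print(author.text)
```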
DTD Validation
XML - Schemas
XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and validate the structure and the content of
XML data. An XML schema defines the elements, attributes and data types of a document, and it supports namespaces. It is similar to a database
schema, which describes the data in a database.
Syntax
A schema is itself an XML document, and its root element is declared as follows:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
Example
The following example shows how to use schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML Schemas is that they describe the legitimate format that an XML document can take.
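Because a schema is itself XML, ordinary XML tools can read it. The following minimal sketch uses Python's standard xml.etree.ElementTree module to list the element declarations in the contact schema above. It only inspects the schema and does not validate a document against it (the standard library has no XSD validator); the helper name declared_elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# The XML Schema namespace in ElementTree's {uri}localname notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""


def declared_elements(schema_text):
    """List (name, type) for every xs:element; type is None for an inline complex type."""
    root = ET.fromstring(schema_text)
    return [(e.get("name"), e.get("type")) for e in root.iter(XS + "element")]


for name, type_name in declared_elements(SCHEMA):
    print(name, type_name)
```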
Elements
As we saw in the XML - Elements chapter, elements are the building blocks of an XML document. An element can be defined within an XSD
as follows:
<xs:element name="x" type="y"/>
Definition Types
You can define XML schema elements in the following ways:
Simple Type - A simple type element contains only text; it cannot have child elements or attributes. Some of the predefined simple types are: xs:integer, xs:boolean,
xs:string, xs:date. For example:
<xs:element name="phone_number" type="xs:int" />
Complex Type - A complex type is a container for other element definitions. This allows you to specify which child elements an element
can contain and to provide some structure within your XML documents. For example:
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
In the above example, the Address element consists of child elements. It is a container for other <xs:element> definitions, which allows you to
build a simple hierarchy of elements in the XML document.
Global Types - With a global type, you can define a single type in your document which can be used by all other references. For example,
suppose you want to generalize the person and company for different addresses of the company. In such a case, you can define a general
type as a named complex type, so that other declarations can refer to it through their type attribute:
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
</xs:sequence>
</xs:complexType>
Now let us use this type in our example as below:
<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone1" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Address2">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone2" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
Instead of having to define the name and the company twice (once for Address1 and once for Address2), we now have a single definition.
This makes maintenance simpler: if you decide to add "Postcode" elements to the address, you need to add them in just one place.
Attributes
Attributes in XSD provide extra information within an element. An attribute has a name and a type property, as shown below:
<xs:attribute name="x" type="y"/>
XML Schemas are More Powerful than DTD
XML Schemas are written in XML
XML Schemas are extensible to additions
XML Schemas support data types
XML Schemas support namespaces
DTD vs XSD
There are many differences between DTD (Document Type Definition) and XSD (XML Schema Definition). In short, DTD provides less control
on XML structure whereas XSD (XML schema) provides more control.
The important differences are given below:
No. | DTD | XSD
1) | DTD stands for Document Type Definition. | XSD stands for XML Schema Definition.
2) | DTDs are derived from SGML syntax. | XSDs are written in XML.
3) | DTD doesn't support datatypes. | XSD supports datatypes for elements and attributes.
4) | DTD doesn't support namespaces. | XSD supports namespaces.
5) | DTD can fix the order of child elements but limits occurrence counts to ?, * and +. | XSD defines order for child elements and exact occurrence counts (minOccurs, maxOccurs).
6) | DTD is not extensible. | XSD is extensible.
7) | DTD has its own syntax, which must be learned separately. | XSD is simpler to pick up because you don't need to learn a new syntax.
8) | DTD provides less control on XML structure. | XSD provides more control on XML structure.
CDATA PCDATA
CDATA
CDATA: (Unparsed Character Data): CDATA contains text that is not parsed further in an XML document. Tags inside the CDATA text
are not treated as markup and entities are not expanded.
Let's take an example for CDATA:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<![CDATA[
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
]]>
</employee>
In the above CDATA example, CDATA is used just after the element employee to make the data/text unparsed, so it will give the value of
employee:
<firstname>vimal</firstname><lastname>jaiswal</lastname><email>vimal@javatpoint.com</email>
PCDATA
PCDATA: (Parsed Character Data): XML parsers parse all the text in an XML document unless told otherwise. PCDATA is text that
will be parsed by a parser: tags inside PCDATA are treated as markup and entities are expanded.
In other words, the XML parser examines the data for entity references and, if it finds any, replaces them with their
values.
Let's take an example:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
In the above example, the employee element contains 3 more elements 'firstname', 'lastname', and 'email', so it parses further to get the
data/text of firstname, lastname and email to give the value of employee as:
vimal jaiswal vimal@javatpoint.com
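The difference between the two cases can be checked with a parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module on shortened versions of the two employee documents; dropping the DOCTYPE line and the email element is a simplification made here for brevity.

```python
import xml.etree.ElementTree as ET

# Same content twice: once wrapped in a CDATA section, once as ordinary markup.
CDATA_DOC = """<employee><![CDATA[
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
]]></employee>"""

PCDATA_DOC = """<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
</employee>"""

cdata_root = ET.fromstring(CDATA_DOC)
pcdata_root = ET.fromstring(PCDATA_DOC)

# CDATA: the tags are plain text, so employee has no child elements.
print(len(cdata_root), repr(cdata_root.text))

# PCDATA: the tags are markup, so employee has two child elements.
print(len(pcdata_root), [child.tag for child in pcdata_root])
```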
XML Parsers
Parsing XML means going through an XML document to access or modify its data. A parser has the job of reading the
XML, checking it for errors, and passing it on to the intended application. If no DTD or schema is provided, the parser simply checks that
the XML is well-formed. If a DTD is provided, a validating parser also determines whether the XML is valid, i.e. that the tags, attributes, and
content meet the specifications found in the DTD, before passing it on to the application.
Let's understand the working of XML parser by the figure given below:
DOM | SAX
Tree model parser (tree of nodes) | Event-based parser (sequence of events)
DOM loads the file into memory and then parses it | SAX parses the file as it reads it, i.e. node by node
Has memory constraints since it loads the whole XML file before parsing | No memory constraints as it does not store the XML content in memory
DOM is read and write (can insert or delete nodes) | SAX is read only, i.e. it can't insert or delete nodes
If the XML content is small, prefer the DOM parser | Use the SAX parser when the XML content is large
Backward and forward search is possible for finding tags and evaluating the information inside them, which gives ease of navigation | SAX reads the XML file from top to bottom, and backward navigation is not possible
Slower at runtime | Faster at runtime
What are the usual applications for a DOM parser and for a SAX parser?
In the following cases, using a SAX parser is more advantageous than using a DOM parser.
The input document is too big for available memory
You can process the document in small contiguous chunks of input. You do not need the entire document before you can do useful
work
You just want to use the parser to extract the information of interest, and all your computation will be completely based on the data
structures created by yourself.
In the following cases, using a DOM parser is more advantageous than using a SAX parser.
Your application needs to access widely separated parts of the document at the same time.
Your application will probably use an internal data structure which is almost as complicated as the document itself.
Your application has to modify the document repeatedly.
Your application has to store the document for a significant amount of time through many method calls.
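The two styles can be compared on the same task. The following minimal sketch collects every name element from a small menu document, once with an event-based SAX handler and once with a DOM tree, using Python's standard xml.sax and xml.dom.minidom modules. The shortened menu and the names NameCollector, names_with_sax and names_with_dom are illustrative assumptions.

```python
import xml.dom.minidom
import xml.sax

MENU = (
    "<breakfast_menu>"
    "<food><name>Belgian Waffles</name><calories>650</calories></food>"
    "<food><name>French Toast</name><calories>600</calories></food>"
    "</breakfast_menu>"
)


class NameCollector(xml.sax.ContentHandler):
    """SAX: react to events as they stream past, keeping only what is needed."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


def names_with_sax(xml_text):
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


def names_with_dom(xml_text):
    """DOM: build the whole tree first, then query it."""
    doc = xml.dom.minidom.parseString(xml_text)
    return [node.firstChild.data for node in doc.getElementsByTagName("name")]


print(names_with_sax(MENU))
print(names_with_dom(MENU))
```

The SAX version never holds more than the current name in memory, while the DOM version keeps the full tree available for later navigation or modification.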
XML – DOM
The Document Object Model (DOM) represents an XML document as a tree. XML documents have a hierarchy of informational units called nodes; DOM
is a way of describing those nodes and the relationships between them.
A DOM Document is a collection of nodes or pieces of information organized in a hierarchy. This hierarchy allows a developer to navigate
through the tree looking for specific information. Because it is based on a hierarchy of information, the DOM is said to be tree based.
The XML DOM also provides an API that allows a developer to add, edit, move, or remove nodes in the tree at any
point in order to create an application.
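The add, edit and remove operations can be sketched with Python's standard xml.dom.minidom module. The two-item menu below is an illustrative assumption, and the changed and added names are made up for the example.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    "<breakfast_menu>"
    "<food><name>French Toast</name></food>"
    "<food><name>Homestyle Breakfast</name></food>"
    "</breakfast_menu>"
)
menu = doc.documentElement

# Edit: change the text of the first <name> node.
menu.getElementsByTagName("name")[0].firstChild.data = "Thick French Toast"

# Remove: detach the second <food> node from the tree.
menu.removeChild(menu.getElementsByTagName("food")[1])

# Add: build a new <food> subtree and append it to the root.
food = doc.createElement("food")
name = doc.createElement("name")
name.appendChild(doc.createTextNode("Belgian Waffles"))
food.appendChild(name)
menu.appendChild(food)

names = [node.firstChild.data for node in menu.getElementsByTagName("name")]
print(names)
```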
XML DOM Tree Example
XSLT Example
We will use the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>Light Belgian waffles covered with strawberries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<price>$8.95</price>
<description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>French Toast</name>
<price>$4.50</price>
<description>Thick slices made from our homemade sourdough bread</description>
<calories>600</calories>
</food>
<food>
<name>Homestyle Breakfast</name>
<price>$6.95</price>
<description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
<calories>950</calories>
</food>
</breakfast_menu>
XSL Sort
Where to put the Sort Information
Take a new look at the XML document that you have seen in almost every chapter (or open it with IE5):
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
To output this XML file as an ordinary HTML file, and sort it at the same time, simply add an order-by attribute to your for-each element
like this:
<xsl:for-each select="CATALOG/CD" order-by="+ ARTIST">
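The order-by attribute takes a plus (+) or minus (-) sign, to define an ascending or descending sort order, and an element name to define
the sort element. Note that order-by belongs to the early working-draft XSL dialect (namespace http://www.w3.org/TR/WD-xsl) implemented by IE5;
XSLT 1.0 expresses the same sort with an xsl:sort element placed inside xsl:for-each.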
Now take a look at your slightly adjusted XSL stylesheet (or open it with IE5):
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD" order-by="+ ARTIST">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
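Running the stylesheet requires an XSL processor, which Python's standard library does not include. As a rough stand-in, the following sketch reproduces the same sort-then-tabulate logic directly in Python with xml.etree.ElementTree; it is not an XSLT implementation. The second and third CD entries are made-up sample data added so that the sort has something to reorder, and the helper name sorted_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG = """<CATALOG>
<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
<CD><TITLE>Sample Album B</TITLE><ARTIST>Alice Example</ARTIST></CD>
<CD><TITLE>Sample Album C</TITLE><ARTIST>Carol Example</ARTIST></CD>
</CATALOG>"""


def sorted_rows(catalog_text, descending=False):
    """Return (title, artist) pairs ordered by artist, like order-by="+ ARTIST"."""
    root = ET.fromstring(catalog_text)
    rows = [(cd.findtext("TITLE"), cd.findtext("ARTIST")) for cd in root.findall("CD")]
    return sorted(rows, key=lambda row: row[1], reverse=descending)


# Emit one table row per CD, in ascending artist order.
for title, artist in sorted_rows(CATALOG):
    print(f"<tr><td>{title}</td><td>{artist}</td></tr>")
```

Passing descending=True corresponds to using a minus sign in the order-by attribute.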
XSL Conditional If
Where to put the IF condition
Take a new look at the XML document that you have seen in almost every chapter (or open it with IE5):
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
To put a conditional if test against the content of the file, simply add an xsl:if element to your XSL document like this:
<xsl:if match=".[ARTIST='Bob Dylan']">
... some output ...
</xsl:if>
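This syntax follows the working-draft XSL dialect implemented by IE5; in XSLT 1.0, xsl:if takes its condition in a test attribute, for example test="ARTIST='Bob Dylan'".
Now take a look at your slightly adjusted XSL stylesheet (or open it with IE5):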
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<xsl:if match=".[ARTIST='Bob Dylan']">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:if>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
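The same filter can be sketched outside an XSL processor. The following minimal example uses the limited XPath support in Python's standard xml.etree.ElementTree module, whose [tag='text'] predicate selects elements by the text of a child, much like the condition in the xsl:if above. It is a stand-in for illustration, not an XSLT implementation, and the second CD entry is made-up sample data.

```python
import xml.etree.ElementTree as ET

CATALOG = """<CATALOG>
<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
<CD><TITLE>Sample Album B</TITLE><ARTIST>Alice Example</ARTIST></CD>
</CATALOG>"""

root = ET.fromstring(CATALOG)

# Keep only the CD elements whose ARTIST child has the text 'Bob Dylan'.
matching = root.findall("CD[ARTIST='Bob Dylan']")

titles = [cd.findtext("TITLE") for cd in matching]
print(titles)
```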