
Introduction To XML: The Two Problems

The document provides an introduction to XML, explaining what it is, its main purposes and benefits. XML is a markup language that defines rules for encoding documents to make them both human-readable and machine-readable. It allows users to define their own tags to describe data, making the data understandable by computers.

Uploaded by

ayushi mishra

Introduction to XML

Introduction
XML is a markup language developed for the web, and it differs from the scripting and programming languages that came before it.
Instead of being concerned with the processing and display of data, XML's primary purpose is to tell the computer what the data
actually means.

The Two Problems


There are two main reasons for the development of XML:

1. Computers do not understand the information placed in them. For example, there is no way for a search engine, or any other
computer, to know that this page contains the introduction to an XML tutorial. All it sees is a collection of letters and numbers,
with HTML formatting around it. The computer cannot even tell what on this page is a heading, what is text and what is an advert.
This is the main problem which XML was designed to overcome. If a page or document is written in XML, a computer can
identify what each piece of data on it represents. This has major implications for search engine technology.
If a search engine knew exactly what was on a page, it could return the results a person was looking for, with far fewer
inaccurate matches and half-relevant pages. This is just the revolution the over-bloated web needs.
2. Web pages are not compatible across different devices. One of the major difficulties that web designers have today is that
people now access pages from a variety of different devices: PCs, Macs, mobile phones, palmtop computers and even
televisions. Because of this, web designers must either produce their pages in several different formats, or
cut back on the design so that the page is compatible across the different formats. Because XML is used to
define what data means and not how it is displayed, it makes it much easier to use the same data on several different platforms.

What Is XML?
So what actually is XML? The thing about it which people find the most difficult to understand is that XML does not actually do anything.
XML is not a way to design your home page and it won't change the way in which you build sites. This has made many people believe
that XML is useless, as they can't see a way that it will benefit them. XML has a wide variety of benefits though, two of which were
outlined above.
The real use of XML, though, is to describe data. It is used in much the same way as HTML, except that there is one major
difference between the two:

HTML is used to describe how data is formatted.


XML is used to describe what data actually means.

The Language
As mentioned above, XML looks like HTML and is structured very similarly. Both use tags to enclose the
data they refer to. Both can use nested tags, and both can have attributes added to their tags.

The most revolutionary thing about XML, though, is that you are not restricted to the normal, pre-defined tags like font and br.
Instead you are responsible for making up the tags yourself. You can name them anything you like (within XML's naming rules) and
can use them to represent anything you like. This is a feature that HTML does not offer.

Is It Difficult To Learn?
The answer to this, in short, is no. The only thing you have to learn about XML is how to structure your tags, and they are in fact almost
identical to HTML tags. Most of it is just logical thinking. Before learning XML it helps if you already know HTML. It is also useful
if you know a web scripting language such as PHP, ASP or JavaScript. If you do not yet know these, try some of the tutorials on the site. If
you are looking to format a web page, not describe data, you will be better off learning XHTML, the XML-based reformulation of HTML.

What is XML
o XML (eXtensible Markup Language) is a markup language.
o XML is designed to store and transport data.
o XML was released in the late 1990s. It was created to provide an easy way to use and store self-describing data.
o XML became a W3C Recommendation on February 10, 1998.
o XML is not a replacement for HTML.
o XML is designed to be self-descriptive.
o XML is designed to carry data, not to display data.
o XML tags are not predefined. You must define your own tags.
o XML is platform independent and language independent.

XML - Overview
XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard Generalized Markup Language
(SGML).

XML tags identify the data and are used to store and organize the data, rather than specifying how to display it like HTML tags, which are
used to display the data. XML is not going to replace HTML in the near future, but it introduces new possibilities by adopting many
successful features of HTML.

There are three important characteristics of XML that make it useful in a variety of systems and solutions:
o XML is extensible: XML allows you to create your own self-descriptive tags, or language, that suits your application.
o XML carries the data, does not present it: XML allows you to store the data irrespective of how it will be presented.
o XML is a public standard: XML was developed by an organization called the World Wide Web Consortium (W3C) and is available
as an open standard.

What is Markup?
XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and
machine-readable. So what exactly is a markup language? Markup is information added to a document that enhances its meaning in certain ways,
in that it identifies the parts and how they relate to each other. More specifically, a markup language is a set of symbols that can be placed
in the text of a document to demarcate and label the parts of that document.

The following example shows how XML markup looks when embedded in a piece of text:

<message>
<text>Hello, world!</text>
</message>
This snippet includes the markup symbols, or the tags such as <message>...</message> and <text>... </text>. The tags <message> and
</message> mark the start and the end of the XML code fragment. The tags <text> and </text> surround the text Hello, world!.
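As a rough illustration of how software reads such markup, the sketch below parses the same fragment. It assumes Python and its standard xml.etree.ElementTree module, which the original text does not mention; the variable names are purely illustrative.

```python
import xml.etree.ElementTree as ET

# The fragment from the text, held in a string.
snippet = """<message>
<text>Hello, world!</text>
</message>"""

# Parse the markup; the result is the outermost element.
root = ET.fromstring(snippet)

# Look up the nested <text> element by its tag name.
text_element = root.find("text")

print(root.tag)           # name of the outer tag
print(text_element.text)  # character data enclosed by <text>...</text>
```

The parser separates the markup (the tag names) from the character data, which is the distinction the paragraph above draws.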

Is XML a Programming Language?


A programming language consists of grammar rules and its own vocabulary, which are used to create computer programs. These programs
instruct the computer to perform specific tasks. XML does not qualify as a programming language because it does not perform any computation
or algorithms. It is usually stored in a simple text file and is processed by special software that is capable of interpreting XML.

Why XML
Platform independent and language independent: The main benefit of XML is that you can use it to take data from a program like Microsoft
SQL Server, convert it into XML, and then share that XML with other programs and platforms. This lets two platforms communicate even when
exchanging data between them is generally very difficult. The main thing which makes XML truly powerful is its international acceptance.
Many corporations use XML interfaces for databases, programming, office applications, mobile phones and more. This is due to its platform
independence.

Features and Advantages of XML


XML is widely used in the era of web development. It is also used to simplify data storage and data sharing.
The main features or advantages of XML are given below.

1) XML separates data from HTML


If you need to display dynamic data in your HTML document, it will take a lot of work to edit the HTML each time the data changes. With
XML, data can be stored in separate XML files. This way you can focus on using HTML/CSS for display and layout, and be sure that changes
in the underlying data will not require any changes to the HTML.
With a few lines of JavaScript code, you can read an external XML file and update the data content of your web page.

2) XML simplifies data sharing


In the real world, computer systems and databases contain data in incompatible formats. XML data is stored in plain text format. This
provides a software- and hardware-independent way of storing data. This makes it much easier to create data that can be shared by
different applications.
3) XML simplifies data transport
One of the most time-consuming challenges for developers is to exchange data between incompatible systems over the Internet.
Exchanging data as XML greatly reduces this complexity, since the data can be read by different incompatible applications.

4) XML simplifies Platform change


Upgrading to new systems (hardware or software platforms) is always time-consuming. Large amounts of data must be converted, and
incompatible data is often lost. XML data is stored in text format. This makes it easier to expand or upgrade to new operating systems,
new applications, or new browsers, without losing data.

5) XML increases data availability


Different applications can access your data, not only in HTML pages, but also from XML data sources. With XML, your data can be available
to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.), which makes it more accessible to blind people
and to people with other disabilities.

6) XML can be used to create new internet languages


A lot of new Internet languages are created with XML.

Here are some examples:


o XHTML
o WSDL for describing available web services
o WML as a markup language for handheld (WAP) devices
o RSS for news feeds
o RDF and OWL for describing resources and ontology
o SMIL for describing multimedia for the web

The main difference between XML and HTML


XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
HTML is about displaying information, XML is about describing information.

XML Related Technologies


Some XML-related technologies are listed below, each with the meaning of its name and a short description:

1) XHTML (Extensible HTML): It is a cleaner and stricter version of HTML. It belongs to the family of XML markup languages. It was
developed to make HTML more extensible and to increase interoperability with other data formats.
2) XML DOM (XML Document Object Model): It is a standard document model that is used to access and manipulate XML. It represents the
XML file as a tree structure.
3) XSL (Extensible Stylesheet Language): It contains three parts:
i) XSLT (XSL Transformations): It transforms XML into other formats, like HTML.
ii) XSL-FO (XSL Formatting Objects): It is used for formatting XML for screen, paper etc.
iii) XPath: It is a language to navigate XML documents.
4) XQuery (XML Query Language): It is a language used to query XML data.
5) DTD (Document Type Definition): It is a standard used to define the legal elements in an XML document.
6) XSD (XML Schema Definition): It is an XML-based alternative to DTD. It is used to describe the structure of an XML document.
7) XLink (XML Linking Language): This is a language for creating hyperlinks (external and internal links) in XML documents.
8) XPointer (XML Pointer Language): It is a system for addressing components of XML-based internet media. It allows XLink hyperlinks
to point to more specific parts of the XML document.
9) SOAP (Simple Object Access Protocol): It is an XML-based protocol that lets applications exchange information over HTTP. In simple
words, it is a protocol used for accessing web services.
10) WSDL (Web Services Description Language): It is an XML-based language to describe web services. It also describes the
functionality offered by a web service.
11) RDF (Resource Description Framework): RDF is a framework, commonly written in XML, for describing web resources. It is a standard
model for data interchange on the web. It is used to describe information such as the title, author, content and copyright of a web page.
12) SVG (Scalable Vector Graphics): It is an XML-based vector image format for two-dimensional images. It defines graphics in XML
format. It also supports animation.
13) RSS (Really Simple Syndication): RSS is an XML-based format for web content syndication. It is used for quickly browsing news and
updates, and is generally used by news-like sites.

XML - Syntax
This chapter takes you through the simple syntax rules to write an XML document. Following is a complete XML document:

<?xml version="1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
You can notice there are two kinds of information in the above example:
o markup, like <contact-info>, and
o the text, or the character data, such as TutorialsPoint and (011) 123-4567.
The following diagram depicts the syntax rules to write different types of markup and text in an XML document.

Let us see each component of the above diagram in detail:

XML Declaration
The XML document can optionally have an XML declaration. It is written as below:

<?xml version="1.0" encoding="UTF-8"?>


Where version is the XML version and encoding specifies the character encoding used in the document.
Syntax Rules for XML declaration
An example XML document:
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line in the document is the XML declaration. It is optional in XML 1.0, but including it is good practice. It defines the XML
version of the document. In this case the document conforms to the 1.0 specification of XML:
<?xml version="1.0"?>
The next line defines the first element of the document (the root element):
<note>
The next lines define 4 child elements of the root (to, from, heading, and body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
The last line defines the end of the root element:
</note>
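The walkthrough above can be checked with a short script. This is a minimal sketch that assumes Python's standard xml.etree.ElementTree module (not part of the original tutorial); note_xml and child_tags are illustrative names.

```python
import xml.etree.ElementTree as ET

# The example document from the text, including its XML declaration.
note_xml = """<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(note_xml)

# Iterating over an element yields its direct children in document order.
child_tags = [child.tag for child in root]

print(root.tag)     # the root element
print(child_tags)   # the child elements of the root
```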

All XML elements must have a closing tag


In HTML some elements do not have to have a closing tag. The following code is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag like this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>

XML tags are case sensitive


XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>

All XML elements must be properly nested


In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>

All XML documents must have a root tag


All XML documents must contain a single tag pair to define the root element. All other elements must be nested within the root element.
All elements can have sub (children) elements. Sub elements must be in pairs and correctly nested within their parent element:
<root>
<child>
<subchild>
</subchild>
</child>
</root>
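The rules on closing tags, case, nesting and the root element can be tried out with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper written for this illustration, and the <root> wrapper in some samples is added only so that each sample breaks exactly one rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# One sample per rule, plus a document that follows all of them.
samples = {
    "missing closing tag": "<root><p>This is a paragraph</root>",
    "mismatched case": "<Message>This is incorrect</message>",
    "improper nesting": "<root><b><i>bold and italic</b></i></root>",
    "two root elements": "<p>first</p><p>second</p>",
    "follows all rules": "<root><child><subchild></subchild></child></root>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Unlike a typical HTML browser, an XML parser does not try to repair such documents; it reports an error and stops.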
Attribute values must always be quoted
XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the
two XML documents below. The first one is incorrect, the second is correct:
<?xml version="1.0"?>
<note date=12/11/99>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

<?xml version="1.0"?>
<note date="12/11/99">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
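A parser makes the difference between the two documents visible. The following sketch assumes Python's standard xml.etree.ElementTree module; the shortened note documents and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

unquoted = "<note date=12/11/99><to>Tove</to></note>"
quoted = '<note date="12/11/99"><to>Tove</to></note>'

# The unquoted attribute value is not well-formed, so parsing fails.
try:
    ET.fromstring(unquoted)
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

# The quoted version parses, and the attribute value can be read back.
note = ET.fromstring(quoted)
date_value = note.get("date")

print(unquoted_accepted)
print(date_value)
```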
XML References
References usually allow you to add or include additional text or markup in an XML document. References always begin with the
symbol "&", which is a reserved character, and end with the symbol ";". XML has two types of references:

Entity References: An entity reference contains a name between the start and the end delimiters. For
example, &amp; where amp is the name. The name refers to a predefined string of text and/or markup.

Character References: A character reference, such as &#65;, contains a hash mark ("#") followed by a number. The number always
refers to the Unicode code point of a character. In this case, 65 refers to the letter "A".
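Both kinds of reference are resolved by the parser before an application sees the text, and a writer has to produce them when the data contains reserved characters. The sketch below assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: &amp; is an entity reference, &#65; is a character reference.
element = ET.fromstring("<text>Tom &amp; Jerry get an &#65;</text>")
resolved = element.text

# Writing: escape() replaces &, < and > with entity references.
raw = "if a < b && b > c"
escaped = escape(raw)

# Round trip: the escaped text parses back to the original string.
round_trip = ET.fromstring(f"<code>{escaped}</code>").text

print(resolved)
print(escaped)
print(round_trip == raw)
```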

XML Text
o The names of XML elements and XML attributes are case-sensitive, which means the names in start and end tags need to
be written in the same case.
o To avoid character encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
o Whitespace characters like blanks, tabs and line-breaks between XML elements and between XML attributes are generally
treated as insignificant.
o Some characters are reserved by the XML syntax itself. Hence, they cannot be used directly. To use them, replacement
entities are used, which are listed below:

Reserved character    Replacement entity    Description
<                     &lt;                  less than
>                     &gt;                  greater than
&                     &amp;                 ampersand
'                     &apos;                apostrophe
"                     &quot;                quotation mark
XML - Documents
An XML document is a basic unit of XML information composed of elements and other markup in an orderly package. An
XML document can contain a wide variety of data. For example, a database of numbers, numbers representing a molecular structure or a
mathematical equation.
XML Document example
A simple document is given in the following example:
<?xml version="1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
The following image depicts the parts of XML document.

Document Prolog Section


The document prolog comes at the top of the document, before the root element. This section contains:
o XML declaration
o Document type declaration
Document Elements Section
Document Elements are the building blocks of XML. These divide the document into a hierarchy of sections, each serving a specific
purpose. You can separate a document into multiple sections so that they can be rendered differently, or used by a search engine. The
elements can be containers, with a combination of text and other elements.

XML - Declaration
This chapter covers the XML declaration in detail. The XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when used, it must appear on the first line of the XML document.
Syntax
Following syntax shows XML declaration:
<?xml
version="version_number"
encoding="encoding_declaration"
standalone="standalone_status"
?>
Each parameter consists of a parameter name, an equals sign (=), and a parameter value inside quotes. The parameters of the above
syntax are described in detail below:
o version
Parameter value: 1.0
Description: Specifies the version of the XML standard used.
o encoding
Parameter value: UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP
Description: It defines the character encoding used in the document. UTF-8 is the default encoding used.
o standalone
Parameter value: yes or no
Description: It informs the parser whether the document relies on information from an external source, such as an external
document type definition (DTD), for its content. The default value is set to no. Setting it to yes tells the processor there are no
external declarations required for parsing the document.
Rules
An XML declaration should abide by the following rules:
o If the XML declaration is present, it must be the very first thing in the XML document, with nothing (not even whitespace) before it.
o If the XML declaration is included, it must contain the version number attribute.
o The parameter names and values are case-sensitive.
o The names are always in lower case.
o The order of the parameters is important. The correct order is: version, encoding and standalone.
o Either single or double quotes may be used.
o The XML declaration has no closing tag, i.e. there is no </?xml>.
XML Declaration Examples
Following are a few examples of XML declarations:
XML declaration with no parameters (not well-formed, because the version parameter is required):
<?xml ?>
XML declaration with version definition:
<?xml version="1.0"?>
XML declaration with all parameters defined:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
XML declaration with all parameters defined in single quotes:
<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
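Several of the declaration rules can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; parses is a hypothetical helper, and the empty <doc/> element is added only so that each sample is a complete document.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is accepted by the parser, False on ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = {
    "all parameters, double quotes": '<?xml version="1.0" encoding="UTF-8" standalone="no"?><doc/>',
    "all parameters, single quotes": "<?xml version='1.0' encoding='UTF-8' standalone='no'?><doc/>",
    "declaration not at the start": '\n<?xml version="1.0"?><doc/>',
    "version missing": '<?xml encoding="UTF-8"?><doc/>',
    "wrong parameter order": '<?xml encoding="UTF-8" version="1.0"?><doc/>',
}

outcome = {label: parses(text) for label, text in samples.items()}
for label, accepted in outcome.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```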
An XML document contains XML Elements.

XML Tree
XML documents form a tree structure that starts at "the root" and branches to "the leaves".

XML Tree Structure

An Example XML Document


The image above represents books in this XML:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

XML Tree Structure


XML documents are formed as element trees.
An XML tree starts at a root element and branches from the root to child elements.
All elements can have sub elements (child elements):
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The terms parent, child, and sibling are used to describe the relationships between elements.
Parents have children. Children have parents. Siblings are children on the same level (brothers and sisters).
All elements can have text content (Harry Potter) and attributes (category="cooking").
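The parent, child and sibling relationships map directly onto how a parser exposes the tree. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; it uses a shortened, two-book version of the bookstore document, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

bookstore_xml = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(bookstore_xml)           # the root element
books = root.findall("book")                  # children of the root (siblings of each other)

titles = [book.findtext("title") for book in books]        # text content
categories = [book.get("category") for book in books]      # attribute values
first_book_children = [child.tag for child in books[0]]    # children of one <book>

print(titles)
print(categories)
print(first_book_children)
```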

XML Tree Rules


These rules are used to figure out the relationships between elements. They show whether an element is a child or a parent of another element.
Descendants: If element A is contained by element B, then A is known as a descendant of B. In the above example, "bookstore" is the root
element and all the other elements are descendants of "bookstore".
Ancestors: An element that contains other elements is called an "ancestor" of those elements. In the above example, the root
element (bookstore) is the ancestor of all other elements.

Self-Describing Syntax
XML uses a largely self-describing syntax.
A prolog defines the XML version and the character encoding:
<?xml version="1.0" encoding="UTF-8"?>

The next line is the root element of the document:


<bookstore>

The next line starts a <book> element:


<book category="cooking">

The <book> element has 4 child elements: <title>, <author>, <year>, <price>.
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>

The next line ends the book element:


</book>
You can assume, from this example, that the XML document contains information about books in a bookstore.

XML Element
XML elements can be defined as the building blocks of an XML document. Elements can behave as containers to hold text, elements,
attributes, media objects or all of these.
Each XML document contains one or more elements, the scope of which is either delimited by start and end tags or, for empty elements,
by an empty-element tag.
Syntax
Following is the syntax to write an XML element:
<element-name attribute1 attribute2>
....content
</element-name>
where
o element-name is the name of the element. The name and its case must match in the start and end tags.
o attribute1, attribute2 are attributes of the element separated by white space. An attribute defines a property of the element.
It associates a name with a value, which is a string of characters. An attribute is written as:
name = "value"
name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.

Empty Element
An empty element (element with no content) has following syntax:
<name attribute1 attribute2.../>
Example of an XML document using various XML elements:
<?xml version="1.0"?>
<contact-info>
<address category="residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
An element with no content is said to be empty.

In XML, you can indicate an empty element like this:


<element></element>
You can also use a so called self-closing tag:
<element />
The two forms produce identical results in XML software (Readers, Parsers, Browsers).
Empty elements can have attributes.
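The equivalence of the two forms can be seen from what a parser reports for each. The sketch below assumes Python's standard xml.etree.ElementTree module; the id attribute in the last sample is a made-up example.

```python
import xml.etree.ElementTree as ET

explicit = ET.fromstring("<element></element>")
self_closing = ET.fromstring("<element />")

# Both forms yield an element with the same tag, no text and no children.
same_result = (
    explicit.tag == self_closing.tag
    and explicit.text == self_closing.text
    and len(explicit) == len(self_closing) == 0
)

# An empty element can still carry attributes.
with_attribute = ET.fromstring('<element id="1" />')

print(same_result)
print(with_attribute.get("id"))
```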

XML Naming Rules


XML elements must follow these naming rules:
o Element names are case-sensitive
o Element names must start with a letter or underscore
o Element names cannot start with the letters xml (or XML, or Xml, etc)
o Element names can contain letters, digits, hyphens, underscores, and periods
o Element names cannot contain spaces
Any name can be used, no words are reserved (except xml).
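The rules listed above can be approximated in a few lines of code. This is a deliberately simplified sketch in Python: follows_naming_rules is a hypothetical helper that only considers ASCII letters, whereas the XML specification also allows many non-English letters, so it should not be taken as a complete name validator.

```python
import re

# Starts with a letter or underscore, then letters, digits, periods,
# underscores or hyphens. ASCII-only simplification of the real rules.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def follows_naming_rules(name):
    """Check a candidate element name against the simplified rules above."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml (in any case) are reserved
    return _NAME_PATTERN.fullmatch(name) is not None

for candidate in ["first_name", "book.title-2", "1st", "first name", "XMLdata"]:
    print(candidate, follows_naming_rules(candidate))
```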

Best Naming Practices


Create descriptive names, like this: <person>, <firstname>, <lastname>.
Create short and simple names, like this: <book_title> not like this: <the_title_of_the_book>.
Avoid "-". If you name something "first-name", some software may think you want to subtract "name" from "first".
Avoid ".". If you name something "first.name", some software may think that "name" is a property of the object "first".
Avoid ":". Colons are reserved for namespaces (more later).
Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software doesn't support them.

Naming Styles
There are no naming styles defined for XML elements, but here are some commonly used ones:
Style          Example         Description
Lower case     <firstname>     All letters lower case
Upper case     <FIRSTNAME>     All letters upper case
Underscore     <first_name>    Underscore separates words
Pascal case    <FirstName>     Uppercase first letter in each word
Camel case     <firstName>     Uppercase first letter in each word except the first
If you choose a naming style, it is good to be consistent!
XML documents often have a corresponding database. A common practice is to use the naming rules of the database for the XML
elements.
Camel case is a common naming rule in JavaScript.

XML Elements are Extensible


XML elements can be extended to carry more information.
Look at the following XML example:
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
Let's imagine that we created an application that extracted the <to>, <from>, and <body> elements from the XML document to produce
this output:
MESSAGE
To: Tove
From: Jani
Don't forget me this weekend!
Imagine that the author of the XML document added some extra information to it:
<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

Should the application break or crash?


No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same
output.
This is one of the beauties of XML. It can be extended without breaking applications.
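The scenario above can be sketched as a small program. It assumes Python's standard xml.etree.ElementTree module; render_message is a hypothetical stand-in for the application described in the text, written so that it looks elements up by name rather than by position.

```python
import xml.etree.ElementTree as ET

def render_message(note_xml):
    """Build the MESSAGE output from the <to>, <from> and <body> elements."""
    note = ET.fromstring(note_xml)
    return "\n".join([
        "MESSAGE",
        f"To: {note.findtext('to')}",
        f"From: {note.findtext('from')}",
        note.findtext("body"),
    ])

original = (
    "<note><to>Tove</to><from>Jani</from>"
    "<body>Don't forget me this weekend!</body></note>"
)
extended = (
    "<note><date>2008-01-10</date><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend!</body></note>"
)

# The extra <date> and <heading> elements are simply ignored.
print(render_message(original) == render_message(extended))
print(render_message(extended))
```

An application that relied on the position of each child (for example, "the first child is the recipient") would not be as tolerant, which is why lookup by name is used here.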

XML - Attributes
This chapter describes XML attributes. Attributes are part of XML elements. An element can have multiple unique attributes.
Attributes give more information about XML elements. To be more precise, they define properties of elements. An XML attribute is always
a name-value pair.

Syntax
An XML attribute has following syntax:
<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the following form:
name = "value"
value has to be in double (" ") or single (' ') quotes. Here, attribute1 and attribute2 are unique attribute labels.
Attributes are used to add a unique label to an element, place the label in a category, add a Boolean flag, or otherwise associate it with
some string of data. Following example demonstrates the use of attributes:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category="flowers" />
<plants category="shrubs">
</plants>
</garden>
Attributes are used to distinguish among elements of the same name when you do not want to create a new element for every situation.
Hence, use of an attribute can add a little more detail in differentiating two or more similar elements.
In the above example, we have categorized the plants by including the attribute category and assigning a different value to each of the
elements. Hence we have two categories of plants, one flowers and the other shrubs, and two plants elements with different
attribute values.
You can also observe that we have declared this attribute in the DTD at the beginning of the XML.
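Reading the attributes back shows how they distinguish the two plants elements. This sketch assumes Python's standard xml.etree.ElementTree module, which reads the internal DTD but does not validate the document against it; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

garden_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category="flowers" />
<plants category="shrubs">
</plants>
</garden>"""

garden = ET.fromstring(garden_xml)

# Same element name, told apart by the value of the category attribute.
categories = [plants.get("category") for plants in garden.findall("plants")]

print(categories)
```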

Attribute Types
The following list describes the types of attributes:
StringType: It takes any literal string as a value. CDATA is a StringType. CDATA is character data. This means
any string of non-markup characters is a legal part of the attribute.
TokenizedType: This is a more constrained type. The validity constraints noted in the grammar are applied after
the attribute value is normalized. The TokenizedType attributes are given as:
o ID : It is used to specify the element as unique.
o IDREF : It is used to reference an ID that has been named for another element.
o IDREFS : It is used to reference multiple IDs, given as a whitespace-separated list.
o ENTITY : It indicates that the attribute will represent an external entity in the document.
o ENTITIES : It indicates that the attribute will represent external entities in the document.
o NMTOKEN : It is similar to CDATA with restrictions on what data can be part of the attribute.
o NMTOKENS : It is a whitespace-separated list of NMTOKEN values.
EnumeratedType: This has a list of predefined values in its declaration, out of which the attribute must be assigned one value.
There are two types of enumerated attribute:
o NotationType : It declares that an element will be referenced to a NOTATION declared somewhere else in the XML document.
o Enumeration : Enumeration allows you to define a specific list of values that the attribute value must match.

Why should we avoid XML attributes


o Attributes cannot contain multiple values but child elements can have multiple values.
o Attributes cannot contain tree structure but child element can.
o Attributes are not easily expandable. If you want to change an attribute's values in future, it may be complicated.
o Attributes cannot describe structure but child elements can.
o Attributes are more difficult to be manipulated by program code.
o Attribute values are not easy to test against a DTD, which is used to define the legal elements of an XML document.

Difference between attribute and sub-element


In the context of documents, attributes are part of markup, while sub elements are part of the basic document contents.
In the context of data representation, the difference is unclear and may be confusing.
Same information can be represented in two ways:
1st way:
<book publisher="Tata McGraw Hill"> </book>
2nd way:
<book>
<publisher> Tata McGraw Hill </publisher>
</book>
In the first example publisher is used as an attribute, and in the second example publisher is a child element.
Both examples provide the same information, but it is generally good practice to prefer elements over attributes for data.
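The difference between the two forms also shows up in program code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<book publisher="Tata McGraw Hill"> </book>')
as_element = ET.fromstring(
    "<book><publisher> Tata McGraw Hill </publisher></book>"
)

# An attribute is read from the element's attribute map.
publisher_from_attribute = as_attribute.get("publisher")

# A sub-element is located with find(); its text keeps the surrounding
# spaces from the document, so they are stripped here.
publisher_from_element = as_element.find("publisher").text.strip()

print(publisher_from_attribute)
print(publisher_from_element)
```

Both lookups return the same string, but only the element form could later grow child elements of its own.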
An Exception to my Attribute rule (XML Attributes for Metadata)

Rules always have exceptions. My rule about not using attributes has one too:
Sometimes I assign ID references to elements in my XML documents. These ID references can be used to access XML elements in much
the same way as the NAME or ID attributes in HTML. This example demonstrates this:
<?xml version="1.0"?>
<messages>
<note ID="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

<note ID="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>

The ID in these examples is just a counter, or a unique identifier, to identify the different notes in the XML file.
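As a rough illustration of how such an identifier can be used to reach one note, here is a small sketch with Python's standard xml.etree.ElementTree module. The helper name find_note is hypothetical and the document is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

MESSAGES = """<messages>
  <note ID="501">
    <to>Tove</to><from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note ID="502">
    <to>Jani</to><from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>"""

root = ET.fromstring(MESSAGES)


def find_note(messages, note_id):
    """Return the <note> whose ID attribute equals note_id, or None."""
    return messages.find(f"note[@ID='{note_id}']")


reply = find_note(root, "502")
print(reply.findtext("heading"))
```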

XML - Comments
This chapter explains how comments work in XML documents. XML comments are similar to HTML comments. Comments are added
as notes that explain the purpose of the XML code.
Comments can be used to include related links, information and terms. They are visible only in the source of the document; an XML processor does not treat them as part of the document's character data.
Comments may appear almost anywhere in a document, but not inside a tag and not before the XML declaration.
Syntax
XML comment has the following syntax:
<!-- Your comment -->
A comment starts with <!-- and ends with -->. You can add textual notes as comments between these delimiters. The string -- (double hyphen) must not occur inside a comment, so you must not nest one
comment inside the other.
Example
Following example demonstrates the use of comments in XML document:
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Students' grades are uploaded by months -->
<class_list>
<student>
<name>Tanmay</name>
<grade>A</grade>
</student>
</class_list>
Any text between <!-- and --> characters is considered as a comment.

XML Comments Rules


The following rules apply to XML comments:
 Comments cannot appear before the XML declaration.
 Comments may appear anywhere else in a document, outside other markup.
 Comments must not appear within attribute values or inside tags.
 Comments cannot be nested inside other comments, and the string -- must not appear within a comment.
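These rules can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


plain_comment = "<a><!-- a plain comment --><b/></a>"
nested_comment = "<a><!-- outer <!-- inner --> outer --><b/></a>"
before_declaration = '<!-- too early --><?xml version="1.0"?><a/>'
inside_attribute = '<a b="<!-- not allowed -->"/>'

for sample in (plain_comment, nested_comment,
               before_declaration, inside_attribute):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; the other three each break one of the rules listed above.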

XML - Character Entities


This chapter describes the XML Character Entities. Before we understand the Character Entities, let us first understand what an XML
entity is.
As put by W3 Consortium the definition of entity is as follows:
The document entity serves as the root of the entity tree and a starting-point for an XML processor.
This means, entities are the placeholders in XML. These can be declared in the document prolog or in a DTD. There are different types of
entities and this chapter will discuss Character Entity.
Both HTML and XML reserve some symbols for their own use, and these cannot always be used literally as content in XML code. For example, the <
and > signs are used for opening and closing XML tags. To include these special characters in content, character entities are used.
There are a few special characters or symbols which cannot be typed directly from the keyboard. Character entities can be used to
include those symbols/special characters as well.

Types of Character Entities


There are three types of character entities:
 Predefined Character Entities
 Numeric Character Entities
 Named Character Entities

Predefined Character Entities


They are introduced to avoid ambiguity when certain symbols are used. For example, an ambiguity arises when the less than ( < ) or
greater than ( > ) symbol is used in content, because the same symbols delimit tags (<>). Predefined character entities are used to escape such characters so that they are not mistaken for markup. Following is a list of
predefined character entities from the XML specification. These can be used to express characters without ambiguity.
 Ampersand: &amp;
 Single quote: &apos;
 Greater than: &gt;
 Less than: &lt;
 Double quote: &quot;
Numeric Character Entities
A numeric reference refers to a character by its number (code point) in the Unicode character set. A numeric reference can be written in either decimal or hexadecimal format. As there
are thousands of code points, the numbers are a bit hard to remember.
General syntax for decimal numeric reference is:
&#decimal-number;
General syntax for hexadecimal numeric reference is:
&#xhexadecimal-number;
No spaces are allowed inside a reference.
The following table lists the predefined character entities with their numeric values:
Entity name   Character   Decimal reference   Hexadecimal reference
quot          "           &#34;               &#x22;
amp           &           &#38;               &#x26;
apos          '           &#39;               &#x27;
lt            <           &#60;               &#x3C;
gt            >           &#62;               &#x3E;
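The equivalence of the three notations in the table can be confirmed with a parser. This is a small sketch using Python's standard xml.etree.ElementTree module; the element name t is arbitrary. The last part also shows that a name XML does not predefine (Aacute is used as the sample) is rejected unless a DTD declares it.

```python
import xml.etree.ElementTree as ET

# The same five characters written as names, decimal and hexadecimal references.
by_name = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text
by_decimal = ET.fromstring("<t>&#60;&#62;&#38;&#39;&#34;</t>").text
by_hex = ET.fromstring("<t>&#x3C;&#x3E;&#x26;&#x27;&#x22;</t>").text

print(by_name == by_decimal == by_hex)

# An entity name that is not predefined and not declared is an error.
try:
    ET.fromstring("<t>&Aacute;</t>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(undeclared_accepted)
```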

Named Character Entity


As it's hard to remember the numeric values, the most preferred type of character entity is the named character entity. Here, each
entity is identified with a name. Apart from the five predefined entities, a named entity must be declared in a DTD before it can be used in an XML document.
For example:
 'Aacute' represents the capital letter A with acute accent.
 'ugrave' represents the small letter u with grave accent.

XML - CDATA Sections


This chapter discusses the XML CDATA section. The term CDATA means Character Data. CDATA sections are blocks of text that the parser does not
interpret as markup, even if they contain characters that would otherwise be recognized as markup.
The predefined entities such as &lt;, &gt;, and &amp; require typing and are generally difficult to read in the markup. In such cases, a CDATA
section can be used. By using a CDATA section, you are telling the parser that the particular section of the document contains no
markup and should be treated as regular text.
Syntax
Following is the syntax for CDATA section:
<![CDATA[
characters with markup
]]>
The above syntax is composed of three sections:
 CDATA Start section - CDATA begins with the nine-character delimiter <![CDATA[
 CDATA End section - CDATA section ends with the ]]> delimiter.
 CDATA section - Characters between these two delimiters are interpreted as characters, and not as markup. This section may
contain markup characters (<, >, and &), but the XML processor passes them through as literal text instead of interpreting them.
Example
The following markup shows an example of CDATA. Here, each character written inside the CDATA section is treated as literal text by the parser.
<script>
<![CDATA[
<message> Welcome to TutorialsPoint </message>
]]>
</script>
In the above example, everything between <![CDATA[ and ]]>, including the <message> and </message> tags, is treated as character data and not as markup.
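To see the effect from a program's point of view, the sketch below parses a CDATA section with Python's standard xml.etree.ElementTree module. The document string is a one-line variant of the example above.

```python
import xml.etree.ElementTree as ET

DOC = "<script><![CDATA[<message> Welcome to TutorialsPoint </message>]]></script>"

script = ET.fromstring(DOC)

# The CDATA content arrives as the text of <script>, angle brackets included.
print(script.text)

# No <message> child element was created, because the tags were not parsed.
print(len(list(script)))
```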

CDATA Rules
The following rules apply to XML CDATA sections:
 A CDATA section cannot contain the string "]]>", because that string ends the section.
 Nesting is not allowed in CDATA sections.

XML - White Spaces
This chapter discusses white space handling in XML documents. Whitespace is a collection of spaces, tabs, and newlines. They are
generally used to make a document more readable.
An XML document contains two types of whitespace: (a) Significant Whitespace and (b) Insignificant Whitespace. Both are explained below
with examples.

Significant Whitespace
Significant whitespace occurs within an element that contains text, or text and markup together. For example:
<name>TanmayPatil</name>
and
<name>Tanmay Patil</name>
The above two elements are different because of the space between Tanmay and Patil. Any program reading this element in an XML file
is obliged to maintain the distinction.

Insignificant Whitespace
Insignificant whitespace is whitespace in places where it does not affect the content, such as between an element name and its attributes inside a tag. For example:
<address.category="residence">
or
<address....category="residence">
The above two examples are the same. Here, the space is represented by dots (.). In the above example, the space
between address and category is insignificant.
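Both cases can be observed with a parser. This is a minimal sketch with Python's standard xml.etree.ElementTree module, using real spaces where the text above uses dots.

```python
import xml.etree.ElementTree as ET

# Significant: whitespace inside text content changes the value.
joined = ET.fromstring("<name>TanmayPatil</name>")
spaced = ET.fromstring("<name>Tanmay Patil</name>")
text_differs = joined.text != spaced.text

# Insignificant: extra spaces between the element name and the attribute
# make no difference to what the parser reports.
one_space = ET.fromstring('<address category="residence"/>')
four_spaces = ET.fromstring('<address    category="residence"/>')
same_attributes = one_space.attrib == four_spaces.attrib

print(text_differs, same_attributes)
```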
A special attribute named xml:space may be attached to an element. It signals to the application whether whitespace in that
element should be preserved. You can set this attribute to default or preserve, and in a DTD it is declared as shown in the example below:
<!ATTLIST address xml:space (default|preserve) 'preserve'>
Where:
 The value default signals that the default whitespace processing modes of an application are acceptable for this element;
 The value preserve tells the application to preserve all the whitespace.

XML – Encoding

Encoding is the process of converting Unicode characters into their equivalent binary representation. When the XML processor reads an
XML document, it decodes the bytes of the document according to the encoding in use. Hence, we need to specify the type of encoding in the XML
declaration.

Encoding Types
There are mainly two types of encoding:
 UTF-8
 UTF-16
UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set. The number 8 or 16 refers to the size in
bits of the code unit used by the encoding: UTF-8 uses one to four 8-bit units (bytes) per character, and UTF-16 uses one or two 16-bit units (two or four bytes). For documents without encoding information, UTF-8 is assumed by default.
Syntax
Encoding type is included in the prolog section of the XML document. The syntax for UTF-8 encoding is as below:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
Syntax for UTF-16 encoding
<?xml version="1.0" encoding="UTF-16" standalone="no" ?>

Example
Following example shows declaration of encoding:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
In the above example, encoding="UTF-8" specifies that the document is stored as UTF-8, which uses 8-bit code units. UTF-16 encoding, which uses 16-bit code units, can be declared in the same way.
XML files encoded with UTF-8 tend to be smaller in size than those encoded with UTF-16 when the text is mostly ASCII.
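The size difference is easy to measure. The following sketch uses only Python's built-in str.encode; the sample characters are chosen for illustration, and utf-16-le is used so that no byte order mark is counted.

```python
# Sample characters: ASCII letter, accented letter, euro sign, emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character needs in the two encodings.
utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]
utf16_sizes = [len(ch.encode("utf-16-le")) for ch in samples]

print(utf8_sizes)
print(utf16_sizes)

# An ASCII-only XML fragment is half the size in UTF-8.
fragment = "<name>Tanmay Patil</name>"
print(len(fragment.encode("utf-8")), len(fragment.encode("utf-16-le")))
```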

XML – Namespaces
A Namespace is a set of unique names. A Namespace is a mechanism by which element and attribute names can be assigned to a group. The
Namespace is identified by a URI (Uniform Resource Identifier).

Namespace Declaration
A Namespace is declared using reserved attributes. Such an attribute name must either be xmlns or begin with xmlns: as shown below:
<element xmlns:name="URI">
Syntax
 The Namespace declaration starts with the keyword xmlns.
 The word name is the Namespace prefix.
 The URI is the Namespace identifier.

Name Conflicts
In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different
XML applications.
This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both contain a <table> element, but the elements have
different content and meaning.
A user or an XML application will not know how to handle these differences.

Solving the Name Conflict Using a Prefix


Name conflicts in XML can easily be avoided using a name prefix.
This XML carries information about an HTML table, and a piece of furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
In the example above, there will be no conflict because the two <table> elements have different names.

XML Namespaces - The xmlns Attribute


When using prefixes in XML, a namespace for the prefix must be defined.
The namespace can be defined by an xmlns attribute in the start tag of an element.
The namespace declaration has the following syntax. xmlns:prefix="URI".
<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>

<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
In the example above:
The xmlns attribute in the first <table> element gives the h: prefix a qualified namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified namespace.
When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.
Namespaces can also be declared in the XML root element:
<root
xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
Note: The namespace URI is not used by the parser to look up information.
The purpose of using an URI is to give the namespace a unique name.
However, companies often use the namespace as a pointer to a web page containing namespace information.
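A parser resolves each prefix to its URI, and it is the URI, not the prefix, that identifies the element. The sketch below illustrates this with Python's standard xml.etree.ElementTree module, which reports names as {URI}local-name. The prefixes html and furn in the lookup map are deliberately different from h and f in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>"""

root = ET.fromstring(DOC)

# The two <table> elements end up with different fully qualified names.
tags = [child.tag for child in root]
print(tags)

# Searching uses its own prefix-to-URI map; only the URIs have to match.
ns = {"html": "http://www.w3.org/TR/html4/",
      "furn": "http://www.w3schools.com/furniture"}
furniture_name = root.find("furn:table/furn:name", ns).text
print(furniture_name)
```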

Uniform Resource Identifier (URI)


A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet Resource.
The most common URI is the Uniform Resource Locator (URL), which identifies a resource by its location on the Internet. Another, not so common
type of URI is the Uniform Resource Name (URN).

Default Namespaces
Defining a default namespace for an element saves us from using prefixes in all the child elements. It has the following syntax:
xmlns="namespaceURI"
This XML carries HTML table information:
<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a piece of furniture:
<table xmlns="http://www.w3schools.com/furniture">
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
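With a default namespace, unprefixed child elements still belong to that namespace. A small sketch with Python's standard xml.etree.ElementTree module shows this; the constant HTML_NS is introduced only to keep the lines short.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/TR/html4/"

table = ET.fromstring(
    '<table xmlns="http://www.w3.org/TR/html4/">'
    "<tr><td>Apples</td><td>Bananas</td></tr>"
    "</table>"
)

# The root and every unprefixed descendant carry the default namespace.
print(table.tag)
cells = [td.text for td in table.iter("{" + HTML_NS + "}td")]
print(cells)

# A lookup by the bare name finds nothing, because no element is
# in "no namespace" here.
bare_lookup = table.find("tr")
print(bare_lookup)
```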

XML Validation

A well formed XML document can be validated against DTD or Schema.


A well-formed XML document is an XML document with correct syntax. It is necessary to understand well-formed XML documents before
looking at XML validation.
"Well Formed" XML documents
A "Well Formed" XML document is a document that conforms to the XML syntax rules that we described in the previous chapter.
The following is a "Well Formed" XML document:
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
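Well-formedness is what any XML parser checks first. The sketch below uses Python's standard xml.etree.ElementTree module, which checks syntax only and does not validate against a DTD or Schema. The helper first_error and the broken sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = """<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

# Same idea, but <to> is closed with the wrong tag.
BROKEN = "<note><to>Tove</from></note>"


def first_error(text):
    """Return None for well-formed text, else the (line, column) of the error."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return err.position


print(first_error(WELL_FORMED))
print(first_error(BROKEN))
```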

Valid XML Documents


A "well formed" XML document is not the same as a "valid" XML document.
A "valid" XML document must be well formed. In addition, it must conform to a document type definition.
There are two different kinds of document definitions that can be used with XML:
 DTD - The original Document Type Definition
 XML Schema - An XML-based alternative to DTD
A document type definition defines the rules and the legal elements and attributes for an XML document.
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

XML DTD
A DTD defines the legal elements of an XML document. In simple words, a DTD defines the document structure with a list
of legal elements and attributes. XML Schema is an XML-based alternative to DTD. Both DTD and XML Schema are used to check that a
well-formed XML document is also valid. We should avoid errors in XML documents because they will stop XML programs from processing them.

XML schema
XML Schema is itself written in XML. It uses namespaces to allow reuse of existing definitions. It supports a large number of built-in data
types and the definition of derived data types.

XML – DTDs

The XML Document Type Definition, commonly known as DTD, is a way to describe an XML language precisely. DTDs check the vocabulary and
the validity of the structure of XML documents against the grammatical rules of the appropriate XML language.
An XML DTD can be either specified inside the document, or it can be kept in a separate document and then linked separately.
Syntax
Basic syntax of a DTD is as follows:
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
 The DTD starts with the <!DOCTYPE delimiter.
 The element names the root element of the document, from which the parser starts parsing.
 DTD identifier is an identifier for the document type definition, which may be the path to a file on the system or a URL to a file on
the internet. If the DTD points to an external path, it is called the External Subset.
 The square brackets [ ] enclose an optional list of markup declarations (elements, attributes, entities and notations) called the Internal Subset.

Internal DTD
A DTD is referred to as an internal DTD if elements are declared within the XML file. In the example below, the standalone attribute in the
XML declaration is set to yes, which means the document works independently of any external source.
Syntax
The syntax of internal DTD is as shown:
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and element-declarations is where you declare the elements.
Example
Following is a simple example of internal DTD:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Let us go through the above code:
Start Declaration- Begin the XML declaration with following statement
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
DTD- Immediately after the XML header, the document type declaration follows, commonly referred to as the DOCTYPE:
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of its name. The DOCTYPE informs the parser that a DTD is
associated with this XML document.
DTD Body- The DOCTYPE declaration is followed by body of the DTD, where you declare elements, attributes, entities, and notations:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document. <!ELEMENT name (#PCDATA)> defines the
element name to be of type "#PCDATA". Here #PCDATA means parsed character data, that is, text that the parser examines for markup.
End Declaration - Finally, the declaration section of the DTD is closed using a closing bracket and a closing angle bracket (]>). This
effectively ends the definition, and thereafter, the XML document follows immediately.
Rules
 The document type declaration must appear at the start of the document, after the XML header and before the root element (only comments and processing instructions may come in between). It is not permitted
anywhere else within the document.
 Similar to the DOCTYPE declaration, the element declarations must start with an exclamation mark.
 The Name in the document type declaration must match the element type of the root element.

External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by specifying a system identifier, which may be either
the path to a .dtd file or a valid URL. In the example below, the standalone attribute in the XML declaration is set to no. This means the
declaration includes information from the external source.
Syntax
Following is the syntax for external DTD:
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.
Example
The following example shows external DTD usage:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The content of the DTD file address.dtd are as shown:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

Types
You can refer to an external DTD by using either system identifiers or public identifiers.
SYSTEM IDENTIFIERS
A system identifier enables you to specify the location of an external file containing DTD declarations. Syntax is as follows:
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains keyword SYSTEM and a URI reference pointing to the location of the document.
PUBLIC IDENTIFIERS
Public identifiers provide a mechanism to locate DTD resources and are written as below:
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
As you can see, it begins with keyword PUBLIC, followed by a specialized identifier. In XML, the public identifier must be followed by a system identifier (here address.dtd), which the parser can fall back on. Public identifiers are used to identify an entry in a
catalog. Public identifiers can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.

DTD – Elements

Declaring an Element
In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax:
<!ELEMENT element-name (element-content)>

Empty elements
Empty elements are declared with the keyword EMPTY, written without parentheses:
<!ELEMENT element-name EMPTY>
example:
<!ELEMENT img EMPTY>

Elements with data


Elements that contain only text are declared with #PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)>
Elements that may have any content are declared with the keyword ANY, written without parentheses:
<!ELEMENT element-name ANY>
example:
<!ELEMENT note (#PCDATA)>

#PCDATA means parsed character data: text that IS going to be examined by the parser for entity references and markup.
CDATA is not a valid content type in an element declaration; it is an attribute type and is used in ATTLIST declarations.
The keyword ANY declares an element with any content.
If an element declared with ANY or with mixed content contains child elements, these elements must also be declared.

Elements with children (sequences)


Elements with one or more children are defined with the name of the children elements inside the parentheses:
<!ELEMENT element-name (child-element-name)>
or
<!ELEMENT element-name (child-element-name,child-element-name,.....)>
example:
<!ELEMENT note (to,from,heading,body)>

When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a
full declaration, the children must also be declared, and the children can also have children. The full declaration of the note document
will be:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Wrapping
If the DTD is to be included in your XML source file, it should be wrapped in a DOCTYPE definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>
example:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>

Declaring only one occurrence of the same element


<!ELEMENT element-name (child-name)>
example
<!ELEMENT note (message)>
The example declaration above declares that the child element message can only occur one time inside the note element.
Declaring minimum one occurrence of the same element
<!ELEMENT element-name (child-name+)>
example
<!ELEMENT note (message+)>
The + sign in the example above declares that the child element message must occur one or more times inside the note element.
Declaring zero or more occurrences of the same element
<!ELEMENT element-name (child-name*)>
example
<!ELEMENT note (message*)>
The * sign in the example above declares that the child element message can occur zero or more times inside the note element.
Declaring zero or one occurrences of the same element
<!ELEMENT element-name (child-name?)>
example
<!ELEMENT note (message?)>
The ? sign in the example above declares that the child element message can occur zero or one times inside the note element.
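The occurrence indicators behave like the same symbols in regular expressions. The sketch below is only an analogy, not a DTD validator: it assumes the four content models for note shown above, translates each by hand into a Python regular expression, and matches it against the sequence of child element names. The names MODELS and matches are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written regular-expression analogues of the four content models.
# Each child name is followed by one space so that names cannot run together.
MODELS = {
    "(message)": r"message ",
    "(message+)": r"(message )+",
    "(message*)": r"(message )*",
    "(message?)": r"(message )?",
}


def matches(model, xml_text):
    """Check the child sequence of the root element against a content model."""
    children = "".join(child.tag + " " for child in ET.fromstring(xml_text))
    return re.fullmatch(MODELS[model], children) is not None


none = "<note></note>"
one = "<note><message/></note>"
two = "<note><message/><message/></note>"

for model in MODELS:
    print(model, [matches(model, doc) for doc in (none, one, two)])
```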
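Declaring mixed content
example
<!ELEMENT note (#PCDATA|to|from|header|message)*>
The example above declares that the element note can contain parsed character data interleaved with any number of to, from, header and
message child elements, in any order. In a mixed content declaration #PCDATA must come first, the alternatives are separated by |, and the group must end with *; the number and order of the child elements cannot be constrained.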
DTD - Attributes
Declaring Attributes
In the DTD, XML element attributes are declared with an ATTLIST declaration. An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>

As you can see from the syntax above, the ATTLIST declaration defines the element which can have the attribute, the name of the
attribute, the type of the attribute, and the default attribute value.
The attribute-type can have the following values:
Value            Explanation
CDATA            The value is character data
(eval|eval|..)   The value must be one of an enumerated list
ID               The value is a unique id
IDREF            The value is the id of another element
IDREFS           The value is a list of other ids
NMTOKEN          The value is a valid XML name token
NMTOKENS         The value is a list of valid XML name tokens
ENTITY           The value is an entity
ENTITIES         The value is a list of entities
NOTATION         The value is a name of a notation
xml:             The value is predefined
The default-value can have the following values:
Value            Explanation
value            The attribute has the given default value
#REQUIRED        The attribute value must be included in the element
#IMPLIED         The attribute does not have to be included
#FIXED value     The attribute value is fixed

Attribute declaration example


DTD example:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
XML example:
<square width="100"></square>
In the above example the element square is defined to be an empty element with the attribute width of type CDATA. The width
attribute has a default value of 0.
Default attribute value
Syntax:
<!ATTLIST element-name attribute-name CDATA "default-value">
DTD example:
<!ATTLIST payment type CDATA "check">
XML example:
<payment type="check" />
Specifying a default value for an attribute ensures that the attribute will get a value even if the author of the XML document didn't
include it.
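The effect of a default value can be observed when the declaration sits in the internal subset. The sketch below relies on the expat-based parser behind Python's standard xml.etree.ElementTree module, which reads attribute defaults from the internal subset; other parsers or configurations may behave differently, and no validation is performed.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE payment [
<!ELEMENT payment EMPTY>
<!ATTLIST payment type CDATA "check">
]>
"""

# The author left the attribute out: the declared default is supplied.
defaulted = ET.fromstring(DTD + "<payment/>")

# The author gave a value: it overrides the default.
explicit = ET.fromstring(DTD + '<payment type="cash"/>')

print(defaulted.get("type"), explicit.get("type"))
```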

Implied attribute
Syntax:
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
DTD example:
<!ATTLIST contact fax CDATA #IMPLIED>
XML example:
<contact fax="555-667788">
Use an implied attribute if you don't want to force the author to include an attribute and you don't have an option for a default value
either.
Required attribute
Syntax:
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
DTD example:
<!ATTLIST person number CDATA #REQUIRED>
XML example:
<person number="5677">
Use a required attribute if you don't have an option for a default value, but still want to force the attribute to be present.
Fixed attribute value
Syntax:
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
DTD example:
<!ATTLIST sender company CDATA #FIXED "Microsoft">
XML example:
<sender company="Microsoft">
Use a fixed attribute value when you want an attribute to have a fixed value without allowing the author to change it. If an author
includes another value, the XML parser will return an error.
Enumerated attribute values
Syntax:
<!ATTLIST element-name attribute-name (eval|eval|..) default-value>
DTD example:
<!ATTLIST payment type (check|cash) "cash">
XML example:
<payment type="check">
or
<payment type="cash">
Use enumerated attribute values when you want the attribute values to be one of a fixed set of legal values.
DTD - Entities
Entities
 Entities are variables used to define shortcuts to common text.
 Entity references are references to entities.
 Entities can be declared internally.
 Entities can be declared externally.
Internal Entity Declaration
Syntax:
<!ENTITY entity-name "entity-value">

DTD Example:
<!ENTITY writer "Jan Egil Refsnes.">
<!ENTITY copyright "Copyright XML101.">
XML example:
<author>&writer;&copyright;</author>
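Internal entities are expanded by the parser before the application sees the text. A minimal sketch with Python's standard xml.etree.ElementTree module, with the two declarations placed in an internal subset and an ELEMENT declaration for author added for completeness:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY writer "Jan Egil Refsnes.">
<!ENTITY copyright "Copyright XML101.">
]>
<author>&writer;&copyright;</author>"""

author = ET.fromstring(DOC)

# Both references have been replaced by their declared values.
print(author.text)
```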
External Entity Declaration
Syntax:
<!ENTITY entity-name SYSTEM "URI/URL">

DTD Example:
<!ENTITY writer SYSTEM "http://www.xml101.com/entities/entities.xml">
<!ENTITY copyright SYSTEM "http://www.xml101.com/entities/entities.dtd">
XML example:
<author>&writer;&copyright;</author>

DTD Validation

Validating with the XML Parser


If you try to open an XML document, the XML parser might generate an error. With Microsoft's XML parser (created through ActiveXObject in Internet Explorer), the parseError object gives access to the exact error code,
the error text, and even the line that caused the error:
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;            // load synchronously
xmlDoc.validateOnParse = true;   // validate against the DTD while loading
xmlDoc.load("note_dtd_error.xml");
document.write("<br>Error Code: ");
document.write(xmlDoc.parseError.errorCode);
document.write("<br>Error Reason: ");
document.write(xmlDoc.parseError.reason);
document.write("<br>Error Line: ");
document.write(xmlDoc.parseError.line);

Turning Validation off
Validation can be turned off by setting the XML parser's validateOnParse property to false:
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.validateOnParse = false;  // only well-formedness is checked
xmlDoc.load("note_dtd_error.xml");
document.write("<br>Error Code: ");
document.write(xmlDoc.parseError.errorCode);
document.write("<br>Error Reason: ");
document.write(xmlDoc.parseError.reason);
document.write("<br>Error Line: ");
document.write(xmlDoc.parseError.line);

When to Use a DTD/Schema?


With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.
With a DTD, you can verify that the data you receive from the outside world is valid.
You can also use a DTD to verify your own data.

When NOT to Use a DTD/Schema?


XML does not require a DTD/Schema.
When you are experimenting with XML, or when you are working with small XML files, creating DTDs may be a waste of time.
If you develop applications, wait until the specification is stable before you add a document definition. Otherwise, your software might
stop working because of validation errors.

XML - Schemas
XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and validate the structure and the content of
XML data. XML schema defines the elements, attributes and data types. Schema element supports Namespaces. It is similar to a database
schema that describes the data in a database.
Syntax
An XML Schema document begins with the xs:schema root element, which is declared as follows:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
Example
The following example shows how to use schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML Schemas is that they describe the legitimate format that an XML document can take.
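Because a schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree module to list the child elements that a contact schema declares; it only reads the schema and does not validate anything. The phone element is typed xs:string here on the assumption that values such as (011) 123-4567 must be accepted, which xs:int would not allow.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="company" type="xs:string" />
        <xs:element name="phone" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA)
contact = schema.find(XS + "element")

# Collect (name, type) for every element declared inside <contact>.
fields = [(e.get("name"), e.get("type"))
          for e in contact.iter(XS + "element") if e is not contact]

print(contact.get("name"), fields)
```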
Elements
As we saw in the XML - Elements chapter, elements are the building blocks of XML document. An element can be defined within an XSD
as follows:
<xs:element name="x" type="y"/>

Definition Types
You can define XML schema elements in following ways:
Simple Type - Simple type element is used only in the context of the text. Some of predefined simple types are: xs:integer, xs:boolean,
xs:string, xs:date. For example:
<xs:element name="phone_number" type="xs:int" />
Complex Type - A complex type is a container for other element definitions. This allows you to specify which child elements an element
can contain and to provide some structure within your XML documents. For example:
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
In the above example, the Address element consists of child elements. It is a container for other <xs:element> definitions, which allows you to
build a simple hierarchy of elements in the XML document.
Global Types - With global type, you can define a single type in your document, which can be used by all other references. For example,
suppose you want to generalize the person and company for different addresses of the company. In such case, you can define a general
type as below:
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
</xs:sequence>
</xs:complexType>
Now let us use this type in our example as below:
<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone1" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Address2">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone2" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
Instead of having to define the name and the company twice (once for Address1 and once for Address2), we now have a single definition.
This makes maintenance simpler: for example, if you decide to add "Postcode" elements to the address, you need to add them in just one place.

Attributes
Attributes in XSD provide extra information within an element. An attribute has a name and a type, as shown below:
<xs:attribute name="x" type="y"/>
XML Schemas are More Powerful than DTD
 XML Schemas are written in XML
 XML Schemas are extensible to additions
 XML Schemas support data types
 XML Schemas support namespaces

Why Use an XML Schema?


With XML Schema, your XML files can carry a description of their own format.
With XML Schema, independent groups of people can agree on a standard for interchanging data.
With XML Schema, you can verify data.

XML Schemas Support Data Types


One of the greatest strengths of XML Schemas is their support for data types:
 It is easier to describe document content
 It is easier to define restrictions on data
 It is easier to validate the correctness of data
 It is easier to convert data between different data types

XML Schemas use XML Syntax


Another great strength of XML Schemas is that they are written in XML:
 You don't have to learn a new language
 You can use your XML editor to edit your Schema files
 You can use your XML parser to parse your Schema files
 You can manipulate your Schemas with the XML DOM
 You can transform your Schemas with XSLT

DTD vs XSD
There are many differences between DTD (Document Type Definition) and XSD (XML Schema Definition). In short, DTD provides less control
over XML structure, whereas XSD (XML Schema) provides more control.
The important differences are given below:
No. | DTD | XSD
1) | DTD stands for Document Type Definition. | XSD stands for XML Schema Definition.
2) | DTDs are derived from SGML syntax. | XSDs are written in XML.
3) | DTD doesn't support datatypes. | XSD supports datatypes for elements and attributes.
4) | DTD doesn't support namespaces. | XSD supports namespaces.
5) | DTD defines the order of child elements but limits occurrence control to ?, * and +. | XSD defines order and also exact occurrence counts (minOccurs, maxOccurs).
6) | DTD is not extensible. | XSD is extensible.
7) | DTD uses its own syntax, which must be learned separately. | XSD is written in XML, so you don't need to learn a new syntax.
8) | DTD provides less control over XML structure. | XSD provides more control over XML structure.

CDATA PCDATA

CDATA
CDATA: (Unparsed Character data): CDATA contains the text which is not parsed further in an XML document. Tags inside the CDATA text
are not treated as markup and entities will not be expanded.
Let's take an example for CDATA:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<![CDATA[
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
]]>
</employee>
In the above example, the CDATA section sits directly inside the employee element, so its content is not parsed as markup, and the value of
employee is the literal text:
<firstname>vimal</firstname><lastname>jaiswal</lastname><email>vimal@javatpoint.com</email>

PCDATA
PCDATA: (Parsed Character Data): XML parsers parse all the text in an XML document. PCDATA is the text that will be parsed by a parser.
Tags inside PCDATA are treated as markup and entities are expanded.
In other words, parsed character data means that the XML parser examines the data and, if it contains entity references, replaces them
with their values.
Let's take an example:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
In the above example, the employee element contains 3 more elements 'firstname', 'lastname', and 'email', so it parses further to get the
data/text of firstname, lastname and email to give the value of employee as:
vimal jaiswal vimal@javatpoint.com
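The difference between the two examples can be checked with a short Python sketch. It is only an illustration, assuming Python's standard xml.dom.minidom parser; the sample strings are shortened, hypothetical versions of the employee documents above, and summarize is a made-up helper name.

```python
from xml.dom import minidom

# Hypothetical sample documents modeled on the employee examples above.
CDATA_DOC = "<employee><![CDATA[<firstname>vimal</firstname><lastname>jaiswal</lastname>]]></employee>"
PCDATA_DOC = "<employee><firstname>vimal</firstname><lastname>jaiswal</lastname></employee>"


def summarize(xml_text):
    """Return (names of child elements, character data directly under the root)."""
    root = minidom.parseString(xml_text).documentElement
    child_elements = [n.tagName for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
    character_data = "".join(
        n.data for n in root.childNodes
        if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)
    )
    return child_elements, character_data


# Inside CDATA the tags stay literal text, so employee has no child elements.
cdata_children, cdata_text = summarize(CDATA_DOC)
# As PCDATA the same tags are parsed into real child elements.
pcdata_children, pcdata_text = summarize(PCDATA_DOC)

print(cdata_children, cdata_text)
print(pcdata_children, pcdata_text)
```

In the first case the parser hands back the markup as plain characters; in the second it builds firstname and lastname element nodes.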

XML Parsers

Parsing XML refers to going through an XML document to access or modify its data. A parser has the job of reading the
XML, checking it for errors, and passing it on to the intended application. If no DTD or schema is provided, the parser simply checks that
the XML is well-formed. If a DTD is provided, the parser also determines whether the XML is valid, i.e. that the tags, attributes, and
content meet the specifications found in the DTD, before passing it on to the application.
Let's understand the working of XML parser by the figure given below:

Why do we need XML Parsers


We need an XML parser because we do not want to do everything in our application from scratch; we need "helper" programs or
libraries to do work that is very low-level but necessary to us.
 These low-level but necessary things include checking well-formedness, validating the document against its DTD or schema (only
for validating parsers), resolving character references, understanding CDATA sections, and so on.
 XML parsers are just such "helper" programs and they will do all these jobs.

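As a small illustration of the well-formedness check a parser performs, here is a Python sketch. It assumes the standard xml.etree.ElementTree module, which is a non-validating parser; is_well_formed is a hypothetical helper name, and the contact documents are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags are accepted.
print(is_well_formed("<contact><name>Ann</name></contact>"))
# The <name> element is never closed, so the parser rejects the document.
print(is_well_formed("<contact><name>Ann</contact>"))
```

The application never has to inspect the raw characters itself; it only reacts to the parser's verdict.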
Types of XML Parsers


These are the two main types of XML Parsers:
1. DOM
2. SAX
DOM (Document Object Model)
A DOM document is an object which contains all the information of an XML document. It is composed like a tree structure. The DOM Parser
implements a DOM API. This API is very simple to use.
Features of DOM Parser
A DOM Parser creates an internal structure in memory which is a DOM document object and the client applications get information of the
original XML document by invoking methods on this document object.
DOM Parser has a tree based structure.
Advantages
1) It supports both read and write operations and the API is very simple to use.
2) It is preferred when random access to widely separated parts of a document is required.
Disadvantages
1) It is memory inefficient (it consumes more memory because the whole XML document needs to be loaded into memory).
2) It is comparatively slower than other parsers.

SAX (Simple API for XML)


A SAX parser implements the SAX API. This API is event based and less intuitive.
Features of SAX Parser
It does not create any internal structure.
Clients do not call methods to fetch data; instead, they override the callback methods of the API and place their own code inside them.
It is an event-based parser; it works like an event handler in Java.
Advantages
1) It is simple and memory efficient.
2) It is very fast and works for huge documents.
Disadvantages
1) It is event-based so its API is less intuitive.
2) Clients never know the full information because the data is broken into pieces.
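The event-based style described above can be sketched in Python with the standard xml.sax module. This is a minimal illustration, not a reference implementation; the TitleCollector class and the bookstore sample data are hypothetical.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as parsing events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called by the parser when an opening tag is read.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is buffered.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called by the parser when a closing tag is read.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


# Hypothetical sample data.
BOOKS = (b"<bookstore><book><title>Everyday Italian</title></book>"
         b"<book><title>Learning XML</title></book></bookstore>")

handler = TitleCollector()
xml.sax.parseString(BOOKS, handler)
print(handler.titles)
```

No tree is built: the handler keeps only what it chooses to remember, which is why SAX suits very large documents.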

DOM | SAX
Tree model parser (tree of nodes) | Event-based parser (sequence of events)
DOM loads the file into memory and then parses it | SAX parses the file as it reads it, i.e. node by node
Has memory constraints since it loads the whole XML file before parsing | No memory constraints, as it does not store the XML content in memory
DOM is read and write (can insert or delete nodes) | SAX is read only, i.e. it can't insert or delete nodes
If the XML content is small, prefer the DOM parser | Use the SAX parser when the XML content is large
Backward and forward search is possible for searching the tags and evaluating the information inside them, which gives ease of navigation | SAX reads the XML file from top to bottom, and backward navigation is not possible
Slower at runtime | Faster at runtime

What are the usual applications for a DOM parser and for a SAX parser?
In the following cases, using a SAX parser is more advantageous than using a DOM parser.
 The input document is too big for available memory
 You can process the document in small contiguous chunks of input. You do not need the entire document before you can do useful
work
 You just want to use the parser to extract the information of interest, and all your computation will be completely based on the data
structures created by yourself.

In the following cases, using a DOM parser is more advantageous than using a SAX parser.
 Your application needs to access widely separated parts of the document at the same time.
 Your application may probably use an internal data structure which is almost as complicated as the document itself.
 Your application has to modify the document repeatedly.
 Your application has to store the document for a significant amount of time through many method calls.

XML – DOM

The Document Object Model (DOM) is the foundation of XML. XML documents have a hierarchy of informational units called nodes; DOM
is a way of describing those nodes and the relationships between them.
A DOM Document is a collection of nodes or pieces of information organized in a hierarchy. This hierarchy allows a developer to navigate
through the tree looking for specific information. Because it is based on a hierarchy of information, the DOM is said to be tree based.
The XML DOM also provides an API that allows a developer to add, edit, move, or remove nodes in the tree at any
point in order to create an application.
XML DOM Tree Example

What is the DOM?


The DOM defines a standard for accessing documents like XML and HTML:
"The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically
access and update the content, structure, and style of a document."
The DOM is separated into 3 different parts / levels:
 Core DOM - standard model for any structured document
 XML DOM - standard model for XML documents
 HTML DOM - standard model for HTML documents
The DOM defines the objects and properties of all document elements, and the methods (interface) to access them.

The HTML DOM


The HTML DOM defines a standard way for accessing and manipulating HTML documents.
All HTML elements can be accessed through the HTML DOM.
The HTML DOM defines the objects, properties and methods of all HTML elements.

Change the Value of an HTML Element


This example changes the value of an HTML element with id="demo":
Example
<h1 id="demo">This is a Heading</h1>
<script>
document.getElementById("demo").innerHTML = "Hello World!";
</script>
This example changes the value of the first <h1> element in an HTML document:
Example
<h1>This is a Heading</h1><h1>This is a Heading</h1>
<script>
document.getElementsByTagName("h1")[0].innerHTML = "Hello World!";
</script>
Note: Even if the HTML document contains only ONE <h1> element you still have to specify the index [0], because the
getElementsByTagName() method always returns an array-like collection of nodes.

The XML DOM


The XML DOM defines a standard way for accessing and manipulating XML documents.
All XML elements can be accessed through the XML DOM.
The XML DOM defines the objects, properties and methods of all XML elements.
The XML DOM is:
 A standard object model for XML
 A standard programming interface for XML
 Platform- and language-independent
 A W3C standard
In other words: The XML DOM is a standard for how to get, change, add, or delete XML elements.

Get the Value of an XML Element


This code retrieves the text value of the first <title> element in an XML document:
Example
txt = xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
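The same lookup can be tried outside a browser. The sketch below assumes Python's standard xml.dom.minidom module, which offers a similar DOM-style interface; the bookstore document is a hypothetical sample.

```python
from xml.dom import minidom

# Hypothetical sample document.
xml_doc = minidom.parseString(
    "<bookstore><book><title>Everyday Italian</title></book></bookstore>"
)

# First <title> element, then its first child (a text node), then that node's value.
txt = xml_doc.getElementsByTagName("title")[0].childNodes[0].nodeValue
print(txt)
```

The text is not stored on the element itself but on a text node beneath it, which is why childNodes[0] is needed.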
Programming Interface
The DOM models XML as a set of node objects. The nodes can be accessed with JavaScript or other programming languages. In this
tutorial we use JavaScript.
The programming interface to the DOM is defined by a set of standard properties and methods.
Properties are often referred to as something that is (e.g. nodename is "book").
Methods are often referred to as something that is done (e.g. delete "book").

XML DOM Properties


These are some typical DOM properties:
 x.nodeName - the name of x
 x.nodeValue - the value of x
 x.parentNode - the parent node of x
 x.childNodes - the child nodes of x
 x.attributes - the attribute nodes of x
Note: In the list above, x is a node object.
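These properties can be explored with a short Python sketch. It assumes the standard xml.dom.minidom module, whose node objects expose the same property names; the sample document and variable names are illustrative only.

```python
from xml.dom import minidom

# Hypothetical sample document.
doc = minidom.parseString(
    '<bookstore><book category="web"><title>Learning XML</title></book></bookstore>'
)
x = doc.getElementsByTagName("title")[0]  # x is an element node

node_name = x.nodeName                    # the tag name of x
node_value = x.nodeValue                  # None for element nodes in minidom
parent_name = x.parentNode.nodeName       # the name of x's parent node
text_value = x.childNodes[0].nodeValue    # the text lives in a child text node
category = x.parentNode.attributes["category"].value  # an attribute node of the parent

print(node_name, node_value, parent_name, text_value, category)
```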

XML DOM Methods


 x.getElementsByTagName(name) - get all elements with a specified tag name
 x.appendChild(node) - insert a child node to x
 x.removeChild(node) - remove a child node from x
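A minimal Python sketch of these methods follows, assuming the standard xml.dom.minidom module; the bookstore content is hypothetical. It appends a new book node and then removes the original one.

```python
from xml.dom import minidom

# Hypothetical sample document with a single book.
doc = minidom.parseString("<bookstore><book><title>Old</title></book></bookstore>")
root = doc.documentElement

# Build a new <book><title>Learning XML</title></book> subtree.
new_book = doc.createElement("book")
title = doc.createElement("title")
title.appendChild(doc.createTextNode("Learning XML"))
new_book.appendChild(title)

# appendChild inserts the new node as the last child of root.
root.appendChild(new_book)

# getElementsByTagName returns the books in document order; remove the first one.
root.removeChild(root.getElementsByTagName("book")[0])

print(root.toxml())
```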
XML and XSLT

With XSLT you can transform an XML document into HTML.

Displaying XML with XSLT


XSLT (eXtensible Stylesheet Language Transformations) is the recommended style sheet language for XML.
XSLT is far more sophisticated than CSS. With XSLT you can add/remove elements and attributes to or from the output file. You can also
rearrange and sort elements, perform tests and make decisions about which elements to hide and display, and a lot more.
XSLT uses XPath to find information in an XML document.

XSLT Example
We will use the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>Light Belgian waffles covered with strawberries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<price>$8.95</price>
<description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>French Toast</name>
<price>$4.50</price>
<description>Thick slices made from our homemade sourdough bread</description>
<calories>600</calories>
</food>
<food>
<name>Homestyle Breakfast</name>
<price>$6.95</price>
<description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
<calories>950</calories>
</food>
</breakfast_menu>

Use XSLT to transform XML into HTML, before it is displayed in a browser:


Example XSLT Stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<body style="font-family:Arial;font-size:12pt;background-color:#EEEEEE">
<xsl:for-each select="breakfast_menu/food">
<div style="background-color:teal;color:white;padding:4px">
<span style="font-weight:bold"><xsl:value-of select="name"/> - </span>
<xsl:value-of select="price"/>
</div>
<div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
<p>
<xsl:value-of select="description"/>
<span style="font-style:italic"> (<xsl:value-of select="calories"/> calories per serving)</span>
</p>
</div>
</xsl:for-each>
</body>
</html>
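To make the loop in the stylesheet concrete, here is a rough Python analogue. It is not XSLT: Python's standard library has no XSLT processor, so the sketch uses xml.etree.ElementTree to walk the food elements and build HTML by hand. The shortened menu and the menu_to_html name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the breakfast menu above.
MENU = """<breakfast_menu>
<food><name>Belgian Waffles</name><price>$5.95</price></food>
<food><name>French Toast</name><price>$4.50</price></food>
</breakfast_menu>"""


def menu_to_html(xml_text):
    """Build one <div> per <food> element, mirroring for-each and value-of."""
    root = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for food in root.findall("food"):  # plays the role of xsl:for-each
        div = ET.SubElement(body, "div")
        # findtext plays the role of xsl:value-of select="name" / "price"
        div.text = f"{food.findtext('name')} - {food.findtext('price')}"
    return ET.tostring(html, encoding="unicode")


result = menu_to_html(MENU)
print(result)
```

The stylesheet expresses the same idea declaratively: one output fragment per matched input element.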
DTD - Examples from the Net

TV Schedule DTD
<!DOCTYPE TVSCHEDULE [
<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER, DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME, TITLE, DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>

A Report DTD
<!DOCTYPE REPORT [
<!ELEMENT REPORT (TITLE,(SECTION|SHORTSECT)+)>
<!ELEMENT SECTION (TITLE,%BODY;,SUBSECTION*)>
<!ELEMENT SUBSECTION (TITLE,%BODY;,SUBSECTION*)>
<!ELEMENT SHORTSECT (TITLE,%BODY;)>
<!ELEMENT TITLE %TEXT;>
<!ELEMENT PARA %TEXT;>
<!ELEMENT LIST (ITEM)+>
<!ELEMENT ITEM (%BLOCK;)>
<!ELEMENT CODE (#PCDATA)>
<!ELEMENT KEYWORD (#PCDATA)>
<!ELEMENT EXAMPLE (TITLE?,%BLOCK;)>
<!ELEMENT GRAPHIC EMPTY>
<!ATTLIST REPORT security (high | medium | low ) "low">
<!ATTLIST CODE type CDATA #IMPLIED>
<!ATTLIST GRAPHIC file ENTITY #REQUIRED>
<!ENTITY xml "Extensible Markup Language">
<!ENTITY sgml "Standard Generalized Markup Language">
<!ENTITY pxa "Professional XML Authoring">
<!ENTITY % TEXT "(#PCDATA|CODE|KEYWORD|QUOTATION)*">
<!ENTITY % BLOCK "(PARA|LIST)+">
<!ENTITY % BODY "(%BLOCK;|EXAMPLE|NOTE)+">
<!NOTATION GIF SYSTEM "">
<!NOTATION JPG SYSTEM "">
<!NOTATION BMP SYSTEM "">
]>

Product Catalog DTD
<!DOCTYPE CATALOG [
<!ELEMENT CATALOG (PRODUCT+)>
<!ELEMENT PRODUCT (SPECIFICATIONS+, OPTIONS?, PRICE+, NOTES?)>
<!ELEMENT SPECIFICATIONS (#PCDATA)>
<!ELEMENT OPTIONS (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>
<!ATTLIST PRODUCT NAME CDATA #IMPLIED>
<!ATTLIST PRODUCT CATEGORY (HandTool | Table | Shop-Professional) "HandTool">
<!ATTLIST PRODUCT PARTNUM CDATA #IMPLIED>
<!ATTLIST PRODUCT PLANT (Pittsburgh | Milwaukee | Chicago) "Chicago">
<!ATTLIST PRODUCT INVENTORY (InStock | Backordered | Discontinued) "InStock">
<!ATTLIST SPECIFICATIONS WEIGHT CDATA #IMPLIED>
<!ATTLIST SPECIFICATIONS POWER CDATA #IMPLIED>
<!ATTLIST OPTIONS FINISH (Metal | Polished | Matte) "Matte">
<!ATTLIST OPTIONS ADAPTER (Included | Optional | NotApplicable) "Included">
<!ATTLIST OPTIONS CASE (HardShell | Soft | NotApplicable) "HardShell">
<!ATTLIST PRICE MSRP CDATA #IMPLIED>
<!ATTLIST PRICE WHOLESALE CDATA #IMPLIED>
<!ATTLIST PRICE STREET CDATA #IMPLIED>
<!ATTLIST PRICE SHIPPING CDATA #IMPLIED>
<!ENTITY AUTHOR "John Doe">
<!ENTITY COMPANY "JD Power Tools, Inc.">
<!ENTITY EMAIL "jd@jd-tools.com">
]>

Newspaper Article DTD
<!DOCTYPE NEWSPAPER [
<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE, BYLINE, LEAD, BODY, NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>
<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>
<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
]>
Introduction to XSL

XSL - The Style Sheet of XML?


HTML pages use predefined tags, and the meaning of these tags is well understood: <p> means a paragraph and <h1> means a heading,
and the browser knows how to display these pages.
With XML we can use any tags we want, and the meaning of these tags is not automatically understood by the browser: <table> could
mean an HTML table or maybe a piece of furniture. Because of the nature of XML, there is no standard way to display an XML document.
In order to display XML documents, it is necessary to have a mechanism to describe how the document should be displayed. One of
these mechanisms is Cascading Style Sheets (CSS), but XSL (eXtensible Stylesheet Language) is the preferred style sheet language of XML,
and XSL is far more sophisticated than the CSS used by HTML.
XSL - More than a Style Sheet
XSL consists of two parts:
 a method for transforming XML documents
 a method for formatting XML documents
If you don't understand the meaning of this, think of XSL as a language that can transform XML into HTML, a language that can filter and
sort XML data and a language that can format XML data, based on the data value, like displaying negative numbers in red.

XSL - What can it do?


XSL can be used to define how an XML file should be displayed by transforming the XML file into a format that is recognizable to a
browser. One such format is HTML. Normally XSL does this by transforming each XML element into an HTML element.
XSL can also add completely new elements into the output file, or remove elements. It can rearrange and sort the elements, test and
make decisions about which elements to display, and a lot more.

XSL – Transformation XML to HTML


What if you want to transform the following XML document into HTML?
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
Consider the following XSL document as an HTML template to populate an HTML document with XML data:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
In the above file, the xsl:for-each element locates elements in the XML document and repeats a template for each one.
The select attribute describes the element in the source document. The syntax for this attribute is called an XSL Pattern, and works like
navigating a file system where a forward slash (/) selects subdirectories. The xsl:value-of element selects a child in the hierarchy and
inserts the content of that child into the template.
Since an XSL style sheet is an XML file itself, the file begins with an xml declaration. The xsl:stylesheet element indicates that this
document is a style sheet. The template has also been wrapped with xsl:template match="/" to indicate that this is a template that
corresponds to the root (/) of the XML source document.
If you add a reference to the above stylesheet to your original XML document (look at line 2), your browser will transform your
XML document into HTML:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet type="text/xsl" href="cd_catalog.xsl"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>

XSL - On the Client


A JavaScript Solution
In the previous chapter I explained how XSL can be used to transform a document from XML to HTML. The trick was to add XSL
stylesheet information to the XML file, and to let the browser do the transformation.
Even if this works fine, it is not always desirable to include a stylesheet reference in the XML file, and the solution will not work in a
browser that is not XML-aware.
A much more versatile solution would be to use JavaScript to do the XML to HTML transformation.
By using JavaScript we are more open to these possibilities:
 Allowing the JavaScript to do browser specific testing
 Using different style sheets according to browser and/or user needs
That's the beauty of XSL. One of the design goals for XSL was to make it possible to transform data from one format to another,
supporting different browsers and different user needs.
XSL transformation on the client side is bound to be a major part of the browser's work in the future, as we will see growth in the
specialized browser market (think: Braille, Speaking Web, Web Printers, Handheld PCs, Mobile Phones ...).

The XML file and the XSL file


Take a new look at the XML document that you saw in the previous chapter:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
And at the accompanying XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The syntax of the above XSL document was explained in the previous chapter, so it will not be explained here. But be sure to notice that
the XML file does not have a reference to the XSL file, and the XSL file does not have a reference to the XML file.
IMPORTANT: The above sentence indicates that an XML file could be transformed using many different XSL files.

Transforming XML to HTML on the client


Here is the simple source code needed to transform the XML file to HTML on the client:
<html>
<body>
<script language="javascript">
// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM")
xml.async = false
xml.load("cd_catalog.xml")
// Load the XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load("cd_catalog.xsl")
// Transform
document.write(xml.transformNode(xsl))
</script>
</body>
</html>
(The example above uses JavaScript and Microsoft's ActiveX XML parser, so it runs only in Internet Explorer.)
The first block of code creates an instance of the Microsoft XML parser (XMLDOM), and loads the XML document into memory. The
second block of code creates another instance of the parser and loads the XSL document into memory. The last line of code transforms
the XML document using the XSL document, and writes the result to the HTML document.
Nice and simple.

XSL - On the Server


A Cross Browser Solution
In the previous chapter I explained how XSL can be used to transform a document from XML to HTML in the browser. The trick was to let
the JavaScript use an XML parser to do the transformation.
This solution will not work with a browser that doesn’t support an XML parser.
To make our XML data available to all kinds of browsers, we have to transform the XML document on the SERVER and send it as pure
HTML to the BROWSER.
That's another beauty of XSL. One of the design goals for XSL was to make it possible to transform data from one format to another
on a server, returning readable data to all kinds of future browsers.
XSL transformation on the server is bound to be a major part of the Internet Information Server's work in the future, as we will see
growth in the specialized browser market (think: Braille, Speaking Web, Web Printers, Handheld PCs, Mobile Phones ...).
The XML file and the XSL file
Take a new look at the XML document that you saw in the previous chapter:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
And at the accompanying XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The syntax of the above XSL document was explained in the previous chapter, so it will not be explained here. But be sure to notice that
the XML file does not have a reference to the XSL file, and the XSL file does not have a reference to the XML file.
IMPORTANT: The above sentence indicates that an XML file on the server could be transformed using many different XSL files.

Transforming XML to HTML on the Server


Here is the simple source code needed to transform the XML file to HTML on the server:
<%
'Load the XML
set xml = Server.CreateObject("Microsoft.XMLDOM")
xml.async = false
xml.load(Server.MapPath("cd_catalog.xml"))
'Load the XSL
set xsl = Server.CreateObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load(Server.MapPath("cd_catalog.xsl"))
Response.Write(xml.transformNode(xsl))
%>
The first block of code creates an instance of the Microsoft XML parser (XMLDOM), and loads the XML file into memory. The second
block of code creates another instance of the parser and loads the XSL document into memory. The last line of code transforms the XML
document using the XSL document, and returns the result to the browser.
Nice and simple.

XSL Sort
Where to put the Sort Information
Take a new look at the XML document that you have seen in almost every chapter:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
To output this XML file as an ordinary HTML file, and sort it at the same time, simply add an order-by attribute to your for-each element
like this:
<xsl:for-each select="CATALOG/CD" order-by="+ ARTIST">
The order-by attribute takes a plus (+) or minus (-) sign, to define an ascending or descending sort order, and an element name to define
the sort element.
Now take a look at your slightly adjusted XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD" order-by="+ ARTIST">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
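The effect of the sort can be illustrated outside XSL with a small Python sketch. It is only an analogue, assuming the standard xml.etree.ElementTree module rather than an XSL processor; the two extra CD entries and the sorted_rows name are made up so that there is something to sort.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; only the Bob Dylan entry comes from the document above.
CATALOG = """<CATALOG>
<CD><TITLE>Greatest Hits</TITLE><ARTIST>Dolly Parton</ARTIST></CD>
<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>
</CATALOG>"""


def sorted_rows(xml_text, descending=False):
    """Return (title, artist) pairs ordered by ARTIST, like order-by="+ ARTIST"."""
    root = ET.fromstring(xml_text)
    cds = sorted(
        root.findall("CD"),
        key=lambda cd: cd.findtext("ARTIST", default=""),
        reverse=descending,  # True corresponds to the minus (-) sign
    )
    return [(cd.findtext("TITLE"), cd.findtext("ARTIST")) for cd in cds]


ascending = sorted_rows(CATALOG)
print(ascending)
```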

Transforming it on the Client


Here is the simple source code needed to transform the XML file to HTML on the client:
<html>
<body>
<script language="javascript">
// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM")
xml.async = false
xml.load("cd_catalog.xml")
// Load the XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load("cd_catalog_sort.xsl")
// Transform
document.write(xml.transformNode(xsl))
</script>
</body>
</html>

XSL Filter Query


Where to put the Filter Information
Take a new look at the XML document that you have seen in almost every chapter:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
To filter the XML file, simply add a filter to the select attribute in your for-each element like this:
<xsl:for-each select="CATALOG/CD[ARTIST='Bob Dylan']">
Legal filter operators are:
 = (equal)
 != (not equal)
 &lt; (less than)
 &gt; (greater than)
Now take a look at your slightly adjusted XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD[ARTIST='Bob Dylan']">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
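The same filter idea can be tried in Python as a rough analogue. The sketch assumes the standard xml.etree.ElementTree module, whose limited path syntax accepts a predicate of the form [tag='text']; it is not an XSL processor, and the second CD entry and the filtered_rows name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; only the Bob Dylan entry comes from the document above.
CATALOG = """<CATALOG>
<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>
</CATALOG>"""


def filtered_rows(xml_text, artist):
    """Return (title, artist) pairs for CDs whose ARTIST text equals the given name."""
    root = ET.fromstring(xml_text)
    # The predicate keeps only CD elements with a matching ARTIST child.
    matches = root.findall(f"CD[ARTIST='{artist}']")
    return [(cd.findtext("TITLE"), cd.findtext("ARTIST")) for cd in matches]


dylan_rows = filtered_rows(CATALOG, "Bob Dylan")
print(dylan_rows)
```

Building the path with string formatting is fine for a fixed example; names containing quote characters would need extra care.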

Transforming it on the Client


Here is the simple source code needed to transform the XML file to HTML on the client:
<html>
<body>
<script language="javascript">
// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM")
xml.async = false
xml.load("cd_catalog.xml")
// Load the XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load("cd_catalog_filter.xsl")
// Transform
document.write(xml.transformNode(xsl))
</script>
</body>
</html>

XSL Conditional If
Where to put the IF condition
Take a new look at the XML document that you have seen in almost every chapter:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
To put a conditional if test against the content of the file, simply add an xsl:if element to your XSL document like this:
<xsl:if match=".[ARTIST='Bob Dylan']">
... some output ...
</xsl:if>
Now take a look at your slightly adjusted XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<xsl:if match=".[ARTIST='Bob Dylan']">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:if>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
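The control flow of the stylesheet above can be read as an ordinary loop with an if statement inside it. The JavaScript sketch below is an analogy only: the cdList array, its made-up second record, and the function name renderRows are assumptions for illustration.

```javascript
// Hypothetical in-memory copy of the catalog; the second record is made up.
const cdList = [
  { TITLE: "Empire Burlesque", ARTIST: "Bob Dylan" },
  { TITLE: "Example Album", ARTIST: "Example Artist" },
];

// Mirrors the stylesheet: visit every CD (xsl:for-each) and emit a table row
// only when the condition holds (xsl:if).
function renderRows(cds) {
  let rows = "";
  for (const cd of cds) {
    if (cd.ARTIST === "Bob Dylan") {
      rows += `<tr><td>${cd.TITLE}</td><td>${cd.ARTIST}</td></tr>`;
    }
  }
  return rows;
}

const ifRows = renderRows(cdList);
console.log(ifRows);
```

The loop still visits every CD. The test only decides whether a row is written, whereas a filter in the select attribute narrows the set of nodes before the loop starts.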

Transforming it on the Client


Here is the simple source code needed to transform the XML file to HTML on the client:
<html>
<body>
<script language="javascript">
// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM")
xml.async = false
xml.load("cd_catalog.xml")
// Load the XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load("cd_catalog_filter.xsl")
// Transform
document.write(xml.transformNode(xsl))
</script>
</body>
</html>

XSL Conditional Choose


Where to put the Choose Condition
Take a new look at the XML document that you have seen in almost every chapter (or open it with IE5):
<?xml version="1.0" encoding="ISO-8859-1" ?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
To insert a conditional choose test against the content of the file, simply add xsl:choose, xsl:when and xsl:otherwise elements to your XSL document like this:
<xsl:choose>
<xsl:when match=".[ARTIST='Bob Dylan']">
... some code ...
</xsl:when>
<xsl:otherwise>
... some code ...
</xsl:otherwise>
</xsl:choose>
Now take a look at your slightly adjusted XSL stylesheet (or open it with IE5):
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<xsl:choose>
<xsl:when match=".[ARTIST='Bob Dylan']">
<td bgcolor="#ff0000"><xsl:value-of select="ARTIST"/></td>
</xsl:when>
<xsl:otherwise>
<td><xsl:value-of select="ARTIST"/></td>
</xsl:otherwise>
</xsl:choose>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
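The xsl:choose block in the stylesheet above behaves like an if/else: every CD gets a row, and only the artist cell differs. The JavaScript sketch below is an analogy only: the cdRecords array, its made-up second record, and the function name artistCell are assumptions for illustration.

```javascript
// Hypothetical in-memory copy of the catalog; the second record is made up.
const cdRecords = [
  { TITLE: "Empire Burlesque", ARTIST: "Bob Dylan" },
  { TITLE: "Example Album", ARTIST: "Example Artist" },
];

// Mirrors xsl:choose: the first branch plays the role of xsl:when,
// the fallback plays the role of xsl:otherwise.
function artistCell(cd) {
  if (cd.ARTIST === "Bob Dylan") {
    return `<td bgcolor="#ff0000">${cd.ARTIST}</td>`;
  }
  return `<td>${cd.ARTIST}</td>`;
}

// Every CD produces a row (xsl:for-each); only the artist cell varies.
const tableRows = cdRecords.map(
  (cd) => `<tr><td>${cd.TITLE}</td>${artistCell(cd)}</tr>`
);
console.log(tableRows.join("\n"));
```

In contrast to xsl:if, no CD is dropped from the output here, because the otherwise branch supplies a default cell.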

Transforming it on the Client


Here is the simple source code needed to transform the XML file to HTML on the client:
<html>
<body>
<script language="javascript">
// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM")
xml.async = false
xml.load("cd_catalog.xml")
// Load the XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load("cd_catalog_filter.xsl")
// Transform
document.write(xml.transformNode(xsl))
</script>
</body>
</html>
