Introduction To XML Extensible Markup Language: Prof.N.Nalini AP (SR) VIT


Introduction to XML

Extensible Markup Language


Prof.N.Nalini
AP(Sr)
VIT
Structured vs. unstructured data
• Relational databases are highly structured
• All data resides in tables
• You must define schema before entering any data
• Every row conforms to the table schema
• Changing the schema is hard and may break many things
• Texts are highly unstructured
• Data is free-form
• There is no pre-defined schema, and it’s hard to define one
• Readers need to infer structures and meanings
Semi-structured data
• Observation: most data have some structure, e.g.:
• Book: chapters, sections, titles, paragraphs, references,
index, etc.
• Item for sale: name, picture, price (range), ratings,
promotions, etc.
• Web page: HTML
• Ideas:
• Ensure data is “well-formatted”
• If needed, ensure data is also “well-structured”
• But make it easy to define and extend this structure
• Make data “self-describing”
HTML: language of the Web
XML

• Text-based
• Capture data (content), not presentation
• Data self-describes its structure
• Names and nesting of tags have meanings!
Features of XML
• Portability: Just like HTML, you can ship XML data
across platforms
• Relational data requires heavy-weight APIs
• Flexibility: You can represent any information
(structured, semi-structured, documents, …)
• Relational data is best suited for structured data
• Extensibility: Since data describes itself, you can
change the schema easily
• Relational schema is rigid and difficult to change
Well-formed XML documents

• XML is case sensitive.


• Attribute values must be enclosed in single or double quotes, and an
attribute name must not appear more than once in the same start-tag (it
must be unique within the tag)
sample
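Any XML parser can check the well-formedness rules above. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration and are not from the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per rule
samples = {
    "ok": '<book id="b1"><title>XML</title></book>',
    "case mismatch": "<Book><title>XML</title></book>",
    "unquoted attribute": "<book id=b1></book>",
    "duplicate attribute": '<book id="b1" id="b2"></book>',
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample parses; each of the others breaks one of the rules (case sensitivity, quoting, attribute uniqueness).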
XML Trees

• An XML document has a single root node.


• The tree is a general ordered tree.
– A parent node may have any number of
children.
– Child nodes are ordered, and may have
siblings.
• Preorder traversals are usually used for
getting information out of the tree.
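A preorder traversal visits a node before its children, and the children from first to last. Here is a small sketch in Python, assuming a made-up book document; the function name preorder is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
doc = """
<book>
  <title>XML</title>
  <chapter>
    <section>Trees</section>
    <section>Parsers</section>
  </chapter>
</book>
"""

def preorder(node, depth=0):
    """Yield (depth, tag) for node first, then for each child in document order."""
    yield depth, node.tag
    for child in node:
        yield from preorder(child, depth + 1)

root = ET.fromstring(doc)
visited = list(preorder(root))
for depth, tag in visited:
    print("  " * depth + tag)
```

ElementTree's built-in root.iter() walks the elements in the same document order, so the explicit recursion is only there to make the visiting order visible.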
Advantages of XML

• XML is text (Unicode) based.


– Takes up less space.
– Can be transmitted efficiently.
• One XML document can be displayed differently
in different media.
– HTML, video, CD, DVD, etc.
– You only have to change the XML document in order
to change all the rest.
• XML documents can be modularized. Parts can
be reused.
Valid XML Documents
• A well-formed document has a tree structure and
obeys all the XML rules.
• A particular application may add more rules in
either a DTD (document type definition) or a
schema. A document that is well-formed and also
obeys these rules is valid.
• Many specialized DTDs and schemas have
been created to describe particular areas.
• DTDs were developed first, so they are not as
comprehensive as schemas.
– DTD: structure
– Schema: structure and content
Validation
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4-ppp</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>

[Figure: the book data and the rules that indicate the valid structure of book data are passed to a validator, which reports: Error! Invalid ISBN!]
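The validation step can be imitated by hand to show what a validator does. Python's standard library does not validate against DTDs or schemas, so the sketch below hard-codes its rules. The child order and the ISBN pattern are assumptions made for illustration, not rules given in the slides.

```python
import re
import xml.etree.ElementTree as ET

BOOK = """
<Book>
  <Title>Illusions The Adventures of a Reluctant Messiah</Title>
  <Author>Richard Bach</Author>
  <Date>1977</Date>
  <ISBN>0-440-34319-4-ppp</ISBN>
  <Publisher>Dell Publishing Co.</Publisher>
</Book>
"""

# Assumed rules (not from the slides): the children must appear in this order,
# and an ISBN is digit groups separated by hyphens, ending in a digit or X.
EXPECTED_CHILDREN = ["Title", "Author", "Date", "ISBN", "Publisher"]
ISBN_PATTERN = re.compile(r"\d+(-\d+)*-[\dX]")

def validate_book(xml_text):
    """Return a list of error messages; an empty list means the document passed."""
    book = ET.fromstring(xml_text)
    errors = []
    if [child.tag for child in book] != EXPECTED_CHILDREN:
        errors.append("Unexpected child elements")
    isbn = book.findtext("ISBN", default="")
    if not ISBN_PATTERN.fullmatch(isbn):
        errors.append("Invalid ISBN: " + isbn)
    return errors

print(validate_book(BOOK))
```

The document is well-formed, so parsing succeeds; it is rejected only because the ISBN text breaks the assumed content rule.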


Document Type Definitions
• A DTD describes the tree structure of a document and something about its
data.
• A DTD is optional
• A DTD specifies a grammar for the document
• Constraints on structures and values of elements, attributes,
etc.
• There are two data types, PCDATA and CDATA.
–PCDATA is parsed character data.
–CDATA is character data, not usually parsed.
• A DTD determines how many times a node may appear, and how child
nodes are ordered.
• Child elements can have modifiers, +, *, ?
<!ELEMENT person
(ID+, age, Lastname?, sibling*)>
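The modifiers behave like regular-expression operators applied to the sequence of child element names: + means one or more, ? means optional, * means zero or more. The sketch below is not a DTD validator; it rewrites the content model above by hand as a Python regular expression. The names PERSON_MODEL and matches_person_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written regex equivalent of the content model (ID+, age, Lastname?, sibling*).
# Each child name is followed by one space so that names cannot run together.
PERSON_MODEL = re.compile(r"(ID )+(age )(Lastname )?(sibling )*")

def matches_person_model(xml_text):
    """Check the order and count of the direct children of <person>."""
    person = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_names) is not None

print(matches_person_model("<person><ID/><ID/><age/><sibling/></person>"))
print(matches_person_model("<person><age/><Lastname/></person>"))
```

The first document satisfies the model. The second does not, because ID+ requires at least one ID element.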
DTD for address Example
<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
• note.dtd file
Schemas
• Schemas are themselves XML documents.
• They were standardized after DTDs and provide more
information about the document.
• They have a number of data types including string,
decimal, integer, boolean, date, and time.
• Data Type Categories
1. Simple (text content only, no attributes and no
nested elements)
2. Complex (can have attributes and nested
elements)
• They also determine the tree structure and how many
children a node may have.
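The built-in data types can be illustrated by mapping each one to a Python parser. This is only a rough sketch: the mapping is an approximation invented here, and the real XML Schema lexical rules differ in details such as time zones and whitespace handling. The names PARSERS and conforms are hypothetical.

```python
from datetime import date, time
from decimal import Decimal

def parse_boolean(text):
    """Accept the four lexical forms true, false, 1 and 0."""
    values = {"true": True, "1": True, "false": False, "0": False}
    if text not in values:
        raise ValueError("not a boolean: " + text)
    return values[text]

# Rough Python stand-ins for the schema types named above (an approximation)
PARSERS = {
    "string": str,
    "decimal": Decimal,
    "integer": int,
    "boolean": parse_boolean,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
}

def conforms(type_name, text):
    """Return True if text can be read as a value of the named type."""
    try:
        PARSERS[type_name](text)
        return True
    except (ValueError, ArithmeticError):
        # Decimal signals bad input with an ArithmeticError subclass
        return False

print(conforms("integer", "12"), conforms("integer", "12.5"))
print(conforms("date", "1983-07-15"), conforms("boolean", "yes"))
```

A DTD would treat all four of these values as plain character data; typed checks of this kind are what a schema adds.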
XML Schema definition (XSD)
Example Schema
EXAMPLE:1
XML with xsd
EXAMPLE:1
Schema file
EXAMPLE: 2
XML with xsd
EXAMPLE: 2
xsd file
Restrictions
Transformations: XSL
• Language for expressing document styles
• Specifies the presentation of XML
– More powerful than CSS
• Consists of:
– XSLT
– XPath
– XSL Formatting Objects (XSL-FO)
Transforming the Data

[Figure: XML data and transformation instructions are fed to a transformation tool, which outputs HTML, XML, or text.]

XSLT: a language used to transform XML data into a different form (commonly XML or HTML)
XSLT
Extensible Stylesheet Language Transformations

• XSLT is used to transform one XML document
into another, often an HTML document.
• An XSLT processor is a program that takes one XML
document as input and produces another as output.
• If the resulting document is in HTML, it can be
viewed by a web browser.
• This is a good way to display XML data.
A Style Sheet to Transform address.xml

<?xml version="1.0" encoding="ISO-8859-1"?>


<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="address">
<html><head><title>Address Book</title></head>
<body>
<xsl:value-of select="name"/>
<br/><xsl:value-of select="email"/>
<br/><xsl:value-of select="phone"/>
<br/><xsl:value-of select="birthday"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The Result of the Transformation

Alice Lee
alee@aol.com
123-45-6789
1983-7-15
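Python's standard library has no XSLT processor, so the sketch below does not run the stylesheet. It is a hand-written Python equivalent of the single template, meant to show what each xsl:value-of contributes. The slides do not show address.xml, so the flat input used here is an assumption chosen to be consistent with the result above (it is simpler than the address DTD). The names value_of and transform are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed input: a guess at address.xml, not taken from the slides
ADDRESS_XML = """
<address>
  <name>Alice Lee</name>
  <email>alee@aol.com</email>
  <phone>123-45-6789</phone>
  <birthday>1983-7-15</birthday>
</address>
"""

def value_of(context, child_name):
    """Mimic xsl:value-of: all text inside the first matching child, or ''."""
    child = context.find(child_name)
    return "" if child is None else "".join(child.itertext())

def transform(xml_text):
    """Hand-written equivalent of the template that matches 'address'."""
    address = ET.fromstring(xml_text)
    fields = [escape(value_of(address, name))
              for name in ("name", "email", "phone", "birthday")]
    return ("<html><head><title>Address Book</title></head>\n<body>\n"
            + "\n<br/>".join(fields)
            + "\n</body>\n</html>")

print(transform(ADDRESS_XML))
```

In practice a real XSLT processor would be used. The point of the sketch is that the template is a fixed HTML skeleton with four values copied in from the source tree.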
Parsers

• There are two principal models for parsers.
• SAX – Simple API for XML
– Uses a call-back method
– Similar to Java event listeners
• DOM – Document Object Model
– Creates a parse tree
– Requires a tree traversal
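The two models can be compared side by side with Python's standard xml.sax and xml.dom.minidom modules. This is a minimal sketch; the sample document and the names TagCollector and element_names are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document
DOC = "<address><name>Alice Lee</name><email>alee@aol.com</email></address>"

class TagCollector(xml.sax.ContentHandler):
    """SAX: the parser calls back into this handler for each start tag it reads."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

# SAX model: stream through the document; nothing is kept unless the handler keeps it.
collector = TagCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
print(collector.tags)

# DOM model: build the whole tree in memory first, then traverse it.
def element_names(node):
    """Recursively collect element names from a DOM node in document order."""
    names = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            names.append(child.tagName)
            names.extend(element_names(child))
    return names

dom = minidom.parseString(DOC)
dom_tags = element_names(dom)
print(dom_tags)
```

Both approaches report the same element names in the same order. SAX never holds the whole document in memory, while DOM allows the tree to be revisited and modified after parsing.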
