Unit Iv .: Web and Internet Technologies (15A05605) III B.Tech II Sem (CSE)
UNIT IV
Creating and Using PHP Forms: Understanding Common Form Issues, GET vs. POST,
Validating form input, Working with multiple forms, and Preventing Multiple Submissions of a
form.
XML: Basic XML, Document Type Definition, XML Schema, DOM and Presenting XML, XML
Parsers and Validation, XSL and XSLT Transformation, News Feed (RSS and ATOM).
1. SQL Vulnerabilities
SQL injection is one of the most commonly reported security issues. It is mainly associated with
Web sites that have large code bases written long ago, when developers were less
security aware.
Through this kind of attack, hackers may gain access to the databases behind
PHP web sites. They may insert malicious code and modify or even delete your data. This
kind of problem usually arises from data validation and escaping loopholes left by PHP
developers.
Examples
$query = "SELECT * FROM students WHERE empname='David'";
If the name comes from user input, an attacker who submits ' or '1 as the name turns the query into:
$query = "SELECT * FROM students WHERE empname='' or '1'";
The WHERE condition is now always true, so all the data in the students table is returned.
If the attackers gain administrative privileges, they may alter the database and crash the
Web site.
Prevention
Data should be validated before the application processes it. Invalid data
should not be processed at all. Data that passes validation should still be escaped before it is passed to the
database as query parameters. If possible, use database extensions that support prepared
statements, such as MySQLi or PDO.
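The prepared-statement advice above can be sketched with PDO. This is a minimal illustration, not a definitive implementation: it assumes the pdo_sqlite extension is available and uses an in-memory SQLite database in place of a real server. The helper name findStudents() and the sample rows are hypothetical.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real database server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE students (id INTEGER PRIMARY KEY, empname TEXT)");
$pdo->exec("INSERT INTO students (empname) VALUES ('David'), ('Asha')");

// Hypothetical helper: looks up students by name using a bound parameter,
// so the user input is never concatenated into the SQL text.
function findStudents(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare("SELECT empname FROM students WHERE empname = :name");
    $stmt->execute([':name' => $name]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$normal = findStudents($pdo, "David");    // ordinary lookup
$attack = findStudents($pdo, "' or '1");  // injection attempt is treated as a plain string
```

Because the input is bound as data, the injection string from the example is compared literally against empname and matches no rows.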
Passwords must be hashed using the password_hash() function.
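A short sketch of how password_hash() is typically paired with password_verify(). The helper names hashForStorage() and checkLogin() and the sample password are assumptions made for illustration.

```php
<?php
// Hypothetical helper: produce the value to store in the database.
function hashForStorage(string $plain): string
{
    // PASSWORD_DEFAULT lets PHP choose its current default algorithm and salt.
    return password_hash($plain, PASSWORD_DEFAULT);
}

// Hypothetical helper: compare a login attempt with the stored hash.
function checkLogin(string $plain, string $storedHash): bool
{
    return password_verify($plain, $storedHash);
}

$stored = hashForStorage('s3cret-Pass');
$ok = checkLogin('s3cret-Pass', $stored);
$bad = checkLogin('wrong', $stored);
```

Only the hash is stored; the plain password is never written to the database.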
Technical details should be removed from the error messages shown to users. An attacker
specifically looks at error messages for information such as database names, user names and
table names, so you should disable detailed error messages or create your own custom error
messages.
You can also limit the permissions of your application's database user to make your database
more secure. You can limit users' access to database tables and views by using stored
procedures and previously defined cursors. You can also withhold privileges such as DROP, UPDATE
and INSERT from the database user where the application does not need them, which limits malicious
modification of the database.
2. Buffer Overflows
Usually, a buffer overflow is not caused directly by code written in interpreted
languages like PHP. However, the PHP engine is written in C, so buffer overflows may occur in
PHP due to bugs in the C implementation of the engine. Hence, it can be said that PHP
applications are safe from overflows of their own making, but the PHP engine itself is not.
PHP code does not allocate memory directly. It is the C code of the PHP engine that
allocates and frees the necessary memory. A buffer overflow occurs when C code in the PHP engine
writes to memory beyond the boundaries of the memory that was allocated.
Buffer overflows may cause the PHP engine to execute arbitrary code that can perform security
exploits.
Since it happens at the level of the C code of the PHP engine, you cannot determine
whether your PHP code may trigger buffer overflow vulnerabilities just by looking at your PHP code.
You can, however, use PHP extensions like Suhosin (pronounced 'su-ho-shin'), an
advanced protection system for PHP 5 installations. It is designed to protect servers and users
from known and unknown flaws in PHP applications and the PHP core. It can alter the way
PHP memory is allocated to detect many cases of buffer overflow and stop
the PHP engine to avoid possible exploits.
3. XSS Exploits
One of the most common forms of Web site hacking is cross-site scripting (XSS). Using this
vulnerability, hackers force a site to perform certain actions. Hackers
inject client-side script code (JavaScript) mixed with submitted content, so that when a user
visits a Web page containing the submitted content, the malicious script is downloaded
automatically by the user's web browser and executed.
In this process, the malicious code usually gets saved in the database as if it were
legitimate content. When a user opens the Web page, cookies and session identifiers may be
stolen and sent to a third-party site controlled by the attacker. As a result of XSS flaws, the user may also be
redirected to a spammy Web site, for instance.
XSS may also be used for user account hacking. When the attacker is able to steal the
PHP session cookie value, the attacker may be able to access the user account as if they were the real user.
Prevention of XSS Exploits
XSS vulnerabilities can be avoided by properly encoding output as HTML entities for <, >, "
and '. Online forums can also avoid accepting raw HTML from users by offering BBCode
markup instead.
The htmlspecialchars() function is helpful in this regard, as it converts these characters
automatically into HTML entities. Passing ENT_QUOTES as the
second argument makes it convert single quotes as well. The strip_tags() function removes PHP and HTML tags from a string.
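As a small illustration of the two functions just mentioned, the following sketch escapes a submitted comment and strips tags from another string. The sample strings are made up for the example.

```php
<?php
// A comment as an attacker might submit it.
$comment = "<script>alert('x')</script>";

// Escape for output: <, >, & and both kinds of quotes become HTML entities.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// Alternative: remove the tags entirely and keep only the text.
$stripped = strip_tags("<b>Hello</b> world");
```

Escaping keeps the original text visible to readers as text, while strip_tags() discards the markup altogether; which one fits depends on whether the markup should be shown.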
A session is the interval of interaction between the Web application
and a user, who may be authenticated to make it more secure. With PHP sessions, by
default, the Web site stores the user's session data in a file on the server and sends the session
identifier to the browser as a cookie. The attacker may try to obtain the user's session ID, which is
created when the session is started for the first time for a given user accessing the site.
Prevention:
You can use the session_regenerate_id() function to change session IDs frequently. Then, if
the user's session identifier is stolen by somebody who intercepts the connection between the user's
browser and the server, that identifier will be invalid the next time the user accesses the site.
Revalidating sensitive user information such as the password can minimize the risk of
hacking. Applications that handle sensitive information like debit and credit cards must be
secured with SSL so that session and cookie hijacking can be avoided. Login and password
change pages should also be accessible only via SSL.
Furthermore, to keep session identifiers and other cookies from being stolen by malicious
JavaScript injected into the Web pages, for instance through cross-site scripting attacks, you can use
HTTP-only cookies. These are cookies that the browser stores on its side but that JavaScript code
cannot access.
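The session advice above (HTTP-only cookies, SSL-only cookies, regenerating the ID) might be combined as in the sketch below. The function names sessionCookieOptions() and startSessionAfterLogin() are hypothetical, and the array form of session_set_cookie_params() assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: cookie settings for the session cookie.
function sessionCookieOptions(bool $httpsOnly): array
{
    return [
        'lifetime' => 0,          // cookie lasts until the browser closes
        'path'     => '/',
        'secure'   => $httpsOnly, // send the cookie only over SSL/TLS
        'httponly' => true,       // hide the cookie from JavaScript
        'samesite' => 'Lax',
    ];
}

// Hypothetical helper: call once the user has been authenticated.
function startSessionAfterLogin(): void
{
    session_set_cookie_params(sessionCookieOptions(true));
    session_start();
    session_regenerate_id(true); // issue a new ID and delete the old session data
}

$options = sessionCookieOptions(true);
```

startSessionAfterLogin() is only defined here, not called, because it must run before any output is sent to the browser.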
Text Fields
The name, email, and website fields are text input elements, and the comment field is a textarea.
Radio Buttons
The gender fields are radio buttons and the HTML code looks like this:
Gender:
<input type="radio" name="gender" value="female">Female
<input type="radio" name="gender" value="male">Male
<input type="radio" name="gender" value="other">Other
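If the form's action attribute echoes $_SERVER["PHP_SELF"] without escaping, a user can enter a URL like the following in the address bar:
http://www.example.com/test_form.php/%22%3E%3Cscript%3Ealert('hacked')%3C/script%3E
In this case, the form tag will be output as: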
<form method="post" action="test_form.php/"><script>alert('hacked')</script>
This code adds a script tag and an alert command. When the page loads, the JavaScript
code will be executed (the user will see an alert box). This is just a simple and harmless example
of how the PHP_SELF variable can be exploited.
Be aware that any JavaScript code can be added inside the <script> tag! A hacker can
redirect the user to a file on another server, and that file can hold malicious code that can alter
the global variables or submit the form to another address to save the user data, for example.
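One common way to avoid the PHP_SELF exploit is to pass the value through htmlspecialchars() before writing it into the action attribute. The sketch below assumes a hypothetical helper named safeFormAction() and uses the decoded path from the example URL as its input.

```php
<?php
// Hypothetical helper: escape the script path before using it as a form action.
function safeFormAction(string $phpSelf): string
{
    return htmlspecialchars($phpSelf, ENT_QUOTES, 'UTF-8');
}

// What $_SERVER["PHP_SELF"] would contain for the malicious URL shown above.
$malicious = "/test_form.php/\"><script>alert('hacked')</script>";

$action  = safeFormAction($malicious);
$formTag = '<form method="post" action="' . $action . '">';
```

In a real page the call would be safeFormAction($_SERVER["PHP_SELF"]); the injected markup is then rendered as harmless text inside the attribute.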
Example
<?php
// define variables and set to empty values
$name = $email = $gender = $comment = $website = "";
if ($_SERVER["REQUEST_METHOD"] == "POST") {
    // ?? "" avoids an undefined-key warning when a field (e.g. an unselected radio button) is not submitted
    $name = test_input($_POST["name"] ?? "");
    $email = test_input($_POST["email"] ?? "");
    $website = test_input($_POST["website"] ?? "");
    $comment = test_input($_POST["comment"] ?? "");
    $gender = test_input($_POST["gender"] ?? "");
}
// trim whitespace, remove backslashes, and escape HTML special characters
function test_input($data) {
    $data = trim($data);
    $data = stripslashes($data);
    $data = htmlspecialchars($data);
    return $data;
}
?>
Your Input:
raju
raju_foru@gmail.com
raju-edu.com
Hello PHP Form
male
Notice that at the start of the script, we check whether the form has been submitted using
$_SERVER["REQUEST_METHOD"]. If the REQUEST_METHOD is POST, then the form has been
submitted - and it should be validated. If it has not been submitted, skip the validation and
display a blank form.
However, in the example above, all input fields are optional. The script works fine even if the
user does not enter any data.
The next step is to make input fields required and create error messages if needed.
In the following code we have added some new variables: $nameErr, $emailErr,
$genderErr, and $websiteErr. These error variables will hold error messages for the required
fields. We have also added an if else statement for each $_POST variable. This checks if the
$_POST variable is empty (with the PHP empty() function). If it is empty, an error message is
stored in the different error variables, and if it is not empty, it sends the user input data through
the test_input() function:
<?php
// define variables and set to empty values
$nameErr = $emailErr = $genderErr = $websiteErr = "";
$name = $email = $gender = $comment = $website = "";
if ($_SERVER["REQUEST_METHOD"] == "POST") {
if (empty($_POST["name"])) {
$nameErr = "Name is required";
} else {
$name = test_input($_POST["name"]);
}
if (empty($_POST["email"])) {
$emailErr = "Email is required";
} else {
$email = test_input($_POST["email"]);
}
if (empty($_POST["website"])) {
$website = "";
} else {
$website = test_input($_POST["website"]);
}
if (empty($_POST["comment"])) {
$comment = "";
} else {
$comment = test_input($_POST["comment"]);
}
if (empty($_POST["gender"])) {
$genderErr = "Gender is required";
} else {
$gender = test_input($_POST["gender"]);
}
}
?>
</form>
The next step is to validate the input data, that is, "Does the Name field contain only
letters and whitespace?", "Does the E-mail field contain a valid e-mail address syntax?", and,
if filled out, "Does the Website field contain a valid URL?".
The preg_match() function searches a string for a pattern, returning 1 if the pattern matches,
0 if it does not, and false if an error occurs. The filter_var() function with
FILTER_VALIDATE_EMAIL checks whether an e-mail address is well-formed:
$email = test_input($_POST["email"]);
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
$emailErr = "Invalid email format";
}
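The three validation questions can be gathered into one function that returns an array of error messages. This is a sketch under stated assumptions: the function name validateContactFields() is hypothetical, and the website check uses FILTER_VALIDATE_URL rather than the regular expression from these notes, so it requires a scheme such as https://.

```php
<?php
// Hypothetical helper: returns an array of error messages keyed by field name.
function validateContactFields(array $input): array
{
    $errors  = [];
    $name    = trim($input['name'] ?? '');
    $email   = trim($input['email'] ?? '');
    $website = trim($input['website'] ?? '');

    if ($name === '') {
        $errors['name'] = 'Name is required';
    } elseif (preg_match('/^[a-zA-Z ]*$/', $name) !== 1) {
        // preg_match() returns 1 on a match, 0 on no match, false on error
        $errors['name'] = 'Only letters and white space allowed';
    }

    if ($email === '') {
        $errors['email'] = 'Email is required';
    } elseif (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Invalid email format';
    }

    // The website is optional, so it is only checked when filled out.
    if ($website !== '' && filter_var($website, FILTER_VALIDATE_URL) === false) {
        $errors['website'] = 'Invalid URL';
    }

    return $errors;
}

$good = validateContactFields(['name' => 'Raju', 'email' => 'raju_foru@gmail.com', 'website' => 'https://example.com']);
$bad  = validateContactFields(['name' => 'R2D2', 'email' => 'not-an-email']);
```

An empty result means the input passed every check; otherwise each entry can be shown next to the matching form field.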
<?php
// define variables and set to empty values
$nameErr = $emailErr = $genderErr = $websiteErr = "";
$name = $email = $gender = $comment = $website = "";
if ($_SERVER["REQUEST_METHOD"] == "POST") {
if (empty($_POST["name"])) {
$nameErr = "Name is required";
} else {
$name = test_input($_POST["name"]);
// check if name only contains letters and whitespace
if (!preg_match("/^[a-zA-Z ]*$/",$name)) {
$nameErr = "Only letters and white space allowed";
}
}
if (empty($_POST["email"])) {
$emailErr = "Email is required";
} else {
$email = test_input($_POST["email"]);
// check if e-mail address is well-formed
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
$emailErr = "Invalid email format";
}
}
if (empty($_POST["website"])) {
$website = "";
} else {
$website = test_input($_POST["website"]);
// check if URL address syntax is valid (this regular expression also allows dashes in the URL)
if (!preg_match("/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i",$website)) {
$websiteErr = "Invalid URL";
}
}
if (empty($_POST["comment"])) {
$comment = "";
} else {
$comment = test_input($_POST["comment"]);
}
if (empty($_POST["gender"])) {
$genderErr = "Gender is required";
} else {
$gender = test_input($_POST["gender"]);
}}
?>
The next step is to show how to prevent the form from emptying all the input fields when the
user submits the form.
Gender:
<input type="radio" name="gender"
<?php if (isset($gender) && $gender=="female") echo "checked";?>
value="female">Female
<input type="radio" name="gender"
<?php if (isset($gender) && $gender=="male") echo "checked";?>
value="male">Male
<input type="radio" name="gender"
<?php if (isset($gender) && $gender=="other") echo "checked";?>
value="other">Other
<!DOCTYPE HTML>
<html>
<head>
<style>
.error {color: #FF0000;}
</style>
</head>
<body>
<?php
// define variables and set to empty values
$nameErr = $emailErr = $genderErr = $websiteErr = "";
$name = $email = $gender = $comment = $website = "";
if ($_SERVER["REQUEST_METHOD"] == "POST") {
if (empty($_POST["name"])) {
$nameErr = "Name is required";
} else {
$name = test_input($_POST["name"]);
// check if name only contains letters and whitespace
if (!preg_match("/^[a-zA-Z ]*$/",$name)) {
$nameErr = "Only letters and white space allowed";
}
}
if (empty($_POST["email"])) {
$emailErr = "Email is required";
} else {
$email = test_input($_POST["email"]);
// check if e-mail address is well-formed
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
$emailErr = "Invalid email format";
}
}
if (empty($_POST["website"])) {
$website = "";
} else {
$website = test_input($_POST["website"]);
// check if URL address syntax is valid (this regular expression also allows dashes in the URL)
if (!preg_match("/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i",$website)) {
$websiteErr = "Invalid URL";
}
}
if (empty($_POST["comment"])) {
$comment = "";
} else {
$comment = test_input($_POST["comment"]);
}
if (empty($_POST["gender"])) {
$genderErr = "Gender is required";
} else {
$gender = test_input($_POST["gender"]);
}
}
function test_input($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}
?>
<?php
echo "<h2>Your Input:</h2>";
echo $name;
echo "<br>";
echo $email;
echo "<br>";
echo $website;
echo "<br>";
echo $comment;
echo "<br>";
echo $gender;
?>
</body>
</html>
A multi-page form in PHP can be created using sessions, which retain the values of a form
and carry them from one page to another.
Because such forms are popular, this section shows how to create a multi-page form using a PHP script. In this
example, we have used:
PHP sessions to store page wise form field values in three steps.
Also, we have applied some validations on each page.
At the end, we collect values from all forms and store them in a database.
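The idea of carrying each page's values forward can be sketched without a browser by treating the session as an ordinary array. The function name storeStep() and the sample values are hypothetical; in a real page the first argument would be $_SESSION and the second $_POST.

```php
<?php
// Hypothetical helper: validate one page of the form and copy its fields into the session.
function storeStep(array &$session, array $post, array $required): bool
{
    foreach ($required as $field) {
        if (empty($post[$field])) {
            $session['error'] = 'Mandatory field(s) are missing, please fill them in again';
            return false; // the caller would redirect back to the same page
        }
    }
    foreach ($post as $key => $value) {
        $session['post'][$key] = $value; // accumulate values across pages
    }
    return true;
}

$session = []; // stands in for $_SESSION in this sketch
$step1 = storeStep($session, ['name' => 'James Anderson', 'email' => 'anderson@gmail.com'], ['name', 'email']);
$step2 = storeStep($session, ['gender' => 'male', 'nationality' => ''], ['gender', 'nationality']);
```

Here the first page is accepted and stored, while the second is rejected because nationality is empty, so only the first page's values remain in the session.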
<?php
session_start(); // Session starts here.
?><!DOCTYPE HTML>
<html>
<head>
<title>PHP Multi Page Form</title>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="container">
<div class="main">
<h2>PHP Multi Page Form</h2>
<span id="error">
<!-- Display any error message stored in the session -->
<?php
if (!empty($_SESSION['error'])) {
echo $_SESSION['error'];
unset($_SESSION['error']);
}
?>
</span>
<form action="page2_form.php" method="post">
<label>Full Name :<span>*</span></label>
<input name="name" type="text" placeholder="Ex-James Anderson" required>
<label>Email :<span>*</span></label>
<input name="email" type="email" placeholder="Ex-anderson@gmail.com" required>
<label>Contact :<span>*</span></label>
<input name="contact" type="text" placeholder="10-digit number" required>
<label>Password :<span>*</span></label>
<input name="password" type="Password" placeholder="*****" />
<label>Re-enter Password :<span>*</span></label>
<input name="confirm" type="password" placeholder="*****" >
<input type="reset" value="Reset" />
<input type="submit" value="Next" />
</form>
</div>
</div>
</body>
</html>
<?php
session_start();
// Checking first page values for empty,If it finds any blank field then redirected to first page.
if (isset($_POST['name'])){
if (empty($_POST['name'])
|| empty($_POST['email'])
|| empty($_POST['contact'])
|| empty($_POST['password'])
|| empty($_POST['confirm'])){
// Setting error message
$_SESSION['error'] = "Mandatory field(s) are missing, Please fill it again";
header("location: page1_form.php"); // Redirecting to first page
} else {
// Sanitizing email field to remove unwanted characters.
$_POST['email'] = filter_var($_POST['email'], FILTER_SANITIZE_EMAIL);
// After sanitization Validation is performed.
if (filter_var($_POST['email'], FILTER_VALIDATE_EMAIL)){
// Validating Contact Field using regex.
if (!preg_match("/^[0-9]{10}$/", $_POST['contact'])){
$_SESSION['error'] = "10 digit contact number is required.";
header("location: page1_form.php");
} else {
if (($_POST['password']) === ($_POST['confirm'])) {
foreach ($_POST as $key => $value) {
$_SESSION['post'][$key] = $value;
}
} else {
$_SESSION['error'] = "Password does not match with Confirm Password.";
header("location: page1_form.php"); // Redirecting to first page
}
}
}
}
}
?>
<?php
session_start();
// Checking second page values for empty, If it finds any blank field then redirected to second page.
if (isset($_POST['gender'])){
if (empty($_POST['gender'])
|| empty($_POST['nationality'])
|| empty($_POST['religion'])
|| empty($_POST['qualification'])
|| empty($_POST['experience'])){
$_SESSION['error_page2'] = "Mandatory field(s) are missing, Please fill it again"; // Setting error message.
header("location: page2_form.php"); // Redirecting to second page.
} else {
// Fetching all values posted from second page and storing it in variable.
foreach ($_POST as $key => $value) {
$_SESSION['post'][$key] = $value;
}
}
} else {
if (empty($_SESSION['error_page3'])) {
header("location: page1_form.php");// Redirecting to first page.
}
}
?>
<!DOCTYPE HTML>
<html>
<head>
<title>PHP Multi Page Form</title>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="container">
<div class="main">
<h2>PHP Multi Page Form</h2><hr/>
<span id="error">
<?php
if (!empty($_SESSION['error_page3'])) {
echo $_SESSION['error_page3'];
unset($_SESSION['error_page3']);
}
?>
</span>
<form action="page4_insertdata.php" method="post">
<!DOCTYPE HTML>
<html>
<head>
<title>PHP Multi Page Form</title>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="container">
<div class="main">
<h2>PHP Multi Page Form</h2>
<?php
session_start();
if (isset($_POST['state'])) {
if (!empty($_SESSION['post'])){
if (empty($_POST['address1'])
|| empty($_POST['city'])
|| empty($_POST['pin'])
|| empty($_POST['state'])){
// Setting error for page 3.
$_SESSION['error_page3'] = "Mandatory field(s) are missing, Please fill it again";
header("location: page3_form.php"); // Redirecting to third page.
} else {
foreach ($_POST as $key => $value) {
$_SESSION['post'][$key] = $value;
}
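$p = $_SESSION['post']; // Values collected from all pages.
$address2 = $p['address2'] ?? '';
$hash = password_hash($p['password'], PASSWORD_DEFAULT); // Never store the plain password.
// The mysql_* functions were removed in PHP 7; mysqli with a prepared statement
// also keeps the user input out of the SQL text.
$connection = mysqli_connect("localhost", "root", "", "phpmultipage");
$stmt = mysqli_prepare($connection, "insert into detail
(name,email,contact,password,religion,nationality,gender,qualification,experience,address1,address2,city,pin,state)
values (?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
mysqli_stmt_bind_param($stmt, "ssssssssssssss", $p['name'], $p['email'], $p['contact'], $hash,
$p['religion'], $p['nationality'], $p['gender'], $p['qualification'], $p['experience'],
$p['address1'], $address2, $p['city'], $p['pin'], $p['state']);
$query = mysqli_stmt_execute($stmt); // Storing values in database.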
if ($query) {
echo '<p><span id="success">Form Submitted successfully..!!</span></p>';
} else {
echo '<p><span>Form Submission Failed..!!</span></p>';
}
unset($_SESSION['post']); // Destroying session.
}
} else {
header("location: page1_form.php"); // Redirecting to first page.
}
} else {
header("location: page1_form.php"); // Redirecting to first page.
}
?>
</div>
</div>
</body>
</html>
MySQL Code:
To create the table in the MySQL database:
CREATE TABLE detail (
user_id int(10) NOT NULL AUTO_INCREMENT,
name varchar(255) NOT NULL,
email varchar(255) NOT NULL,
contact varchar(15) NOT NULL,
password varchar(255) NOT NULL,
religion varchar(255) NOT NULL,
nationality varchar(255) NOT NULL,
gender varchar(255) NOT NULL,
qualification varchar(255) NOT NULL,
experience varchar(255) NOT NULL,
address1 varchar(255) NOT NULL,
address2 varchar(255) NOT NULL,
city varchar(255) NOT NULL,
pin int(10) NOT NULL,
state varchar(255) NOT NULL,
PRIMARY KEY (user_id)
);
height: 35px;
font-size: 16px;
font-family: cursive;
}
input[type=submit],
input[type=reset]{
padding: 10px;
background: linear-gradient(#ffbc00 5%, #ffdd7f 100%);
border: 1px solid #e5a900;
color: #524f49;
cursor: pointer;
width: 49.2%;
border-radius: 2px;
margin-bottom: 15px;
font-weight:bold;
font-size:16px;
}
input[type=submit]:hover,
input[type=reset]:hover
{
background: linear-gradient(#ffdd7f 5%, #ffbc00 100%);
}
XML:Basics:
XML is a software- and hardware-independent tool for storing and transporting data.
What is XML?
XML stands for eXtensible Markup Language
XML is a markup language much like HTML
XML was designed to store and transport data
XML was designed to be self-descriptive
XML is a W3C Recommendation
XML Does Not Use Predefined Tags
XML Separates Data from Presentation
XML is Often a Complement to HTML
XML Separates Data from HTML
XML Tags are Case Sensitive
XML Elements Must be Properly Nested
XML Attribute Values Must Always be Quoted
XML has predefined entity references for <, >, &, ' and " (&lt; &gt; &amp; &apos; &quot;)
Comments in XML are written as <!-- This is a comment -->
XML Stores New Line as LF
Books.xml
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
</bookstore>
Example:
<college>AITS</college> <!-- valid XML element -->
<DEPARTMENT>cse</department> <!-- invalid XML element: tag names are case sensitive -->
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
XML Attributes:
XML elements can have attributes, just like HTML.
Attributes are designed to contain data related to a specific element.
Attribute values must always be quoted. Either single or double quotes can be used.
For a person's gender, the <person> element can be written like this:
<person gender="female"> or like this: <person gender='female'>
XML – DTDs:
The XML Document Type Definition, commonly known as DTD, is a way to describe an XML
language precisely. DTDs check the vocabulary and the validity of the structure of XML documents
against the grammatical rules of the appropriate XML language.
An XML DTD can be either specified inside the document, or it can be kept in a separate
document and then linked separately.
Syntax
Basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
The DTD starts with <!DOCTYPE delimiter.
An element tells the parser to parse the document from the specified root element.
DTD identifier is an identifier for the document type definition, which may be the path to
a file on the system or URL to a file on the internet. If the DTD is pointing to external
path, it is called External Subset.
The square brackets [ ] enclose an optional list of entity declarations called Internal
Subset.
Internal DTD
A DTD is referred to as an internal DTD if the elements are declared within the XML file. To mark it
as an internal DTD, the standalone attribute in the XML declaration must be set to yes. This means the
declaration works independently of an external source.
Syntax
Following is the syntax of internal DTD −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and element-declarations is where you declare the
elements.
Example
Following is a simple example of internal DTD −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>CSE</name>
<company>AITS-TPT</company>
<phone>(011) 123-4567</phone>
</address>
Let us go through the above code −
Start Declaration − Begin the XML declaration with the following statement.
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
DTD − Immediately after the XML header, the document type declaration follows, commonly
referred to as the DOCTYPE −
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element name. The
DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body − The DOCTYPE declaration is followed by body of the DTD, where you declare
elements, attributes, entities, and notations.
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document.
<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here
#PCDATA means parsed character data.
End Declaration − Finally, the declaration section of the DTD is closed using a closing bracket
and a closing angle bracket (]>). This effectively ends the definition, and thereafter, the XML
document follows immediately.
Rules
The document type declaration must appear at the start of the document (preceded only
by the XML header) − it is not permitted anywhere else within the document.
Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.
The Name in the document type declaration must match the element type of the root
element.
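The internal DTD above can be checked programmatically. The following sketch assumes PHP's DOM extension is available and uses DOMDocument::validate(), which validates a loaded document against its own DTD. The function name isValidAgainstDtd() is hypothetical.

```php
<?php
// Hypothetical helper: true when the XML is well-formed and valid against its own DTD.
function isValidAgainstDtd(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    $doc = new DOMDocument();
    $valid = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid;
}

// XML declaration and internal DTD from the example above.
$header = <<<'XML'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
XML;

$complete       = $header . "\n<address><name>CSE</name><company>AITS-TPT</company><phone>(011) 123-4567</phone></address>";
$missingCompany = $header . "\n<address><name>CSE</name><phone>(011) 123-4567</phone></address>";

$completeIsValid = isValidAgainstDtd($complete);
$missingIsValid  = isValidAgainstDtd($missingCompany);
```

The second document is well-formed but not valid, because the DTD requires name, company and phone in that order.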
External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by specifying the
system attribute, which may be either a legal .dtd file or a valid URL. To mark it as an external
DTD, the standalone attribute in the XML declaration must be set to no. This means the declaration
includes information from the external source.
Syntax
Following is the syntax for external DTD −
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.
Example
The following example shows external DTD usage −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>CSE</name>
<company>AITS-TPT</company>
<phone>(011) 123-4567</phone>
</address>
The content of the DTD file address.dtd is as shown −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
XML – Schemas:
XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and
validate the structure and the content of XML data. XML schema defines the elements,
attributes and data types. Schema element supports Namespaces. It is similar to a database
schema that describes the data in a database.
Syntax
You need to declare a schema in your XML document as follows −
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
Example
The following example shows how to use schema −
<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
<xs:element name = "contact">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
<xs:element name = "phone" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML Schemas is that they describe the legitimate format that an XML
document can take.
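To see the schema doing its job, the sketch below validates two contact documents against it with DOMDocument::schemaValidateSource(). It assumes PHP's DOM extension; the function name matchesSchema() and the sample documents are hypothetical.

```php
<?php
// The contact schema from the example above.
$xsd = <<<'XSD'
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="company" type="xs:string"/>
        <xs:element name="phone" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: true when the XML document conforms to the schema.
function matchesSchema(string $xml, string $xsd): bool
{
    $previous = libxml_use_internal_errors(true); // keep validation errors out of the output
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->schemaValidateSource($xsd);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$numericPhone   = matchesSchema('<contact><name>CSE</name><company>AITS-TPT</company><phone>1234567</phone></contact>', $xsd);
$formattedPhone = matchesSchema('<contact><name>CSE</name><company>AITS-TPT</company><phone>(011) 123-4567</phone></contact>', $xsd);
```

Because phone is declared as xs:int, a formatted value such as (011) 123-4567 is rejected; a schema that should accept it would declare phone as xs:string instead.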
Elements:
Elements are the building blocks of an XML document. An element can be defined within an XSD
as follows −
<xs:element name = "x" type = "y"/>
Definition Types
You can define XML schema elements in the following ways −
Simple Type
Simple type element is used only in the context of the text. Some of the predefined simple types
are: xs:integer, xs:boolean, xs:string, xs:date. For example −
<xs:element name = "phone_number" type = "xs:int" />
Complex Type
A complex type is a container for other element definitions. This allows you to specify which
child elements an element can contain and to provide some structure within your XML
documents. For example −
<xs:element name = "Address">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
<xs:element name = "phone" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
In the above example, the Address element consists of child elements. It is a container for
other <xs:element> definitions, which allows you to build a simple hierarchy of elements in the XML
document.
Global Types
With the global type, you can define a single type in your document, which can be used by all
other references. For example, suppose you want to generalize the person and company for
different addresses of the company. In such case, you can define a general type as follows −
<xs:complexType name = "AddressType">
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
</xs:sequence>
</xs:complexType>
Now let us use this type in our example as follows −
<xs:element name = "Address1">
<xs:complexType>
<xs:sequence>
<xs:element name = "address" type = "AddressType" />
<xs:element name = "phone1" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
As an example of such a hierarchy, consider a root element named <company>. Inside that, there is one
more element, <Employee>. Inside the employee element, there are five branches named
<FirstName>, <LastName>, <ContactNo>, <Email>, and <Address>. Inside the <Address>
element, there are three sub-branches, named <City>, <State> and <Zip>.
XML – DOM:
The Document Object Model (DOM) is the foundation of XML. XML documents have a
hierarchy of informational units called nodes; DOM is a way of describing those nodes and the
relationships between them.
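The node hierarchy can be explored with PHP's DOM extension. The sketch below loads a small company document and walks from the root to its children. The element values and the helper name childElementNames() are made up for illustration.

```php
<?php
// A small document with a company > Employee > Address hierarchy (values are invented).
$xml = '<company><Employee><FirstName>Asha</FirstName><LastName>Rao</LastName>'
     . '<Address><City>Tirupati</City><State>AP</State><Zip>517501</Zip></Address>'
     . '</Employee></company>';

// Hypothetical helper: names of the element children of a node.
function childElementNames(DOMNode $node): array
{
    $names = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) { // skip text and comment nodes
            $names[] = $child->nodeName;
        }
    }
    return $names;
}

$doc = new DOMDocument();
$doc->loadXML($xml);

$root             = $doc->documentElement;            // the <company> node
$employee         = $root->firstChild;                // its first child, <Employee>
$employeeChildren = childElementNames($employee);
$city             = $doc->getElementsByTagName('City')->item(0)->textContent;
```

Each element, attribute and piece of text is a node, and properties such as firstChild and childNodes express the parent-child relationships between them.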
XML – Namespaces:
A Namespace is a set of unique names. It is a mechanism by which element and
attribute names can be assigned to a group. The Namespace is identified by a URI (Uniform
Resource Identifier).
Namespace Declaration
A Namespace is declared using reserved attributes. Such an attribute name must either
be xmlns or begin with xmlns: shown as below −
<element xmlns:name = "URL">
Syntax
The Namespace starts with the keyword xmlns.
The word name is the Namespace prefix.
The URL is the Namespace identifier.
Example
Namespace affects only a limited area in the document. An element containing the declaration
and all of its descendants are in the scope of the Namespace. Following is a simple example of
XML Namespace −
<?xml version = "1.0" encoding = "UTF-8"?>
<cont:contact xmlns:cont = "www.aits.tpt.edu.in/profile">
<cont:name>CSE</cont:name>
<cont:company>AITS-TPTt</cont:company>
<cont:phone>(011) 123-4567</cont:phone>
</cont:contact>
Here, the Namespace prefix is cont, and the Namespace identifier (URI)
is www.aits.tpt.edu.in/profile. This means the element names and attribute names with
the cont prefix (including the contact element) all belong to
the www.aits.tpt.edu.in/profile namespace.
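A namespace-aware lookup can confirm which namespace an element belongs to. This sketch assumes PHP's DOM extension and reuses the contact example. Libxml errors are collected internally because the parser may warn that this namespace URI is not absolute.

```php
<?php
$ns  = 'www.aits.tpt.edu.in/profile';
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<cont:contact xmlns:cont="' . $ns . '">'
     . '<cont:name>CSE</cont:name><cont:phone>(011) 123-4567</cont:phone>'
     . '</cont:contact>';

$previous = libxml_use_internal_errors(true); // suppress parser warnings in the output
$doc = new DOMDocument();
$doc->loadXML($xml);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Look the element up by namespace URI and local name, not by prefix.
$nameNode = $doc->getElementsByTagNameNS($ns, 'name')->item(0);

$text   = $nameNode->textContent;
$prefix = $nameNode->prefix;
$uri    = $nameNode->namespaceURI;
```

The prefix is only a shorthand inside the document; what identifies the namespace is the URI, which is why the lookup is done by URI.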
XML – Parsers:
An XML parser is a software library or package that provides an interface for client applications to
work with XML documents. It checks for proper format of the XML document and may also
validate the XML documents. Modern browsers have built-in XML parsers.
[Figure: how an XML parser interacts with an XML document]
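The format check performed by a parser can be observed with PHP's libxml-based DOM parser. In this sketch the function name wellFormednessErrors() is hypothetical, and the two inputs are the valid and invalid elements from the earlier example.

```php
<?php
// Hypothetical helper: parse the XML and return the parser's error messages.
function wellFormednessErrors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = 'line ' . $error->line . ': ' . trim($error->message);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

$ok     = wellFormednessErrors('<college>AITS</college>');
$broken = wellFormednessErrors('<DEPARTMENT>cse</department>');
```

The first call returns no messages, while the second reports that the opening and closing tags do not match, since tag names are case sensitive.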
XML – Processors:
When a software program reads an XML document and takes actions accordingly, this is
called processing the XML. Any program that can read and process XML documents is known as
an XML processor. An XML processor reads the XML file and turns it into in-memory structures
that the rest of the program can access.
The most fundamental XML processor reads an XML document and converts it into an internal
representation for other programs or subroutines to use. This is called a parser, and it is an
important component of every XML processing program.
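The idea of turning an XML file into in-memory structures can be sketched as follows. This is only an illustration with Python's standard xml.etree.ElementTree parser; the helper name to_record and the choice of a dictionary as the in-memory structure are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = (
    "<address>"
    "<name>CSE</name>"
    "<company>AITS-TPTt</company>"
    "<phone>(011) 123-4567</phone>"
    "</address>"
)

def to_record(xml_text):
    """Parse the text and return the child elements as a plain dictionary."""
    root = ET.fromstring(xml_text)          # the parser builds a tree in memory
    return {child.tag: child.text for child in root}

record = to_record(ADDRESS_XML)
print(record["company"])
```

After parsing, the rest of the program works with the dictionary and no longer needs to look at the XML text.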
An XML processor also handles processing instructions, which are described below:
Processing Instructions (PIs):
Processing instructions (PIs) allow documents to contain instructions for applications. PIs are
not part of the character data of the document, but must be passed through to the application.
PIs can appear anywhere in the document outside other markup: in the prolog, including the
document type definition (DTD), in textual content, or after the document.
Syntax
Following is the syntax of PI −
<?target instructions?>
Where
target − Identifies the application to which the instruction is directed.
instructions − A string of characters that describes the information for the application to process.
A PI starts with a special tag <? and ends with ?>. Processing of the contents ends immediately
after the string ?> is encountered.
Example
PIs are rarely used. They are mostly used to link an XML document to a style sheet. Following is an
example −
<?xml-stylesheet href = "AITS-TPTtstyle.css" type = "text/css"?>
Here, the target is xml-stylesheet. href="AITS-TPTtstyle.css" and type="text/css" are the data or
instructions that the target application will use when processing the given XML document.
In this case, a browser recognizes the target as a request to apply a style sheet to the XML
before it is shown; the href attribute points to the location of the style sheet and the type
attribute states its type, which is CSS here (for an XSL style sheet the type would be text/xsl).
Processing Instructions Rules
A PI can contain any data except the combination ?>, which is interpreted as the closing
delimiter. Here are two examples of valid PIs −
<?welcome to pg = 10 of tutorials point?>
<?welcome?>
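How an application receives a PI can be sketched with Python's standard xml.dom.minidom module, which keeps processing instructions as nodes with a target and a data part. The document text below is a shortened, hypothetical variant of the examples in these notes.

```python
from xml.dom import minidom, Node

DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="AITS-TPTtstyle.css" type="text/css"?>'
    "<address><name>CSE</name></address>"
)

dom = minidom.parseString(DOC)

# Collect (target, data) for every processing instruction at the top level.
# The XML declaration on the first line is not reported as a PI node.
pis = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

for target, data in pis:
    print(target, "->", data)
```

The parser does not interpret the data part. It hands the target and the raw text to the application, which decides what to do with them.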
Types
XML processors are classified as validating or non-validating types, depending on whether or
not they check XML documents for validity. A processor that discovers a validity error must be
able to report it, but may continue with normal processing.
A few validating parsers are − xml4c (IBM, in C++), xml4j (IBM, in Java), MSXML (Microsoft, in
Java), TclXML (TCL), xmlproc (Python), XML::Parser (Perl), Java Project X (Sun, in Java).
A few non-validating parsers are − OpenXML (Java), Lark (Java), xp (Java), AElfred (Java),
expat (C), XParse (JavaScript), xmllib (Python).
XML – Validation:
Validation is the process of checking an XML document against a set of rules. An XML document is
said to be valid if its contents match the elements, attributes and associated document type
declaration (DTD), and if the document complies with the constraints expressed in it. An XML
parser deals with validation at two levels −
Well-formed XML document
Valid XML document
Well-formed XML Document
An XML document is said to be well-formed if it adheres to the following rules −
XML files without a DTD must use only the predefined character entities: amp (&), apos (single
quote), gt (>), lt (<) and quot (double quote).
It must follow the ordering of the tags, i.e., the inner tag must be closed before closing the
outer tag.
Each of its opening tags must have a closing tag, or it must be a self-closing
tag (<title>....</title> or <title/>).
Each attribute name must appear only once in a start tag, and its value must be quoted.
Entities other than amp (&), apos (single quote), gt (>), lt (<) and quot (double quote)
must be declared.
Example
Following is an example of a well-formed XML document −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>CSE</name>
<company>AITS-TPTt</company>
<phone>(011) 123-4567</phone>
</address>
The above example is said to be well-formed as −
It defines the type of document. Here, the document type is address.
It includes a root element named address.
Each of the child elements name, company and phone is enclosed in its own
self-explanatory tag.
Order of the tags is maintained.
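The well-formedness rules can be tried out with a small check. This is a sketch that relies on Python's standard xml.etree.ElementTree parser, which raises ParseError when a document is not well-formed; the helper name is_well_formed is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = [
    "<address><name>CSE</name></address>",   # properly nested and closed
    "<address><name>CSE</address></name>",   # inner tag closed after the outer tag
    "<title>unclosed",                       # opening tag without a closing tag
    "<phone type=home/>",                    # attribute value not quoted
    '<phone type="a" type="b"/>',            # same attribute name used twice
    "<name>&nbsp;</name>",                   # entity that was never declared
]

for text in samples:
    print(is_well_formed(text), text)
```

Only the first sample passes. Each of the others breaks one of the rules listed above.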
Valid XML Document
If an XML document is well-formed, has an associated Document Type Declaration (DTD), and
complies with the constraints declared in that DTD, then it is said to be a valid XML document.
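The difference between well-formed and valid can be illustrated with a hand-written check. The parser in Python's standard library is non-validating, so the sketch below does not read the DTD. It simply hard-codes the content model of the address example (name, company, phone, each holding only text) as an assumption, and should not be taken as a real DTD validator.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT address (name,company,phone)> from the example DTD.
EXPECTED_CHILDREN = ["name", "company", "phone"]

def matches_address_model(xml_text):
    """Check the root name, the order of children, and that children hold only text."""
    root = ET.fromstring(xml_text)           # raises ParseError if not well-formed
    if root.tag != "address":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    return all(len(child) == 0 for child in root)   # no nested elements (#PCDATA only)

complete = "<address><name>CSE</name><company>AITS-TPTt</company><phone>(011) 123-4567</phone></address>"
missing_phone = "<address><name>CSE</name><company>AITS-TPTt</company></address>"

print(matches_address_model(complete))
print(matches_address_model(missing_phone))
```

Both documents are well-formed, but only the first one follows the content model, so only the first one would be valid against the DTD.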
XSL Languages:
XSLT is a language for transforming XML documents.
XPath is a language for navigating in XML documents.
XQuery is a language for querying XML documents.
It Started with XSL
XSL stands for EXtensible Stylesheet Language.
The World Wide Web Consortium (W3C) started to develop XSL because there was a need for an
XML-based Stylesheet Language.
XSLT – Transformation:
Correct Style Sheet Declaration
The root element that declares the document to be an XSL style sheet is <xsl:stylesheet> or
<xsl:transform>.
Note: <xsl:stylesheet> and <xsl:transform> are completely synonymous and either can be used!
The correct way to declare an XSL style sheet according to the W3C XSLT Recommendation is:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
or:
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
To get access to the XSLT elements, attributes and features we must declare the XSLT
namespace at the top of the document.
The xmlns:xsl="http://www.w3.org/1999/XSL/Transform" points to the official W3C XSLT
namespace. If you use this namespace, you must also include the attribute version="1.0".
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Example Explained
Since an XSL style sheet is an XML document, it normally begins with the XML declaration: <?xml
version="1.0" encoding="UTF-8"?>.
The next element, <xsl:stylesheet>, defines that this document is an XSLT style sheet document
(along with the version number and XSLT namespace attributes).
The <xsl:template> element defines a template. The match="/" attribute associates the template
with the root of the XML source document.
The content inside the <xsl:template> element defines some HTML to write to the output.
The <xsl:for-each> element selects every cd element under catalog, and the <xsl:value-of>
elements copy the title and artist of each cd into a table row.
The last two lines define the end of the template and the end of the style sheet.
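The effect of the xsl:for-each loop and the xsl:value-of elements in the style sheet above can be imitated in Python. This is not an XSLT processor (Python's standard library does not include one); it only mirrors the same selection logic with xml.etree.ElementTree. The catalog data and the helper name render_rows are hypothetical, made up for this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document with the catalog/cd structure the style sheet expects.
CATALOG_XML = """<catalog>
  <cd><title>Sample Album One</title><artist>Artist A</artist></cd>
  <cd><title>Sample Album Two</title><artist>Artist B</artist></cd>
</catalog>"""

def render_rows(catalog_xml):
    """Build one HTML table row per cd element."""
    root = ET.fromstring(catalog_xml)            # root is the <catalog> element
    rows = []
    for cd in root.findall("cd"):                # plays the role of xsl:for-each
        title = cd.findtext("title", default="") # plays the role of xsl:value-of
        artist = cd.findtext("artist", default="")
        rows.append(f"<tr><td>{escape(title)}</td><td>{escape(artist)}</td></tr>")
    return rows

rows = render_rows(CATALOG_XML)
print("\n".join(rows))
```

In a real transformation the XSLT processor performs these steps itself, driven by the template, and no such loop has to be written by hand.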
What is Atom?
Atom is the name of an XML-based Web content and metadata syndication format, and an
application-level protocol for publishing and editing Web resources belonging to periodically
updated websites.
All Atom feeds must be well-formed XML documents, and are identified with
the application/atom+xml media type.
Atom is a relatively recent spec and is much more robust and feature-rich than RSS. For
instance, where RSS requires descriptive fields such as title and link only in item breakdowns,
Atom requires these things for both items and the full Feed.
General considerations:
All elements described in this document must be in
the http://www.w3.org/2005/Atom namespace.
All timestamps in Atom must conform to RFC 3339.
Unless otherwise specified, all values must be plain text (i.e., no entity-encoded html).
xml:lang may be used to identify the language of any human readable text.
xml:base may be used to control how relative URIs are resolved.
Sample feed
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example Feed</title>
<link href="http://example.org/"/>
<updated>2003-12-13T18:30:02Z</updated>
<author>
<name>John Doe</name>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
<entry>
<title>Atom-Powered Robots Run Amok</title>
<link href="http://example.org/2003/12/13/atom03"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<summary>Some text.</summary>
</entry>
</feed>
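Reading the sample feed can be sketched in Python with the standard xml.etree.ElementTree module. Because every Atom element is in the http://www.w3.org/2005/Atom namespace, each lookup has to name that namespace; the prefix atom used in the lookup table is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

FEED_XML = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author><name>John Doe</name></author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>"""

feed = ET.fromstring(FEED_XML)
feed_title = feed.findtext("atom:title", namespaces=ATOM_NS)

# One dictionary per <entry>, holding the fields a reader would display.
entries = [
    {
        "title": entry.findtext("atom:title", namespaces=ATOM_NS),
        "link": entry.find("atom:link", ATOM_NS).get("href"),
        "updated": entry.findtext("atom:updated", namespaces=ATOM_NS),
    }
    for entry in feed.findall("atom:entry", ATOM_NS)
]

print(feed_title)
for item in entries:
    print(item["updated"], item["title"], item["link"])
```

A search for plain title without the namespace would find nothing, which is a common mistake when reading Atom feeds.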
What is RSS?
RSS is an open method for delivering regularly changing web content. Many news-related sites,
weblogs, and other online publishers syndicate their content as an RSS Feed to whoever wants
it.
Any time you want to retrieve the latest headlines from your favorite sites, you can access the
available RSS Feeds via a desktop RSS reader. You can also make an RSS Feed for your own site
if your content changes frequently.
In brief:
RSS is a protocol that provides an open method of syndicating and aggregating web
content.
RSS is a standard for publishing regular updates to web-based content.
RSS is a Syndication Standard based on a type of XML file that resides on an Internet
server.
RSS is an XML application; the RSS 1.0 version conforms to the W3C's RDF specification and is
extensible via XML.
You can also download RSS Feeds from other sites to display the updated news items on
your site, or use a desktop or online reader to access your favorite RSS Feeds.
What does RSS stand for? It depends on what version of RSS you are using.
RSS Version 0.9 - Rich Site Summary
RSS Version 1.0 - RDF Site Summary
RSS Versions 2.0, 2.0.1, and 0.9x - Really Simple Syndication
What is RSS Feed?
An RSS Feed is an XML text file that resides on an Internet server.
An RSS Feed file includes the basic information about a site (title, URL, description), plus
one or more item entries that include - at a minimum - a title (headline), a URL, and a
brief description of the linked content.
There are various flavors of RSS Feed depending on RSS Version. Another XML Feed
format is called ATOM.
RSS Feeds are registered with an RSS registry to make them more available to viewers
interested in your content area.
RSS Feeds can have links back to your website, which can bring more traffic to your
site.
Some RSS Feeds are updated hourly (Associated Press and News Groups), some are
updated daily, and others are updated weekly or irregularly.
How Does RSS Work?:
This is how RSS works:
A website willing to publish its content using RSS creates one RSS Feed and keeps it on a
web server. RSS Feeds can be created manually or with software.
A website visitor will subscribe to read your RSS Feed. An RSS Feed will be read by an
RSS Feed reader.
The RSS Feed Reader reads the RSS Feed file and displays it. The RSS Reader displays
only new items from the RSS Feed.
The RSS Feed reader can be customized to show you content related to one or more RSS
Feeds and based on your own interest.
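The steps above can be sketched as a very small feed reader. The feed below is a hypothetical RSS 2.0 document written for this sketch, and the helper name new_items is also an assumption. The reader remembers the links it has already shown, so a second poll of an unchanged feed displays nothing new.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: channel information plus two item entries.
RSS_XML = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>First headline</title>
      <link>http://example.org/news/1</link>
      <description>Short summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/news/2</link>
      <description>Short summary of the second story.</description>
    </item>
  </channel>
</rss>"""

def new_items(rss_xml, seen_links):
    """Return (title, link) pairs not shown before and record them as seen."""
    channel = ET.fromstring(rss_xml).find("channel")
    fresh = []
    for item in channel.findall("item"):
        link = item.findtext("link")
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((item.findtext("title"), link))
    return fresh

seen = set()
first_poll = new_items(RSS_XML, seen)    # both items are new
second_poll = new_items(RSS_XML, seen)   # feed unchanged, nothing new
print(first_poll)
print(second_poll)
```

A real reader would download the feed from the web server at intervals and keep the set of seen links between runs; both are left out here to keep the sketch short.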
News Aggregators and Feed Readers:
RSS Feed readers and news aggregators are essentially the same thing: software used for
viewing RSS Feeds. News aggregators are designed specifically to view news-related Feeds,
but technically they can read any Feed.
Who can Use RSS?:
RSS started out with the intent of distributing news-related headlines. Its potential is
significantly larger, and it can be used anywhere in the world.
Consider using RSS for the following:
New Homes - Realtors can provide updated Feeds of new home listings on the market.
Job Openings - Placement firms and newspapers can provide a classified Feed of job
vacancies.
Auction Items - Auction vendors can provide Feeds containing items that have been
recently added to eBay or other auction sites.
Press Distribution - Listing of new releases.
Schools - Schools can relay homework assignments and quickly announce school
cancellations.
News & Announcements - Headlines, notices, and any list of announcements.
Entertainment - Listings of the latest TV programs or movies at local theatres.
RSS is growing in popularity. The reason is fairly simple. RSS is a free and easy way to promote
a site and its content without the need to advertise or create complicated content sharing
partnerships.