Chapter - 5
Evolution of Web
Web 1.0
The first generation of the Web, Web 1.0, was introduced by Sir Tim Berners-Lee in 1989,
as a technology-based solution for businesses to broadcast their information to people.
It was defined as a system of interlinked, hypertext documents accessed via the Internet. It
was static and somewhat mono-directional with websites publishing the information for
anyone at any time.
Content was pushed out by one person or organization to many people via websites and e-mail
newsletters, as one-way communication. The core concepts of Web 1.0 were HTTP, HTML
and URL. Eventually, it became imperative to have tools that facilitate quick and accurate access
to available Web information.
Web 1.0 was an unlimited source of information, and users from a cross-section of society
seeking to satisfy their information needs required an effective and
efficient mechanism to access it.
This read-only Web was accessible using an information retrieval system, popularly known
as a web search engine.
Web 2.0
Web 2.0, also known as the Read/Write Web, is a term popularized by Tim O'Reilly in 2004. It helps the
typical user to contribute, and “The user is the content” is its most popular slogan.
The popularity of Web 2.0 grows with all its applications. This new collaborative Web
is extended by Web-based technologies like comments, blogs and wikis, and
hosts successful sites like Twitter or Facebook that allow users to build social networks based on
professional relationships, interests, etc.
Web 2.0 is a loose grouping of newer generation social technologies, whose users are actively
involved in communicating and collaborating with each other as they build connections and
communities across the web. Hence, it significantly increases the participating interest of
web users.
Web 2.0 not only connects individual users to the web, but also connects these individual
users together. It fixes the previous disconnection between web readers and writers. It is the
beginning of two-way communication in the online public commons.
The characteristics of Web 2.0 are rich experience, user participation, dynamic content and
scalability. Further characteristics such as openness, freedom and collective intelligence by
way of user participation are essential attributes of Web 2.0.
Web 2.0 Technologies
Web 2.0 encourages a wider range of expressive capability, facilitates more collaborative
ways of working, enables community creation, dialogue and knowledge sharing and creates
a setting for learners to attract authentic audiences by various tools and technologies.
A few of the most popular Web 2.0 tools are:
✓Blogging
✓Social Network Sites
✓Podcasts
✓Wikis
✓Micro-blogging
✓Social Bookmarking
✓E-Portfolios
Blogging
The World Wide Web makes it possible for you to publish your thoughts and distribute them
out to the entire world. Nowadays, there are several good, reliable blogging tools available
for free on the Web. You can set up your account and start blogging away within minutes.
✓Blog-roll: A blog-roll is a list of blogs and bloggers that any particular blog author finds
influential or interesting. Blog-rolls indicate which online community a blogger is attracted
to or belongs to, and they are part of the conversations of the blogosphere.
Podcasts
The word podcast combines “pod”, meaning a mobile playback device such as an iPod or any other mp3 player, and “casting”,
derived from broadcasting.
The key difference between a podcast and a plain old audio file is the distribution model.
Most podcasts are shared (syndicated) using the RSS format (Really Simple Syndication).
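As a rough illustration of syndication, the following Python sketch parses a small RSS 2.0 feed and lists the audio file attached to each episode. The feed text, the example.com URLs and the function name list_episodes are made up for this example. In RSS 2.0 the audio file is attached to an item through the enclosure element.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed (all titles and URLs are hypothetical).
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="12345" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/ep2.mp3" length="67890" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def list_episodes(feed_xml):
    """Return (title, audio_url) pairs for every item that has an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            episodes.append((item.findtext("title"), enclosure.get("url")))
    return episodes


for title, url in list_episodes(FEED):
    print(title, "->", url)
```

A podcast client would fetch such a feed periodically and download any enclosure it has not seen before, which is what separates a podcast from a single audio file placed on a web page.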
Wikis
According to The Wiki Way, “Open editing has some profound and subtle effects on the
wiki’s usage. Allowing everyday users to create and edit any page in a Web site...encourages
democratic use of the Web and promotes content composition by nontechnical users.”
A wiki is a website that allows its users to actively collaborate and modify its content
and structure simply from the web browser. The collaborative encyclopedia “Wikipedia” is
the most popular example of a wiki today.
A defining characteristic of wiki technology is the ease with which pages can be created and
updated.
Vandalism is a common problem for wikis: because of their open nature, anyone with internet
access and a computer can change wiki content to something offensive, add nonsense or
deliberately add incorrect information.
Micro-blogging
Micro-blogging is the practice of posting small pieces of digital content—which could be
text, pictures, links, short videos, or other media—on the Internet.
Micro-blogging enables users to write brief messages, usually limited to less than 200
characters, and publish them via Web browser-based services, email, or mobile phones. The
most popular micro-blogging service today is called Twitter.
It creates a sense of online community where groups of friends and professional colleagues
connect to each other and frequently update content and follow each other’s posts.
It is one of the best examples of a subscription service: to post a micro-blog or to read
those posted by others, subscribers must typically create accounts, which are linked with
cell phones, e-mail accounts, instant messaging, webpages, or any other medium they will use
to send updates.
The posting of micro-blogs has enjoyed a popular upsurge in the last few years, with add-ons
appearing regularly that enable more sophisticated updates and interaction with other
applications.
Other micro-blogging sites are friendfeed, tumblr, plurk, yammer, shout’em, google talks
etc.
Social Bookmarking
Social bookmarking is a way to store, organize, search, manage, and share collections of websites. In
a social bookmarking system, users save links to websites that they want to remember and/or share.
These bookmarks are usually public, but can be saved privately, or shared only with specified people
or groups.
Many social bookmarking services provide web feeds (RSS) for their lists of bookmarks and tagged
categories. This allows subscribers to become aware of new bookmarks as they are saved, shared,
and tagged by other users.
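The store, tag and share model described above can be sketched with a small in-memory structure. The Bookmark class, its fields, the user names and the example.com links are all hypothetical and do not reflect how any particular bookmarking service works.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    url: str
    owner: str
    tags: set = field(default_factory=set)
    public: bool = True  # bookmarks are usually public, but can be private


def build_tag_index(bookmarks):
    """Map each tag to the URLs of the public bookmarks that carry it."""
    index = defaultdict(list)
    for bookmark in bookmarks:
        if bookmark.public:
            for tag in sorted(bookmark.tags):
                index[tag].append(bookmark.url)
    return dict(index)


bookmarks = [
    Bookmark("http://example.com/rdf-intro", "alice", {"semantic-web", "rdf"}),
    Bookmark("http://example.com/wiki-tips", "bob", {"wiki"}),
    Bookmark("http://example.com/private-notes", "alice", {"rdf"}, public=False),
]

tag_index = build_tag_index(bookmarks)
print(tag_index["rdf"])  # the private bookmark is left out of the shared index
```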
Popular social bookmarking sites include Del.icio.us, Digg, Technorati etc.
E-Portfolios
An e-portfolio is a digitized collection of artifacts, including demonstrations and accomplishments,
that represents an individual, group or institution.
It is a collection of work developed across varied contexts over time. It may include input
text, electronic files, images, multimedia, blog entries and hyperlinks.
Types of E-Portfolios
✓Developmental: It shows the advancement of skill over a period of time.
✓Assessment: It demonstrates skill and competence in a particular domain or area.
✓Showcase: A showcase portfolio highlights stellar work in a specific area. It is typically
shown to potential employers to gain employment. When it is used for a job application it is
sometimes called a career portfolio.
Most e-portfolios are a mix of the three main types to create a hybrid portfolio. Today,
electronic portfolios are gaining popularity in schools and higher education.
Web 3.0
Currently most Web content is suitable only for human use. Dynamic pages are generated
based on information from databases, but without the original information structure found in
the databases.
The problems with current Web search results are high recall and low precision, as the results are
highly sensitive to vocabulary. Moreover, the results are single Web pages, and most
published content is not structured to allow logical reasoning and query answering.
The obvious shifts are from the era of the ‘Web of Documents’ to the ‘Web of People’ to the
‘Web of Data’. The Web of Data is an upgrade of the Web of Documents (the World Wide Web).
The Semantic Web, a.k.a. Web 3.0, envisions content that is machine processable.
“The Semantic Web is an extension of the current web in which information is given well-defined
meaning, better enabling computers and people to work in cooperation”.
The architecture of Semantic Web is described by Semantic Web Stack.
Semantic Web Stack
The core Semantic Web Technologies include four components:
✓Explicit metadata
✓Ontologies
✓Logic
✓Software agents
Explicit Metadata
Metadata captures part of the meaning of data. We annotate natural language web content explicitly
with semantic metadata. This semantic metadata encodes the meaning (semantics) of the content
which can then be read and interpreted correctly by machines.
XML
XML-based representations are more easily processable by machines since they are more structured.
DTDs or XML Schema specify the structure of documents, not the meaning of the document contents.
Thus, XML lacks a semantic model, as the representation makes no commitment on domain-specific
ontological vocabulary or ontological modeling primitives. Moreover, the metadata is restricted to
within documents, not across documents, and is prescriptive, not descriptive. As a solution, the next
step is the Resource Description Framework (RDF).
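A small sketch may help show why structure alone does not carry meaning. The two XML fragments below are invented for this example and record the same fact (a lecturer teaches a course) with different nesting and element names. A parser reads both without difficulty, but nothing in the XML itself says that they mean the same thing.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that state the same fact in different structures.
doc_a = "<course name='Web Technology'><lecturer>A. Kumar</lecturer></course>"
doc_b = "<lecturer name='A. Kumar'><teaches>Web Technology</teaches></lecturer>"


def tag_names(xml_text):
    """Return the set of element names used in an XML document."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


# Both documents are well formed, yet their vocabularies only partly overlap,
# and a program cannot tell from the tags alone that the facts are identical.
print(sorted(tag_names(doc_a)))
print(sorted(tag_names(doc_b)))
```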
Resource Description Framework (RDF)
A standard by W3C, RDF is a data model for describing resources and the relationships between them;
it is commonly serialized as XML (RDF/XML).
RDF is made up of triples, which are like simple grammatical sentences with a subject, a verb and an
object (in that order). The subject and the verb will each be a URI; the object may be a URI or may be a
literal.
RDF is used to state facts and to exchange and represent knowledge.
Information expressed in RDF is represented as a list of statements, and each statement follows a
structure in the form of
<subject, predicate (property), object>
This structure format is known as triples in RDF terminology.
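The triple structure can be sketched in plain Python as a list of three-element tuples. This is only an illustration of the data model, not an RDF library. The example.org URIs, the property names and the helper objects_of are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple.
# The URIs below use the reserved example.org domain and are hypothetical.
EX = "http://example.org/"

triples = [
    (EX + "student/1", EX + "name", "Asha"),             # object is a literal
    (EX + "student/1", EX + "borrowed", EX + "book/7"),  # object is a URI
    (EX + "book/7", EX + "price", 250),                  # object is a literal
]


def objects_of(subject, predicate, statements):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


print(objects_of(EX + "student/1", EX + "name", triples))
```

Because the object of one triple can be the subject of another (book/7 above), a set of triples forms a graph rather than a flat table.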
Resource Description Framework Schema (RDFS)
It is the RDF vocabulary description language. RDFS allows definition of classes, properties, restrictions,
and hierarchies for further structuring of RDF resources. Hierarchies include sub-classes and
super-classes as well as sub-properties and super-properties. Core RDFS vocabulary terms include
rdfs:Class, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range.
SPARQL
Pronounced ‘sparkle’, SPARQL is a query language for RDF and RDFS databases which traverses the RDF
graph to fetch the output. It defines a protocol layer over HTTP known as the SPARQL protocol layer. The
result of a SPARQL query can be returned in XML format.
To fetch data from an RDF database, a SPARQL query must use variables. Each variable must
be prefixed with a question mark (?) to fetch particular information.
For Example:
?student_name, ?book_price, ?employee_salary
It uses a SELECT command, as SQL does, to fetch information from an RDF database.
A select statement has the general form SELECT ?variable WHERE { triple patterns }.
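How variables prefixed with a question mark pick values out of an RDF graph can be sketched in plain Python. The function select below is a hypothetical, much simplified stand-in for a SPARQL engine: it matches a single triple pattern only, and the ex: names and data are invented for the example.

```python
def select(variables, pattern, statements):
    """Match one triple pattern against the statements.

    Terms starting with '?' are variables; all other terms must match exactly.
    Returns one dict of bindings per matching statement, restricted to the
    requested variables.
    """
    results = []
    for statement in statements:
        bindings = {}
        for term, value in zip(pattern, statement):
            if isinstance(term, str) and term.startswith("?"):
                # A variable repeated in the pattern must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append({v: bindings[v] for v in variables if v in bindings})
    return results


triples = [
    ("ex:student1", "ex:name", "Asha"),
    ("ex:student2", "ex:name", "Ravi"),
    ("ex:book7", "ex:price", 250),
]

# Roughly corresponds to:
#   SELECT ?student_name WHERE { ?student ex:name ?student_name }
rows = select(["?student_name"], ("?student", "ex:name", "?student_name"), triples)
print(rows)
```

A real SPARQL engine additionally joins several triple patterns, applies filters and resolves prefixes such as ex: to full URIs.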
Ontologies
Classes: They represent overall concepts which are described via attributes. Attributes are
name-value pairs.
Relations: They are special attributes whose values are objects of (other) classes.
Instances: They describe individuals of an ontology.
For relations and attributes, constraints (rules) can be defined that determine the allowed values.
Classes, relations, and constraints can be put together to form statements / assertions.
Special case: formal axioms, which describe knowledge that can’t be expressed simply with the help
of other existing components.
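The components listed above (classes with attributes, relations, instances and constraints) can be sketched with a few Python dataclasses. This is an informal illustration, not an ontology language such as OWL. The class names OntologyClass and Instance, the Book and Student examples and the constraint are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> expected type
    relations: dict = field(default_factory=dict)   # relation name -> target class name


@dataclass
class Instance:
    of_class: OntologyClass
    values: dict  # attribute name -> value (the name-value pairs)


def check_constraints(instance, constraints):
    """Return the names of the constraints (rules) that the instance violates."""
    return [name for name, rule in constraints.items() if not rule(instance.values)]


book = OntologyClass("Book", attributes={"title": str, "price": float})
# A relation is an attribute whose value is an object of another class.
student = OntologyClass("Student", attributes={"name": str}, relations={"borrows": "Book"})

# A constraint restricts the allowed values of an attribute.
book_constraints = {"price_is_non_negative": lambda values: values["price"] >= 0}

good = Instance(book, {"title": "Example Book", "price": 450.0})
bad = Instance(book, {"title": "Mispriced Book", "price": -1.0})
print(check_constraints(good, book_constraints))
print(check_constraints(bad, book_constraints))
```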
There are fundamentally four types of ontologies:
Top-level ontologies: They are general, cross-domain ontologies. They are used to
represent very general concepts.
Application ontologies: They are specialized ontologies focused on a specialized task and
domain.
Logic and Inference
This component of the Semantic Web defines a unified language that expresses logical inferences
made using rules and information such as those specified by ontologies.
Logic can be used to specify facts as well as rules. New facts are derived from existing facts
based on the inference rules. Description Logic is the type of logic that has been developed
for Semantic Web applications.
The Semantic Web Rule Language (SWRL) is a proposed language for the Semantic Web that
can be used to express rules as well as logic.
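Deriving new facts from existing facts can be sketched as a simple forward-chaining loop in Python. This is neither SWRL nor a description logic reasoner. The two rules (sub-class transitivity and type inheritance), the function names and the facts are assumptions chosen to keep the example small.

```python
def infer(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_facts = rule(known) - known
            if new_facts:
                known |= new_facts
                changed = True
    return known


def subclass_transitivity(facts):
    # If A subClassOf B and B subClassOf C, then A subClassOf C.
    return {
        (a, "subClassOf", c)
        for a, p1, b in facts if p1 == "subClassOf"
        for b2, p2, c in facts if p2 == "subClassOf" and b2 == b
    }


def type_inheritance(facts):
    # If x type A and A subClassOf B, then x type B.
    return {
        (x, "type", b)
        for x, p1, a in facts if p1 == "type"
        for a2, p2, b in facts if p2 == "subClassOf" and a2 == a
    }


facts = {
    ("Student", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
    ("asha", "type", "Student"),
}

derived = infer(facts, [subclass_transitivity, type_inheritance])
for fact in sorted(derived - facts):
    print(fact)  # facts that were not stated but follow from the rules
```

The loop stops when a full pass over the rules adds nothing new. For rules like these, which can only produce a finite number of facts, that point is always reached.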
Software Agents
Software agents make use of all of the above to help us with our tasks. They are defined as
pieces of software that run without direct human control or constant
supervision to accomplish goals provided by the user. Software Agents can
Big Data
Big Data refers to collections of data sets that are too large and complex
to handle and process using traditional data processing applications. It usually
includes data sets with sizes beyond the ability of commonly used software tools to
capture, curate, manage, and process data within a tolerable elapsed time.
Big Data Characteristics
Volume: It refers to the amount of data that is generated and stored.
Depending on the amount of data that is stored, it is decided whether it falls in
the category of “big data” or not.
Variety: It refers to the different types and various sources of data, including both structured
and unstructured data, e.g. documents, emails, images, videos, audio etc.
Velocity: It deals with the pace of data in motion and of its analysis, where
the content flow is assumed to be continuous and immense.
Veracity: It deals with the primary issue of reliability and whether
the analysis of data is being done accurately, so that credible, quality
solutions are eventually produced.
Variability: It refers to the inconsistency of the information that is stored. In simpler words, it
deals with rapidly changing and alternative meanings that are associated with the data.
Visualization: It is one of the crucial characteristics of big data
because all the data that is stored, used as an input and generated as a result needs to
be sorted and viewed in a manner that is easy to read and comprehend.
Value: It deals with the practice of extracting the usefulness of the data. Data
in its original form is perceived not to be valuable at all; how data is turned into knowledge
and information under analysis is what the “value” characteristic deals with.