How Evolution of Database Led To Data Mining
PART-A
Q-1 Explain how the evolution of database led to data mining.
Q-2 Give a brief architecture of typical data mining system.
Q-3 “Data Mining is the extraction of knowledge” comment.
PART-B
Q-4 Compare the Operational database with Data Warehouse.
Q-5 Justify the terms Subject oriented, Non Volatile, Time variant in Data
Warehouse.
Briefly describe OLAP and OLTP, and differentiate between them.
Q-6 Give some ideas about latest research area in Data Mining.
PART-A
Data mining results are patterns or combinations expressed as information, and only that
information is of ultimate use. Cleaning, integration, etc. are excluded from the data mining concept.
Data mining applications can also refer to a knowledge base to support advanced decision making.
Database: Layer
Front End:
The front end is the user interface layer. It has the following main functions:
Administration
Input Parameter Settings
Data Mining Results / Visualization
Data mining refers to extracting or mining knowledge from large amounts of data. Most
companies already collect and refine massive quantities of data. When implemented on high
performance client/server or parallel processing computers, data mining tools can analyze
massive databases to deliver answers to questions such as, "Which clients are most likely to
respond to my next promotional mailing, and why?"
Examples:
Risk Analysis
Given a set of current customers and an assessment of their risk-worthiness, develop descriptions
for various classes. Use these descriptions to classify a new customer into one of the risk
categories.
Targeted Marketing
Given a database of potential customers and how they have responded to a solicitation, develop a
model of customers most likely to respond positively, and use the model for more focused new
customer solicitation. Other applications are to identify buying patterns from customers; to find
associations among customer demographic characteristics, and to predict the response to mailing
campaigns.
Retail/Marketing
Given a particular financial asset, predict the return on investment to determine whether to
include the asset in a portfolio.
Brand Loyalty
Given a customer and the product he/she uses, predict whether the customer will switch brands.
Banking
PART B
4. Features of Data Warehouse:
W. H. Inmon, author of Building the Data Warehouse, characterized a data
warehouse as "a subject-oriented, integrated, nonvolatile, time-variant collection of data
in support of management's decisions." Data warehouses provide access to data for
complex analysis, knowledge discovery, and decision-making.
1. Subject oriented:
Data are organized according to subject instead of application, e.g. an insurance
company using a data warehouse would organize its data by customer, premium, and
claim, instead of by different products (auto, life, etc.).
• Organized around major subjects, such as customer, product, sales.
• Focusing on the modeling and analysis of data for decision making, not on daily
operations or transaction processing.
• Provide a simple and concise view around particular subject by excluding data that
are not useful in the decision support process.
2. Integrated:
When data resides in many separate applications in the operational environment,
encoding of data is often inconsistent. For instance, in one application gender might be
coded as “m” and “f”, and in another as 0 and 1. When data are moved from the operational
environment into the data warehouse, they assume a consistent coding convention, e.g.
gender data is transformed to “m” and “f”.
• Constructed by integrating multiple, heterogeneous data sources as relational
databases, flat files, on-line transaction records.
• Providing data cleaning and data integration techniques
3. Time variant:
• The data warehouse contains a place for storing data that are five to ten years old, or
older, to be used for comparisons, trends, and forecasting. These data are not updated.
• The time horizon for the data warehouse is significantly longer than that of
operational systems.
• Every key structure in the data warehouse contains an element of time (explicitly or
implicitly)
4. Non-volatile:
Data are not updated or changed in any way once they enter the data warehouse, but
are only loaded and accessed.
• A physically separate store of data transformed from the operational environment.
• Does not require transaction processing, recovery, and concurrency control
mechanisms.
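The integrated, time-variant, and non-volatile properties described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a real warehouse design: the `Warehouse` class, its field names, the sample rows, and the assumption that a numeric 0/1 source encoding maps to “m”/“f” in that order are all hypothetical.

```python
from datetime import date

# Hypothetical source encodings; which numeric code maps to which value is assumed.
GENDER_CODES = {"m": "m", "f": "f", "0": "m", "1": "f"}


class Warehouse:
    """Append-only store: rows are loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, gender_code, premium, snapshot_date):
        # Integrated: every source encoding is converted to one convention.
        gender = GENDER_CODES[str(gender_code).lower()]
        # Time variant: the snapshot date is part of every stored key.
        key = (customer_id, snapshot_date)
        self._rows.append({"key": key, "gender": gender, "premium": premium})

    def history(self, customer_id):
        # Non-volatile: earlier snapshots stay available for trend analysis.
        return [r for r in self._rows if r["key"][0] == customer_id]


wh = Warehouse()
wh.load(101, "m", 500.0, date(2020, 1, 1))  # from an application using m/f
wh.load(101, 0, 520.0, date(2021, 1, 1))    # from an application using 0/1
wh.load(102, "1", 300.0, date(2021, 1, 1))

history = wh.history(101)
genders = sorted({r["gender"] for r in wh._rows})
```

Note that the class offers no update or delete method; a changed premium arrives as a new row with a later snapshot date, which is what keeps the history usable for comparisons.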
Data Warehouse
The Data Warehouse is an evolving set of university data used for reporting, planning, and
decision making. The data may contain information extracted from the Operational Data Store,
campus operational systems, and external data sources. The Data Warehouse incorporates web-
based access through the eReports portal in addition to providing direct access through desktop
tools like Hyperion, Microsoft Access, and Filemaker Pro.
Operational Data Store
The OIT Operational Data Store (ODS) is a set of relational databases that contain data extracted
on a nightly basis from operational systems on campus relating to students, personnel, financial
aid, admissions, and the Billing and Accounts Receivable System (BARS). The ODS allows you
to query operational data using desktop tools like Microsoft Access and Filemaker Pro.
OLAP:
In a white paper entitled ‘Providing OLAP (On-line Analytical Processing) to User-Analysts: An
IT Mandate’, E. F. Codd established 12 rules to define an OLAP system. In the same paper he
listed three characteristics of an OLAP system. Dr. Codd later added 6 additional features of an
OLAP system to his original twelve rules.
Three significant characteristics of an OLAP system
• Dynamic Data Analysis
This refers to time series analysis of data as opposed to static data analysis, which does not allow
for manipulation across time. In an OLAP system, it must be possible to manipulate historical data
over multiple data dimensions. This allows analysts to identify trends in the business.
• Four Enterprise Data Models
The Categorical data model describes what has gone on before by comparing historical values
stored in the relational database. The Exegetical data model reflects what has previously
occurred to bring about the state that the categorical model reflects. The Contemplative data
model supports exploration of ‘what-if’ scenarios. The Formulaic data model indicates which
values or behaviour across multiple dimensions must be introduced into the model to effect a
specific outcome.
OLAP database servers support common analytical operations including: consolidation,
drill-down, and "slicing and dicing".
• Consolidation - involves the aggregation of data such as simple roll-ups or complex
expressions involving inter-related data. For example, sales offices can be rolled-up to
districts and districts rolled-up to regions.
• Drill-Down - OLAP data servers can also go in the reverse direction and automatically
display the detail data that make up the consolidated data. This is called drill-down.
Consolidation and drill-down are inherent properties of OLAP servers.
• "Slicing and Dicing" - Slicing and dicing refers to the ability to look at the database from
different viewpoints. One slice of the sales database might show all sales of product type
within regions. Another slice might show all sales by sales channel within each product
type. Slicing and dicing is often performed along a time axis in order to analyse trends
and find patterns.
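The three operations above can be illustrated with a small Python sketch over an in-memory list of sales facts. The data, the column names, and the helper functions `roll_up` and `drill_down` are hypothetical and chosen only for illustration; a real OLAP server would perform these operations over a multi-dimensional store.

```python
from collections import defaultdict

# Illustrative sales facts (all values are made up for this sketch).
COLUMNS = ("region", "district", "office", "product_type", "channel", "amount")
SALES = [
    ("North", "D1", "Office A", "Books", "Online", 100),
    ("North", "D1", "Office B", "Books", "Store", 150),
    ("North", "D2", "Office C", "Toys", "Online", 200),
    ("South", "D3", "Office D", "Books", "Store", 50),
    ("South", "D3", "Office D", "Toys", "Online", 75),
]
FACTS = [dict(zip(COLUMNS, row)) for row in SALES]


def roll_up(facts, *dims):
    """Consolidation: total the amount for each combination of the given dimensions."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["amount"]
    return dict(totals)


def drill_down(facts, **fixed):
    """Drill-down: return the detail rows behind one consolidated value."""
    return [f for f in facts if all(f[k] == v for k, v in fixed.items())]


# Consolidation: offices rolled up to regions.
by_region = roll_up(FACTS, "region")
# Drill-down: from the "North" total back to its districts.
north_detail = drill_down(FACTS, region="North")
by_district_in_north = roll_up(north_detail, "district")
# Slicing and dicing: two different viewpoints on the same facts.
by_product_region = roll_up(FACTS, "product_type", "region")
by_channel_product = roll_up(FACTS, "channel", "product_type")
```

Each result is a dictionary keyed by a tuple of dimension values, so the same `roll_up` helper serves both consolidation and the different slices.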
OLTP:
A database built for on-line transaction processing (OLTP) is generally
regarded as inappropriate for warehousing, because it has been designed with a different set
of needs in mind: maximizing transaction capacity, and typically having hundreds of
tables so that users are not locked out. Data warehouses are concerned with query processing
as opposed to transaction processing.
OLTP systems cannot serve as repositories of facts and historical data for
business analysis. They cannot answer ad hoc queries quickly, and rapid retrieval is
almost impossible. The data is inconsistent and changing, duplicate entries exist, entries
can be missing, and there is an absence of historical data, which is necessary to analyse
trends. Basically, OLTP offers large amounts of raw data, which is not easily understood.
The data warehouse offers the potential to retrieve and analyse information quickly and
easily. OLAP vs. OLTP is compared in the following table:
Features               | OLTP                                           | OLAP
Characteristics        | Transactional                                  | Informational
Purpose                | Operations of business                         | Analysis of businesses
Users                  | Expert                                         | Knowledge Workers
Types of transactions  | Short & Daily transactions                     | Complex queries for Decision Making
Type of data           | Updated                                        | Historical
Memory Requirements    | Less as MB’s can also help                     | Depends upon data, generally more
Focus                  | Data                                           | Information
Data Updates           | Up to date and detailed, Changing in complete  | Historic and Summarised data
Output Metric          | Detailed view based, fast                      | Summary based, flexible enough to cope up with the changes.
Access Patterns        | Concurrent controls to be maintained           | Atomic operations need to be supported
Orientation            | Customer Oriented                              | Market oriented
Data Model             | Normalised in RDBMS                            | Multi-dimensional, base is also RDBMS
Access                 | SQL                                            | SQL plus data analysis extension
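The difference in workload between the two columns of the table can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions: the `orders` table, its columns, and the sample rows are hypothetical, and both workloads run against one small in-memory table only for brevity, whereas in practice the analytical query would run against a separate warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "year INTEGER, amount REAL)"
)

# OLTP-style work: short transactions that insert or update individual rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders VALUES (1, 'alice', 2022, 40.0)")
    conn.execute("INSERT INTO orders VALUES (2, 'bob', 2022, 60.0)")
    conn.execute("INSERT INTO orders VALUES (3, 'alice', 2023, 90.0)")
with conn:
    conn.execute("UPDATE orders SET amount = 65.0 WHERE id = 2")

# OLAP-style work: a read-only query that summarises many rows at once.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
conn.close()
```

The transactional statements each touch one row and must be committed reliably, while the analytical query reads every row and returns only a summary.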
4 | Data Mining – Parallel Object-Oriented, De-noising System Using Wavelet Multi-resolution Analysis    | Wavelet applications
5 | Data Mining – Creating ensembles of oblique decision trees with evolutionary algorithms and sampling | Artificial Intelligence, research in any area like sports selection
6 | Data Mining – A tool for chemoinformatics for drug identification                                    | Specific drug prescription optimization