Data Warehouse & Data Mining
A modern reporting environment will give users access to their data, but it
doesn’t solve all their problems. Just because users have access to data doesn’t
guarantee the integrity of that data. Nor does it guarantee that system response
times will be adequate, or that the users’ systems won’t purge old data before its
useful life has passed. Data warehousing and data mining can address these problems
and provide a technology that enables the decision-maker to process huge amounts of
data in a reasonable amount of time. This matters more and more as companies want to
“mine” that data for other purposes (particularly sales and marketing), and data
mining techniques are employed to extract new knowledge from it.
Our paper gives an overview of data warehousing and data mining, and describes how
data mining is used in various fields.
INTRODUCTION:
The industrial age has run its course, and the world has entered the age of
information technology. The need for data warehouse applications is one of the
manifestations of this information technology age. A data warehouse has become more
of a necessity than an accessory for a progressive, competitive, and focused
organization. It provides the right foundation for building decision support and
executive information system tools, which are often built to measure and provide a
feel for how well an organization is progressing toward its goals.
A data warehouse supports business analysis and decision-making by creating an
enterprise-wide integrated database of summarized, historical information. It integrates
data from multiple, incompatible sources. By transforming data into meaningful
information, a data warehouse allows the manager to perform more substantive,
accurate and consistent analysis.
A data warehouse is not a normal database, as we usually understand the term
“database”. The main difference is that traditional databases hold operational-type,
most often transactional-type, data, and that many decision-support applications put
too much strain on databases that serve day-to-day operations (operational
databases). A data warehouse is, of course, a database, but it contains summarized
information.
The term data warehouse refers to a database that is maintained separately from an
organization’s operational databases. A warehouse holds read-only data.
DEFINITION:
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile
collection of data in support of management’s decision-making process. The data
stored in the warehouse are not just a copy of the data at the sources. Instead, they can
be thought of as a stored view or materialized view of the data at the sources.
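The idea of a stored (materialized) view can be sketched with Python’s built-in sqlite3
module. This is a minimal illustration under stated assumptions, not a description of
any warehouse product: the sales table, its columns and the refresh_monthly_sales and
read_summary helpers are hypothetical, and the “materialized view” is simply a summary
table rebuilt on demand from the source data.

```python
import sqlite3

# Hypothetical operational source table: one row per sales transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, sale_date TEXT, units INTEGER);
INSERT INTO sales (region, sale_date, units) VALUES
  ('New England', '1999-03-02', 10),
  ('New England', '1999-03-15', 5),
  ('Midwest',     '1999-03-07', 8);
""")


def refresh_monthly_sales(conn):
    """Rebuild the stored summary (the 'materialized view') from the source table."""
    conn.executescript("""
    DROP TABLE IF EXISTS monthly_sales;
    CREATE TABLE monthly_sales AS
      SELECT region, substr(sale_date, 1, 7) AS month, SUM(units) AS units
      FROM sales
      GROUP BY region, month;
    """)


def read_summary(conn):
    """Query the stored summary, not the source table."""
    return conn.execute(
        "SELECT region, month, units FROM monthly_sales ORDER BY region"
    ).fetchall()


refresh_monthly_sales(conn)
# New source data is not visible in the stored summary until the next refresh.
conn.execute("INSERT INTO sales (region, sale_date, units) VALUES ('Midwest', '1999-03-20', 2)")
print(read_summary(conn))  # [('Midwest', '1999-03', 8), ('New England', '1999-03', 15)]
refresh_monthly_sales(conn)
print(read_summary(conn))  # [('Midwest', '1999-03', 10), ('New England', '1999-03', 15)]
```

Because the summary is stored rather than recomputed for every query, it stays stale
until the next refresh; how often to refresh is one of the design choices a warehouse
has to make.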
The most basic component in a data warehouse is a relational database.
Relational databases are designed to be able to efficiently insert new data and locate
existing data using a standardized query language. Underneath the database is a maze of
connections and transformations connecting the data warehouse with other systems.
Because data in a company is often created and stored in functionally specific systems
(e.g., a payroll system), the data may need to be replicated and moved between a data
warehouse and these other systems.
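A minimal extract-transform-load sketch of moving data out of functionally specific
systems into a warehouse follows. The payroll and HR extracts, their formats
(DD/MM/YYYY dates, amounts in cents) and the dim_employee and fact_pay tables are all
hypothetical; the point is that the warehouse stores one reconciled format rather than
a raw copy of each source.

```python
import sqlite3
from datetime import datetime

# Hypothetical extracts from two functionally specific systems with incompatible formats.
payroll_rows = [("E1", "15/03/1999", 420000), ("E2", "15/03/1999", 385050)]  # (emp_id, DD/MM/YYYY, cents)
hr_rows = [("E1", "Sales", "1997-06-01"), ("E2", "Finance", "1998-01-12")]     # (emp_id, dept, ISO date)


def transform_payroll(row):
    """Convert a payroll row to the warehouse conventions: ISO dates, amounts in dollars."""
    emp_id, pay_date, cents = row
    iso_date = datetime.strptime(pay_date, "%d/%m/%Y").date().isoformat()
    return emp_id, iso_date, cents / 100


warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE dim_employee (emp_id TEXT PRIMARY KEY, department TEXT, hire_date TEXT);
CREATE TABLE fact_pay (emp_id TEXT REFERENCES dim_employee(emp_id), pay_date TEXT, amount REAL);
""")

# Load inside one transaction so that a failed load is rolled back as a whole.
with warehouse:
    warehouse.executemany("INSERT INTO dim_employee VALUES (?, ?, ?)", hr_rows)
    warehouse.executemany("INSERT INTO fact_pay VALUES (?, ?, ?)", map(transform_payroll, payroll_rows))

report = warehouse.execute("""
    SELECT e.department, p.pay_date, p.amount
    FROM fact_pay AS p JOIN dim_employee AS e USING (emp_id)
    ORDER BY e.department
""").fetchall()
print(report)  # [('Finance', '1999-03-15', 3850.5), ('Sales', '1999-03-15', 4200.0)]
```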
Functions of a data warehouse
The main function of a data warehouse is to get enterprise-wide data into the format
that is most useful to end-users, regardless of their location. Data warehousing
is used for:
Increasing the speed and flexibility of analysis.
Providing a foundation for enterprise-wide integration and access.
Improving or re-inventing business processes.
Gaining a clear understanding of customer behavior.
ARCHITECTURE:
The design of the data architecture involves understanding all of the data and how
different pieces are related.
Creating and maintaining a data warehouse is a huge job even for the largest
companies. It can take a long time and cost a lot of money. In fact, it is such a major
project that companies are turning to data mart solutions instead.
A data mart is an index and extraction system. Rather than bring all the data
into a single warehouse, the data mart knows what data each database contains and how
to extract information from multiple databases when asked. Creating a data mart can be
considered the “quick & dirty” solution, because the data from different databases is not
scrubbed and reconciled, but it may be the difference between having information
available and not having it available.
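The index-and-extraction idea can be sketched as a catalog that records which database
holds which subject and how to extract it on request. Everything here is hypothetical
(the two in-memory databases, the CATALOG dictionary and the extract function). As the
paragraph notes, nothing is scrubbed or reconciled: the final join simply trusts that
customer names match across the two systems.

```python
import sqlite3


def make_db(script):
    """Create an in-memory database standing in for one operational system."""
    db = sqlite3.connect(":memory:")
    db.executescript(script)
    return db


# Two independent operational databases (hypothetical contents).
orders_db = make_db("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('ACME', 100.0), ('Globex', 250.0), ('ACME', 50.0);
""")
crm_db = make_db("""
CREATE TABLE customers (name TEXT, region TEXT);
INSERT INTO customers VALUES ('ACME', 'Boston'), ('Globex', 'Chicago');
""")

# The "index": which database holds which subject, and the query that extracts it.
CATALOG = {
    "revenue_by_customer": (orders_db, "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"),
    "customer_regions": (crm_db, "SELECT name, region FROM customers ORDER BY name"),
}


def extract(subject):
    """Look up where a subject lives and pull it on demand, without cleansing."""
    db, query = CATALOG[subject]
    return db.execute(query).fetchall()


revenue = dict(extract("revenue_by_customer"))
regions = dict(extract("customer_regions"))
combined = {name: (regions.get(name), total) for name, total in revenue.items()}
print(combined)  # {'ACME': ('Boston', 150.0), 'Globex': ('Chicago', 250.0)}
```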
Once the data is in the data mart, it may be possible to avoid retaining the data
within the data warehouse, meaning that the data warehouse is used only as a transient
storage area.
Companies that have been in business for a while realize they have accumulated
huge amounts of data in various operational databases. Those databases work just fine
for their intended purposes, but most companies want to mine that data for other purposes.
DATA MINING:
Data mining is the process of extracting information from a company’s various
databases and re-organizing it for purposes other than those the databases were
originally intended for.
Data mining, the extraction of hidden predictive information from large
databases, is a powerful new technology with great potential to help companies focus on
the most important information in their data warehouses. Data mining tools predict
future trends and behaviors, allowing businesses to make proactive, knowledge-driven
decisions. Data mining tools can answer business questions that traditionally were too
time-consuming to resolve. They scour databases for hidden patterns, finding predictive
information that experts may miss because it lies outside their expectations.
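One common way to scour a database for hidden patterns is association-rule mining,
which looks for items that tend to occur together. The sketch below is a simplified,
pairs-only version (it is not the full Apriori algorithm) over hypothetical
market-basket data. Support is the fraction of baskets containing both items;
confidence is the fraction of baskets containing the left-hand item that also contain
the right-hand one. The thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data; each set is one customer transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for item pairs, best confidence first."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for lhs, rhs, support, confidence in pair_rules(baskets)[:3]:
    print(f"{lhs} -> {rhs}: support={support:.1f}, confidence={confidence:.2f}")
```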
Most companies already collect and refine massive quantities of data. Data
mining techniques can be implemented rapidly on existing software and hardware
platforms to enhance the value of existing information resources, and can be integrated
with new products and systems as they are brought on-line.
The foundations of data mining
The evolution of data mining began when business data was first stored on
computers, continued with improvements in data access, and more recently, generated
technologies that allow users to navigate through their data in real time. Data mining is
ready for application in the business community because it is supported by three
technologies that are now sufficiently mature:
Massive data collection.
Powerful multiprocessor computers.
Data mining algorithms.
The four steps listed in the table given below were revolutionary because they allowed
new business questions to be answered accurately and quickly.
Evolutionary step | Business question | Enabling technologies | Product providers | Characteristics
Data Collection | | | |
Decision Support (1990s) | “What were unit sales in New England last March? Drill down to Boston.” | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, MicroStrategy | Retrospective, dynamic data delivery at multiple levels
Data Mining | | | |
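The drill-down question in the Decision Support row can be illustrated with two
aggregate queries: first roll unit sales up to the region level for a month, then
drill down into one region by city. The table, the figures and the two function names
are hypothetical; an OLAP tool would normally generate such queries from a
multidimensional model rather than have them written by hand.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales (region TEXT, city TEXT, month TEXT, units INTEGER);
INSERT INTO sales VALUES
  ('New England', 'Boston',   '1999-03', 120),
  ('New England', 'Hartford', '1999-03',  40),
  ('New England', 'Boston',   '1999-02',  90),
  ('Midwest',     'Chicago',  '1999-03', 200);
""")


def unit_sales_by_region(month):
    """Rolled-up view: total units per region for one month."""
    return db.execute(
        "SELECT region, SUM(units) FROM sales WHERE month = ? GROUP BY region ORDER BY region",
        (month,),
    ).fetchall()


def drill_down_to_city(month, region):
    """Drill-down view: the same measure, one level lower, inside a single region."""
    return db.execute(
        "SELECT city, SUM(units) FROM sales WHERE month = ? AND region = ? GROUP BY city ORDER BY city",
        (month, region),
    ).fetchall()


print(unit_sales_by_region("1999-03"))                # [('Midwest', 200), ('New England', 160)]
print(drill_down_to_city("1999-03", "New England"))   # [('Boston', 120), ('Hartford', 40)]
```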
1. Fraud detection
All too often, businesses are so caught up in their daily operations that they don’t
have time to dedicate to uncovering out-of-the-ordinary business events. These events
include fraud, employee theft, and illegal redirection of company goods. Fraud detection
is seen primarily as out-of-the-blue data mining, that is, mining without a predefined target.
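As a simple illustration of flagging out-of-the-ordinary events, the sketch below marks
transactions whose amount lies far from the typical value. It uses the median and the
median absolute deviation (the “modified z-score”) so that one large fraudulent amount
does not distort the baseline. This is only one basic outlier test chosen for
illustration; the amounts and the 3.5 threshold are assumptions, and real fraud
detection combines many more signals.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # Degenerate case: most values are identical, so flag anything that differs.
        return [i for i, x in enumerate(amounts) if x != med]
    return [i for i, x in enumerate(amounts) if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical card transaction amounts for one customer.
amounts = [52.0, 48.5, 50.0, 51.2, 49.8, 47.9, 950.0, 50.5]
print(flag_outliers(amounts))  # [6]
```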
2. Return on investment
A significant segment of the companies looking at, or already adopting, data
warehouse technology spend millions of dollars on new business initiatives, and the
research and development costs are astronomical. Everyone has struggled with time, and
with only a finite amount of money and people available, the return on these
investments has to be measured. This is a form of targeted data mining.
3. Scalability of electronic solution
The major players in the data-mining arena provide solutions that are robust and
scalable. A robust data mining solution is one that performs well and can display results
in an acceptable time. The ability to work with a wide range of input datasets is part of
what is called scalability.
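One ingredient of scalability is processing input of any size without loading all of it
into memory. The sketch below computes the count, mean and sample variance of a column
in a single pass using Welford’s algorithm. The transaction_amounts generator is a
hypothetical stand-in for a large table read as a stream (for example, from a
database cursor).

```python
from typing import Iterable, Iterator, Tuple


def running_stats(values: Iterable[float]) -> Tuple[int, float, float]:
    """Single pass, constant memory: count, mean and sample variance (Welford's algorithm)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance


def transaction_amounts(count: int) -> Iterator[float]:
    """Hypothetical stand-in for a large table streamed row by row."""
    for i in range(count):
        yield float(i % 100)


# Memory use does not grow with the number of rows: only three numbers are kept.
print(running_stats(transaction_amounts(100_000)))
```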
A credit card company can leverage its vast warehouse of customer transaction
data to identify customers most likely to be interested in a new credit product.
Using a small test mailing, the attributes of customers with an affinity for the
product can be identified.
A diversified transportation company with a large direct sales force can apply
data mining to identify the best prospects for its services. Using data mining to
analyze its own customer experience, this company can build a unique
segmentation identifying the attributes of high-value prospects.
A large consumer packaged goods company can apply data mining to improve its
sales process to retailers.
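The credit card example can be sketched as follows: measure the response rate of each
customer attribute in a small test mailing, then rank untested customers by those rates.
The attributes, the mailing results and the naive averaging score are hypothetical; a
production system would more likely train a classifier (for example, a decision tree or
logistic regression) on the test-mailing results.

```python
from collections import defaultdict

# Hypothetical results of a small test mailing: (customer attributes, responded?).
test_mailing = [
    ({"segment": "frequent_traveler", "age_band": "30-44"}, True),
    ({"segment": "frequent_traveler", "age_band": "45-59"}, True),
    ({"segment": "frequent_traveler", "age_band": "30-44"}, False),
    ({"segment": "student", "age_band": "18-29"}, False),
    ({"segment": "student", "age_band": "18-29"}, False),
    ({"segment": "homeowner", "age_band": "45-59"}, True),
    ({"segment": "homeowner", "age_band": "30-44"}, False),
]


def response_rates(results):
    """Response rate for every (attribute, value) pair seen in the test mailing."""
    hits, totals = defaultdict(int), defaultdict(int)
    for attrs, responded in results:
        for key in attrs.items():
            totals[key] += 1
            hits[key] += responded  # True counts as 1, False as 0
    return {key: hits[key] / totals[key] for key in totals}


def score(customer, rates, default=0.0):
    """Naive score: average response rate of the customer's attribute values."""
    values = [rates.get(item, default) for item in customer.items()]
    return sum(values) / len(values)


rates = response_rates(test_mailing)
prospects = [
    {"segment": "frequent_traveler", "age_band": "45-59"},
    {"segment": "student", "age_band": "18-29"},
    {"segment": "homeowner", "age_band": "30-44"},
]
ranked = sorted(prospects, key=lambda c: score(c, rates), reverse=True)
for customer in ranked:
    print(customer["segment"], round(score(customer, rates), 2))
```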
CONCLUSIONS
Comprehensive data warehouses that integrate operational data with customer,
supplier, and market information have resulted in an explosion of information.
Competition requires timely and sophisticated analysis on an integrated view of the data.
A new technological leap is needed to structure and prioritize information for specific
end-user problems. Data mining tools can make this leap. Data warehousing and data
mining play an important role in storing data and sorting out the particular data
needed. It has become very easy for users to get the information they want through
data mining. Quantifiable business benefits have been proven through the integration of
data mining with current information systems, and new products are on the horizon that
will bring this integration to an even wider audience of users.