Koneru Lakshmaiah College of Engineering: S.indira Priya Darsini V.chaitanya
FROM
The more difficult way to use the results of data mining is to get the user to actually understand
what is going on so that they can take action directly. This involves:
1) Visualization of the data mining output in a meaningful way, and
2) Allowing the user to interact with the visualization so that simple questions can be answered.
The phases depicted start with the raw data and finish with the extracted knowledge which was
acquired as a result of the following stages:
DATA WAREHOUSING
Data mining potential can be enhanced if the appropriate data has been collected and stored in a data
warehouse. Data warehousing is a technique that makes it possible to extract archived operational data and
overcome inconsistencies between different legacy data formats.
A data warehouse provides the logical link between what the managers see in their decision support
and EIS applications and the company's operational activities.
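As a rough illustration of overcoming inconsistencies between legacy data formats, the sketch below maps records from two source systems onto one common layout. The system names, field names, and date formats are all hypothetical and chosen only for the example.

```python
from datetime import datetime

# Hypothetical rows from two legacy systems that describe the same customer
# with different field layouts and date formats.
billing_rows = [{"cust_name": "RAO, ANITA", "joined": "03/15/1998"}]
orders_rows = [{"first": "anita", "last": "rao", "signup_date": "1998-03-15"}]

def from_billing(row):
    # Field splitting: "LAST, FIRST" becomes two separate fields.
    last, first = [part.strip() for part in row["cust_name"].split(",", 1)]
    joined = datetime.strptime(row["joined"], "%m/%d/%Y").date()
    # Standardization: title-case names and ISO 8601 dates.
    return {
        "first_name": first.title(),
        "last_name": last.title(),
        "joined": joined.isoformat(),
    }

def from_orders(row):
    joined = datetime.strptime(row["signup_date"], "%Y-%m-%d").date()
    return {
        "first_name": row["first"].title(),
        "last_name": row["last"].title(),
        "joined": joined.isoformat(),
    }

# Both sources now share one warehouse layout.
warehouse_rows = [from_billing(r) for r in billing_rows]
warehouse_rows += [from_orders(r) for r in orders_rows]
```

Once both sources share a layout, the two rows above become identical, which is what makes redundant records detectable in later quality checks.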
The Data Warehouse architecture is a method by which the overall structure of data,
communication, processing and presentation for end-user computing within the enterprise can be
represented. The architecture of the data warehouse is implemented layer-wise so as to achieve a
higher degree of abstraction. The layered architecture of a data warehouse is illustrated in the
following diagram.
[Figure: layered data warehouse architecture. Layers shown: Application, Information, Data Access, Data, Data Staging, Operational Data.]
[Figure: data warehouse environment. Labels shown: data marts, middleware, report writers, extraction, data warehouse, EIS/DSS, transformation, data mining, quality assurance.]
[Figure: change-based replication and bulk extraction between the source system and the warehouse.]
Data transformations
• Field splitting and consolidation:
• Standardization:
• Data quality tools:
• Data access and retrieval tools
Data warehouse users derive and obtain information through these types of tools. Data access
and retrieval tools are currently classified into the subcategories below.
• Online analytical processing (OLAP) tools.
• Executive information systems (EIS)
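To give a feel for what an OLAP tool does with warehouse data, here is a minimal sketch of a roll-up, that is, summing a measure over a chosen set of dimensions. The sales facts and the roll_up helper are hypothetical; real OLAP tools offer far more than this.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions and one measure.
sales = [
    {"region": "North", "product": "Phone", "quarter": "Q1", "amount": 100},
    {"region": "North", "product": "Plan", "quarter": "Q1", "amount": 50},
    {"region": "South", "product": "Phone", "quarter": "Q1", "amount": 70},
    {"region": "North", "product": "Phone", "quarter": "Q2", "amount": 30},
]

def roll_up(facts, dimensions, measure="amount"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)

# Coarse view: one total per region.
by_region = roll_up(sales, ["region"])
# Finer view: drill down to region and quarter.
by_region_quarter = roll_up(sales, ["region", "quarter"])
```

Changing the list of dimensions moves between coarser and finer views of the same facts, which is the kind of simple question an end user asks through such tools.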
1. One of the key issues raised by data mining technology is not a business or technological one,
but a social one. It is the issue of individual privacy.
2. Another issue is that of data integrity. Clearly, data analysis can only be as good as the data
that is being analyzed. A key implementation challenge is integrating conflicting or redundant
data from different sources.
3. A hotly debated technical issue is whether it is better to set up a relational database structure
or a multidimensional one.
4. Finally, there is the issue of cost. While system hardware costs have dropped dramatically
within the past five years, data mining and data warehousing tend to be self-reinforcing.
The following sections describe some of the most common data mining algorithms in use today. We
have broken the discussion into two sections, each with a specific theme:
Classical Techniques
Statistics
By strict definition, "statistics" or statistical techniques are not data mining. They were in
use long before the term "data mining" was coined to apply to business applications.
Nearest Neighbor
Clustering and the nearest neighbor prediction technique are among the oldest techniques
used in data mining. Nearest neighbor is a prediction technique that is quite similar to clustering.
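A minimal sketch of nearest neighbor prediction follows: an unclassified record takes the label of the most similar record in the historical data. The features, the labels, and the use of plain Euclidean distance are assumptions made for illustration.

```python
import math

# Hypothetical historical records:
# (monthly_minutes, years_as_customer) -> known outcome.
history = [
    ((120.0, 1.0), "churn"),
    ((150.0, 0.5), "churn"),
    ((600.0, 4.0), "stay"),
    ((550.0, 6.0), "stay"),
]

def predict_nearest(record, historical):
    """Return the outcome of the historical record closest to `record`."""
    # Euclidean distance on raw values; in practice the features would
    # usually be scaled first so that no single field dominates.
    _, nearest_label = min(
        historical, key=lambda item: math.dist(record, item[0])
    )
    return nearest_label

low_usage_prediction = predict_nearest((130.0, 1.5), history)   # "churn"
high_usage_prediction = predict_nearest((580.0, 5.0), history)  # "stay"
```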
Clustering
Clustering is the method by which like records are grouped together. Usually this is done to
give the end user a high-level view of what is going on in the database. Clustering is sometimes used
to mean segmentation.
This clustering information is then used by the end user to tag the customers in their
database. Once this is done, the business user can get a quick high-level view of what is happening
within the cluster. Once the business user has worked with these codes for some time, they also begin
to build intuitions about how these different customer clusters will react to the marketing offers
particular to their business.
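The tagging step described above can be sketched with k-means, one common clustering algorithm. The text does not name a specific algorithm, so k-means is an assumption here, as are the customer fields and the starting centroids.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the closest centroid, then update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            closest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[closest].append(point)
        # Move each centroid to the mean of its cluster; keep the old
        # centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]
    return centroids, labels

# Hypothetical customers: (age, annual_spend_in_thousands).
customers = [(22, 1.0), (25, 1.5), (24, 1.2), (58, 9.0), (61, 8.5), (60, 9.5)]
centroids, segment_codes = k_means(
    customers, centroids=[customers[0], customers[3]]
)
# segment_codes holds one cluster code per customer, ready to be stored
# as a tag next to each customer record.
```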
Decision Trees
A decision tree is a predictive model that, as its name implies, can be viewed as a tree.
Specifically each branch of the tree is a classification question and the leaves of the tree are partitions
of the dataset with their classification. For instance if we were going to classify customers who churn
(don’t renew their phone contracts) in the Cellular Telephone Industry a decision tree might look
something like that found in Figure
Figure: A decision tree is a predictive model that makes a prediction on the basis of a series of
decisions, much like the game of 20 questions.
• It divides up the data on each branch point without losing any of the data (the number of total
records in a given parent node is equal to the sum of the records contained in its two
children).
• The number of churners and non-churners is conserved as you move up or down the tree.
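The two properties above can be checked with a small sketch of a single branch point. The customer records and the question asked at the branch are hypothetical.

```python
# Hypothetical customer records for the churn example.
customers = [
    {"contract_months_left": 1, "churned": True},
    {"contract_months_left": 2, "churned": True},
    {"contract_months_left": 1, "churned": False},
    {"contract_months_left": 10, "churned": False},
    {"contract_months_left": 14, "churned": False},
    {"contract_months_left": 8, "churned": True},
]

def split(records, question):
    """Send each record down the 'yes' or 'no' branch of one question."""
    yes = [r for r in records if question(r)]
    no = [r for r in records if not question(r)]
    return yes, no

def churn_counts(records):
    """Return (churners, non_churners) for a node."""
    churners = sum(1 for r in records if r["churned"])
    return churners, len(records) - churners

# One branch point: "Does the contract expire within two months?"
yes, no = split(customers, lambda r: r["contract_months_left"] <= 2)

parent = churn_counts(customers)  # (3, 3)
left = churn_counts(yes)          # (2, 1)
right = churn_counts(no)          # (1, 2)
```

Every record lands in exactly one child, so the child counts always add up to the parent counts, whatever question is asked.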
Neural Networks
Foremost among the advantages of neural networks is their highly accurate predictive models
that can be applied across a large number of different types of problems. True neural networks are
biological systems that detect patterns, make predictions and learn.
The link - This loosely corresponds to the connections between neurons (axons, dendrites and
synapses) in the human brain.
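In an artificial neural network, each link typically carries a numeric weight, and a node combines its weighted inputs into a single output. The sketch below shows one such node. The sigmoid function and the bias term are common choices assumed for illustration; they are not taken from the text above.

```python
import math

def neuron_output(inputs, link_weights, bias=0.0):
    """Combine weighted inputs and squash the result into (0, 1)."""
    # Each link contributes its input value scaled by the link's weight.
    total = sum(x * w for x, w in zip(inputs, link_weights)) + bias
    # Sigmoid activation (an assumed choice).
    return 1.0 / (1.0 + math.exp(-total))

# A link with a larger weight lets its input influence the output more.
weak_link = neuron_output([1.0], [0.5])
strong_link = neuron_output([1.0], [2.0])
```

Learning in such a network amounts to adjusting the link weights until the outputs match the known outcomes in the historical data.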
Some of the criteria that are important in determining the technique to be used are determined by trial
and error. There are definite differences in the types of problems that are most conducive to each
technique, but real-world data is constantly changing because markets and customers, and
hence the data that represents them, are dynamic. These
dynamics mean that it no longer makes sense to build the "perfect" model on the historical data, since
what was known in the past cannot adequately predict a future that is so unlike
what has gone before.
POTENTIAL APPLICATIONS
Data mining has many and varied fields of application, some of which are listed below.
Retail/Marketing
Banking
Transportation
Medicine
CONCLUSION
Data Mining is not a new phenomenon. All large organizations already have data warehouses,
but they are just not managing them. A data warehousing solution should enhance
intelligence in the decision-making process of an enterprise. Over the next few years, the growth of
data mining is going to be enormous with new products and technologies coming out frequently. In
order to get the most out of this period, it is going to be important that data warehousing and mining
planners and developers have a clear idea of what they are looking for and then choose strategies and
methods that will provide them with performance today and flexibility for tomorrow.