Presentation On Decision Tree

Uploaded by

riyazkhan7
Copyright
© Attribution Non-Commercial (BY-NC)
Presentation on Decision Tree

Submitted to: Smita Agarwal
Submitted by: Group-4

Introduction
•A decision tree is a classification scheme
that generates a tree and a set of rules.

•It represents a model of the different classes
in a given data set. The set of records
available for developing classification methods
is generally divided into two disjoint subsets:
(a) A training set
(b) A test set
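
A minimal sketch of dividing a set of records into two disjoint subsets, assuming a simple random split. The function name `split_records`, the 70/30 ratio and the toy records are illustrative assumptions, not taken from the presentation.

```python
import random

def split_records(records, test_fraction=0.3, seed=0):
    """Randomly partition records into disjoint training and test lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(records)           # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]
    training_set = shuffled[n_test:]
    return training_set, test_set

# Hypothetical toy records.
records = [{"id": i, "label": "yes" if i % 2 else "no"} for i in range(10)]
training_set, test_set = split_records(records)
print(len(training_set), len(test_set))
```

The training set is used to build the tree and the test set only to measure how well it classifies records it has not seen.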
•A decision tree construction process is
concerned with identifying the splitting
attributes and splitting criteria at every level of
the tree.

•The main goal of the decision tree construction
process is to generate simple, comprehensible
rules with high accuracy.

•The classification accuracy of a tree can be improved by
revising the tree through processes such as pruning
and grafting.
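
To illustrate how a tree yields a set of rules, here is a small sketch that walks a tree and emits one rule per leaf. The nested-dict layout, the function name `tree_to_rules` and the toy attributes are hypothetical and only serve the example.

```python
def tree_to_rules(node, conditions=()):
    """Return one (conditions, class) rule per leaf of a nested-dict tree."""
    if "label" in node:                      # leaf: emit the accumulated path
        return [(list(conditions), node["label"])]
    rules = []
    for value, child in node["branches"].items():
        test = f'{node["attribute"]} = {value}'
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules

# Hypothetical toy tree, not from the presentation.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": {"label": "no"}, "normal": {"label": "yes"}},
        },
        "overcast": {"label": "yes"},
    },
}

for conds, label in tree_to_rules(tree):
    print("IF " + " AND ".join(conds) + f" THEN class = {label}")
```

Each root-to-leaf path becomes one IF-THEN rule, so a shallower tree gives fewer and shorter rules.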
Advantages of decision trees

• Decision trees are able to generate understandable rules.
• Decision trees are able to handle both numerical and
categorical attributes.
• Decision trees provide a clear indication of which fields are
most important for prediction or classification.
• They perform well with large data sets in a short time.
• They are robust: they perform well even if their assumptions are
somewhat violated by the true model from which the data were generated.

Products

1.Alice
2.Cart
3.Knowledge Seeker
4.See5

Alice

• Product: Alice d'ISoft & Alice Server
• Vendor: ISoft
• Platforms: Metaframe, TSE, Windows, Unix

Cart
• Product: Cart
• Vendor: Salford Systems
• Functions: Classification
• Platforms: CMS, MVS, Unix, Windows

Knowledge Seeker
• Product: Knowledge Seeker
• Vendor: ANGOSS Software Corporation
• Functions: Classification
• Platforms: Windows, Unix

See5
• Product: See5 / C5.0 1.15
• Vendor: RuleQuest Research Pty Ltd
• Functions: Classification
• See5/C5.0 is available for Windows
2000/XP/Vista/7 and Linux.
• See5/C5.0 has been designed to analyze substantial
databases.
• See5/C5.0 also takes advantage of quad-core
processors, up to four CPUs, or Intel Hyper-Threading
to speed up the analysis.

ALICE

•Alice d'ISoft, data mining software based on decision
trees, is a powerful and inviting tool for creating
segmentation models.

•It lets the business user explore data online,
interactively and directly.

•As simple to use as a table, ALICE d'ISoft is made
for the business user and requires no previous
statistical skills.

•It lets users run rapid analyses on their own
and then improve the relevance of the results.

•Alice d'ISoft is the only software that takes business
knowledge into account in its analysis.

•Thanks to its speed and ease of use, it provides
high-quality segmentation models.

Sectors of application

•Marketing: market studies, segmentation,
classification, customer profiles, satisfaction studies
•Direct marketing: surveys and opinion polls, return
criteria
•Banking, finance, insurance: scoring, risk analysis,
fraud detection
•Industry: quality control, diagnostics, experience
feedback
•Health: clinical studies, biometrics, epidemiology,
biomedical research

Analysis: Decision Tree
•Automatic construction

•Interactive construction

•Pruning

•Dynamic field impact on node representation

•One level development / reduction

•One node manual development with freely chosen field

•One node automatic development


•Cut, fold, unfold, isolate branch

•Symbolic values dividing / regrouping

•Hide nodes

•Thresholds-driven node coloration

•Multiple-trees projects

•Segmentation variables

•Tree navigator

CART

•Salford Systems' flagship data mining software,
CART, is a robust, easy-to-use decision tree tool that
automatically sifts large, complex databases, searching
for and isolating significant patterns and relationships.
•This discovered knowledge is then used to generate
reliable, easy-to-grasp predictive models for
applications such as finding best prospects and
customers, targeted marketing, detecting credit card
fraud, and managing credit risk.
•Designed for both non-technical and technical
business users, CART can quickly reveal important
data relationships that could remain hidden using
other analytical tools.

•The most recent 2008 release, CART 6.0, includes
modeling automation technology that dramatically
accelerates the process of generating accurate and
robust models for deployment in core business
functions.

•CART was the primary tool used to win the
KDDCup 2000 web-mining competition and is
currently in use in major web applications.
•The CART creators continue to collaborate with
Salford Systems to enhance CART with proprietary
advances.

•With CART 6.0 ProEX, Salford has introduced
patented extensions to CART specifically designed to
enhance results for market research and web analytics.

•CART supports high-speed deployment, allowing
Salford models to predict and score in real time on a
massive scale.

CART's features provide
Stability and Reliability
CART uses an intuitive, Windows-based interface, making it
accessible to both technical and non-technical users.

Underlying the "easy" interface, however, is a mature theoretical
foundation that distinguishes CART from other methodologies and
other decision trees.

Salford Systems' CART is the only decision tree system based on the
original CART code developed by world-renowned Stanford
University and University of California at Berkeley statisticians; this
code now includes enhancements that were co-developed by Salford
Systems and CART's originators.

CART 6.0 ProEX Features
Tree Controls
•Force splitters into nodes
•Confine select splitters to specific regions of a
tree (Structured Tree™)

HotSpot Detector
•Search data for ultra-high performance segments.
•HotSpot Detector trees are specifically designed
to yield extraordinarily high-lift or high-risk nodes.
The process focuses on individual nodes and
generally discards the remainder of the tree.
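
As a rough illustration of what "high-lift" means for a node, the sketch below computes lift as the share of the target class inside a node divided by its share in the whole data set. This is the usual definition of lift, not a description of how HotSpot Detector works internally. The function name `node_lift` and the class counts are hypothetical.

```python
def node_lift(node_counts, target="responder"):
    """Map each node id to (share of target class in node) / (share overall)."""
    total = sum(sum(c.values()) for c in node_counts.values())
    total_target = sum(c.get(target, 0) for c in node_counts.values())
    base_rate = total_target / total
    return {
        node: (c.get(target, 0) / sum(c.values())) / base_rate
        for node, c in node_counts.items()
    }

# Hypothetical terminal-node class counts.
node_counts = {
    "node_1": {"responder": 40, "non_responder": 10},
    "node_2": {"responder": 30, "non_responder": 120},
    "node_3": {"responder": 30, "non_responder": 270},
}
lifts = node_lift(node_counts)
hot_spot = max(lifts, key=lifts.get)   # the single node with the highest lift
print(hot_spot, lifts[hot_spot])
```

Here the overall response rate is 20%, so a node with an 80% response rate has a lift of about 4.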
Train/Test Consistency Assessment
Node-by-node summaries of agreement between train
and test data on both class assignment and rank
ordering of the nodes.
Quickly identify ideally-performing robust trees.

Modeling Automation
Automatically generate entire collections of trees
exploring different control parameters.
Nineteen automated batteries cover exploration of
multiple splitting rules, five alternative missing value
handling strategies, random selection of alternative
predictor lists, progressively smaller (or larger) training
sample sizes, and much more.
Predictor Refinement
Includes stepwise backwards predictor elimination
using any of three predictor ranking criteria (lowest
variable importance rank, lowest loss of area under the
ROC curve, highest variable importance rank).

Model Assessment via Monte Carlo Testing
Measure possible overfitting with automated Monte
Carlo randomization tests.
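
The general idea of a randomization test can be sketched as follows: shuffle the target so that any real link to the predictors is broken, refit, and count how often the shuffled fit is as good as the real one. This is a generic permutation test with a one-split "stump" standing in for a full tree. It is an assumption about the principle, not Salford's procedure, and all names and data are hypothetical.

```python
import random

def best_stump_accuracy(xs, ys):
    """Training accuracy of the best single-threshold rule on one numeric predictor."""
    best = 0.0
    for threshold in sorted(set(xs)):
        for left_label, right_label in ((0, 1), (1, 0)):
            correct = sum(
                (left_label if x <= threshold else right_label) == y
                for x, y in zip(xs, ys)
            )
            best = max(best, correct / len(xs))
    return best

def randomization_test(xs, ys, n_rounds=200, seed=0):
    """Compare the real fit with fits obtained after shuffling the target."""
    rng = random.Random(seed)
    observed = best_stump_accuracy(xs, ys)
    at_least_as_good = 0
    for _ in range(n_rounds):
        shuffled = list(ys)
        rng.shuffle(shuffled)            # break any real link between x and y
        if best_stump_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (n_rounds + 1)
    return observed, p_value

xs = list(range(20))
ys = [0] * 10 + [1] * 10                 # toy data, separable at x <= 9
observed, p_value = randomization_test(xs, ys)
print(observed, p_value)
```

If shuffled targets often fit as well as the real one, the apparent accuracy is likely to be an artefact of overfitting.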

Constructed Features
New tools for automatic construction of new features
(as linear combinations of predictors). Identification of
multiple lists of candidates allows precise control over
which predictors may be combined into a single new feature.

Unsupervised Learning Mode
Use Breiman's column scrambler to automatically
detect potential clusters with no need to scale data,
address missing values, or select variables for
clustering.
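
The presentation does not describe how the column scrambler works. The sketch below assumes it follows the scheme Breiman described for unsupervised tree ensembles: make a synthetic copy of the data in which every column is shuffled independently, then let a supervised tree learn to tell original rows from scrambled ones. Function names and data are hypothetical.

```python
import random

def scramble_columns(rows, seed=0):
    """Return a synthetic copy of rows with every column shuffled independently."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)                 # keeps each marginal, breaks joint structure
    return [list(r) for r in zip(*columns)]

def build_two_class_problem(rows, seed=0):
    """Label original rows 1 and scrambled rows 0 for a supervised tree to separate."""
    synthetic = scramble_columns(rows, seed)
    data = [list(r) for r in rows] + synthetic
    labels = [1] * len(rows) + [0] * len(synthetic)
    return data, labels

rows = [[1, 10], [2, 20], [3, 30], [4, 40]]     # toy data with correlated columns
data, labels = build_two_class_problem(rows)
print(len(data), labels)
```

Regions where a tree can separate the two classes are regions where the original columns depend on each other, which is what suggests a cluster.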

How are CART's decision
trees grown?
•CART uses strictly binary, or two-way, splits that divide each
parent node into exactly two child nodes by posing questions with
yes/no answers at each decision node.
•CART searches for questions that split nodes into relatively
homogeneous child nodes, such as a group consisting largely of
responders, or high credit risks, or people who bought sport-utility
vehicles.
•As the tree evolves, the nodes become increasingly
homogeneous, identifying important segments.
•Other methods, such as CHAID, favor multi-way splits that can
paint visually appealing trees but that can bog models down with
less accurate splits.
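
A minimal sketch of a binary split search on one numeric predictor, assuming Gini impurity as the measure of homogeneity and candidate questions of the form "value <= t?". The function names and the toy data are illustrative only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(values, labels):
    """Find the yes/no question 'value <= t?' with the lowest weighted Gini."""
    best_threshold, best_impurity = None, float("inf")
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2           # midpoint between adjacent values
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical toy data.
ages = [22, 25, 30, 35, 40, 52]
responded = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_binary_split(ages, responded)
print(threshold, impurity)   # 32.5 0.0
```

A full tree repeats this search over every predictor at every node and applies the winning question, so each parent always gets exactly two children.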
Why is CART unique among
decision-tree tools?
CART is based on a decade of research, assuring stable performance and
reliable results. CART's proven methodology is characterized by:

Reliable pruning strategy - CART's developers determined definitively that
no stopping rule could be relied on to discover the optimal tree, so they
introduced the notion of over-growing trees and then pruning back; this idea,
fundamental to CART, ensures that important structure is not overlooked by
stopping too soon. Other decision-tree techniques use problematic stopping rules.
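
The pruning-back step can be sketched with weakest-link (minimal cost-complexity) pruning as described in the CART literature: for each internal node, divide the increase in error from collapsing it by the number of leaves removed, and collapse the node with the smallest ratio first. The dict-based tree and its error counts are hypothetical, and binary nodes are assumed.

```python
def leaves_and_error(node):
    """Return (number of leaves, summed leaf error) for the subtree under node."""
    if not node.get("children"):
        return 1, node["error"]
    n_leaves, error = 0, 0.0
    for child in node["children"]:
        child_leaves, child_error = leaves_and_error(child)
        n_leaves += child_leaves
        error += child_error
    return n_leaves, error

def weakest_link(node):
    """Find the internal node with the smallest error increase per leaf removed."""
    best = None
    if node.get("children"):
        n_leaves, subtree_error = leaves_and_error(node)
        g = (node["error"] - subtree_error) / (n_leaves - 1)
        best = (g, node)
        for child in node["children"]:
            candidate = weakest_link(child)
            if candidate and candidate[0] < best[0]:
                best = candidate
    return best

def prune_once(tree):
    """Collapse the weakest link into a leaf; return its alpha, or None for a leaf."""
    link = weakest_link(tree)
    if link is None:
        return None
    alpha, node = link
    node["children"] = []
    return alpha

# Hypothetical over-grown tree; "error" is the error if that node were a leaf.
tree = {
    "name": "root", "error": 40,
    "children": [
        {"name": "A", "error": 15, "children": [
            {"name": "A1", "error": 5, "children": []},
            {"name": "A2", "error": 8, "children": []},
        ]},
        {"name": "B", "error": 10, "children": []},
    ],
}
alphas = []
while True:
    alpha = prune_once(tree)
    if alpha is None:
        break
    alphas.append(alpha)
print(alphas)   # [2.0, 15.0]
```

Repeating the step gives a nested sequence of smaller trees, from which the final tree is then chosen using test data.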

Powerful binary-split search approach - CART's binary decision trees are
more sparing with data and detect more structure before too little data are left for
learning. Other decision-tree approaches use multi-way splits that fragment the
data rapidly, making it difficult to detect rules that require broad ranges of data to
discover.

Automatic self-validation procedures - In the search for patterns in
databases it is essential to avoid the trap of "overfitting," or finding
patterns that apply only to the training data. CART's embedded test
disciplines ensure that the patterns found will hold up when applied to
new data. Further, the testing and selection of the optimal tree are an
integral part of the CART algorithm. Testing in other decision-tree
techniques is conducted after the fact and tree selection is left up to the
user.
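
One common way to test patterns on data that was not used to find them is k-fold cross-validation. The sketch below only shows the partitioning step and assumes nothing about how CART's embedded tests are implemented. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, test_indices) so each record is tested exactly once."""
    indices = list(range(n_records))
    for fold in range(k):
        test = indices[fold::k]                     # every k-th record, offset by fold
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print(test, "held out;", len(train), "records used for growing")
```

A tree is grown on each training part and scored on the held-out part, and the scores are combined to judge which tree size generalizes best.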

In addition, CART accommodates many different types of real-world
modeling problems by providing a unique combination of
automated solutions:
1. surrogate splitters intelligently handle missing values
2. adjustable misclassification penalties help avoid the most costly errors
3. multiple-tree, committee-of-experts methods increase the precision of
results
4. alternative splitting criteria make progress when other criteria fail.
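
To show how misclassification penalties can change a tree's decisions, the sketch below assigns a node the class with the lowest total cost instead of the majority class. The cost tables, class names and the function `min_cost_class` are hypothetical, not taken from CART.

```python
def min_cost_class(class_counts, cost):
    """Pick the class whose assignment gives the lowest total misclassification cost.

    cost[actual][predicted] is the penalty for predicting `predicted`
    when the true class is `actual`.
    """
    def total_cost(predicted):
        return sum(
            count * cost[actual][predicted]
            for actual, count in class_counts.items()
            if actual != predicted
        )
    return min(class_counts, key=total_cost)

# Hypothetical node with 80 good and 20 bad credit risks.
node_counts = {"good": 80, "bad": 20}
equal_costs = {"good": {"bad": 1}, "bad": {"good": 1}}
unequal_costs = {"good": {"bad": 1}, "bad": {"good": 10}}

print(min_cost_class(node_counts, equal_costs))     # good
print(min_cost_class(node_counts, unequal_costs))   # bad
```

With equal costs the node takes the majority class. Once missing a bad risk costs ten times more than rejecting a good one, the same node is labeled "bad".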
What tree-growing, or "splitting,"
criteria can CART provide?

CART includes seven single-variable splitting criteria (Gini, Symgini,
twoing, ordered twoing and class probability for classification trees, and
least squares and least absolute deviation for regression trees) and one
multi-variable splitting criterion, the linear combinations method. The
default Gini method typically performs best, but, given specific
circumstances, other methods can generate more accurate models. CART's
unique "twoing" procedure, for example, is tuned for classification
problems with many classes, such as modeling which of 170 products
would be chosen by a given consumer. Other splitting criteria are
available for inherently difficult problems in which even the best models
are expected to have a relatively low accuracy.
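
Two of the named criteria can be written down compactly. The sketch below uses the textbook forms of the twoing value and of the least-squares improvement for regression. It is not code from CART, the function names are hypothetical, and both sides of a split are assumed to be non-empty.

```python
from collections import Counter

def twoing(left_labels, right_labels):
    """Twoing value: (pL * pR / 4) * (sum over classes |p(j|L) - p(j|R)|)^2."""
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    left, right = Counter(left_labels), Counter(right_labels)
    spread = sum(
        abs(left[c] / n_left - right[c] / n_right)
        for c in set(left) | set(right)
    )
    return (n_left / n) * (n_right / n) / 4 * spread ** 2

def sse(values):
    """Sum of squared deviations from the mean (least-squares node impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def least_squares_gain(left_values, right_values):
    """Reduction in squared error from splitting a regression node in two."""
    return sse(left_values + right_values) - sse(left_values) - sse(right_values)

print(twoing(["a", "a"], ["b", "b"]))          # 0.25
print(twoing(["a", "b"], ["a", "b"]))          # 0.0
print(least_squares_gain([1, 2], [9, 10]))     # 64.0
```

In both cases a larger value means a better split: twoing rewards splits that send different classes to different sides, and the least-squares gain rewards splits that group similar target values together.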
Demographics, for example, are often weak predictors of attitude- and
preference-based segments. Special CART tree-growing options can
dramatically increase the predictive accuracy of such demographic-based
models. Additional unique tree-growing criteria are available for
problems involving unequal misclassification costs, ordered target
variables, and continuous dependent variables. To deal more effectively
with select data patterns, CART also offers splits on linear combinations
of continuous predictor variables. For this option, CART looks for
weighted averages of predictor variables to use as splitters; these
weighted averages can reveal important database structure and can
uncover new critical measures.
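
A linear-combination splitter can be pictured as a weighted sum of predictors compared against a threshold. The sketch below only shows how such a split routes a record. The weights, threshold and predictor names are hypothetical, and the search CART performs to find them is not shown.

```python
def linear_combination_split(record, weights, threshold):
    """Send a record left when the weighted sum of its predictors is <= threshold."""
    score = sum(weights[name] * record[name] for name in weights)
    return "left" if score <= threshold else "right"

# Hypothetical weights and threshold; a real tool would search for these.
weights = {"income": 0.5, "debt": -1.0}
threshold = 10.0

print(linear_combination_split({"income": 40.0, "debt": 5.0}, weights, threshold))    # right
print(linear_combination_split({"income": 40.0, "debt": 15.0}, weights, threshold))   # left
```

A single split of this kind can capture a relationship that would otherwise take several one-variable splits to approximate.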
Knowledge Seeker
• KnowledgeSEEKER is widely used in
marketing, sales and risk functions.
• Its flexible, powerful, yet easy-to-use
interface enables users to quickly and
efficiently develop insights to advance their
goals and objectives.

• Data Discovery

– KnowledgeSEEKER features advanced data import,
sampling and preparation.

– KnowledgeSEEKER can import data from virtually any
data source (statistical files, file servers and databases)
through native drivers for SAS and SPSS, as well as through
text files and ODBC.

• Advanced Visualization

– KnowledgeSEEKER has an extensive array of tools for
data exploration and visualization.

– Business-friendly graphs and charts offer rapid profiling
and can be exported into Microsoft Office applications.

– Users can quickly and conveniently perform advanced
visualization.

• Decision Trees

– KnowledgeSEEKER provides intuitive decision tree
capabilities that help users segment their
populations and understand the key drivers of
outcomes in their business data.

See5
• See5/C5.0 is easy to use and does not
presume any special knowledge of statistics.
• Users of 64-bit Windows XP/Vista/7 can install and use
the 64-bit version.
• Linux C5.0 continues to be available in both
32-bit and 64-bit versions.

See5 (Windows 2000/XP/Vista/7):
• Single-computer licence (32-bit)
• Single-computer licence (64-bit)
• Network licence (32-bit and 64-bit)

Enhanced multi-threading
• Additional sections of See5/C5.0 have been
multi-threaded. This can result in speed
improvements when the application has many
discrete attributes, especially with the discrete
value subset option.

Confidence of ruleset predictions
• This affects the output from the public code to
read and interpret See5/C5.0 ruleset
classifiers, and also impacts results with
boosted rulesets.

Small changes to pruning algorithms
• For most applications this should not affect
the final classifier; in some cases, the tree or
ruleset will be larger or smaller, but predictive
accuracy should be similar.

• Sample Applications
– Assessing Churn Risk
– Detecting Advertisements on the Web
– Identifying Spam

