Presentation On Decision Tree
Introduction
•A decision tree is a classification scheme
which generates a tree and a set of rules.
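To make that concrete, a minimal sketch (using scikit-learn as a stand-in, not any of the products listed below): fit a tree and print it as nested yes/no tests, where each root-to-leaf path reads as one rule.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(data.data, data.target)

# The fitted tree, printed as nested if/else tests; each
# root-to-leaf path corresponds to one classification rule.
print(export_text(clf, feature_names=list(data.feature_names)))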
1. Alice
2. CART
3. Knowledge Seeker
4. See5
Alice
• Product: ALICE d'ISoft
• Vendor: ISoft
• Functions: Classification
CART
• Product: CART
• Vendor: Salford Systems
• Functions: Classification
• Platforms: CMS, MVS, Unix, Windows
Knowledge Seeker
• Product: Knowledge Seeker
• Vendor: ANGOSS Software Corporation
• Functions: Classification
• Platforms: Windows, Unix
See5
• Product: See5 / C5.0 1.15
• Vendor: RuleQuest Research Pty Ltd
• Functions: Classification
• Platforms: Windows, Linux
ALICE d'ISoft
•As simple to use as a data table, ALICE d'ISoft is made for business users and requires no prior statistical skills.
•According to ISoft, ALICE d'ISoft is the only such tool that takes business knowledge into account in its analysis.
Features
•Interactive construction
•Pruning (see the sketch after this list)
•Hide nodes
•Multiple-tree projects
•Segmentation variables
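ALICE d'ISoft's pruning is interactive and its internals are not described in this deck; as a rough stand-in only, the sketch below uses scikit-learn's cost-complexity pruning and keeps the pruned tree that scores best on held-out data.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate pruning strengths along the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Refit at each strength and keep the one that does best on held-out data.
best_alpha, best_score = 0.0, 0.0
for a in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=max(a, 0.0))
    score = clf.fit(X_tr, y_tr).score(X_te, y_te)
    if score > best_score:
        best_alpha, best_score = a, score
print("best ccp_alpha:", best_alpha, "held-out accuracy:", round(best_score, 3))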
Salford Systems' CART is the only decision tree system based on the original CART code developed by world-renowned statisticians at Stanford University and the University of California, Berkeley; this code now includes enhancements co-developed by Salford Systems and CART's originators.
CART 6.0 ProEX Features
Tree Controls
•Force splitters into nodes
•Confine select splitters to specific regions of a
tree (Structured Tree™)
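scikit-learn offers no control for forcing a splitter, so this sketch only imitates the idea under that stated assumption: split the data manually on a hand-picked question, then grow an ordinary subtree on each side.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hand-picked root question standing in for the "forced" splitter.
go_left = X[:, 0] <= np.median(X[:, 0])

# Ordinary subtrees grown on each side of the forced split.
left = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[go_left], y[go_left])
right = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[~go_left], y[~go_left])
print(left.tree_.node_count, right.tree_.node_count)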
HotSpot Detector
•Search data for ultra-high performance segments.
•HotSpot Detector trees are specifically designed
to yield extraordinarily high-lift or high-risk nodes.
The process focuses on individual nodes and
generally discards the remainder of the tree.
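Salford's detector is proprietary; a minimal sketch of the general idea only (assumed data and lift threshold): grow a tree, rank its leaves by lift, and keep just the high-lift segments.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

leaves = clf.apply(X)                  # leaf id for every training row
base_rate = y.mean()                   # overall response rate
for node in np.unique(leaves):
    mask = leaves == node
    lift = y[mask].mean() / base_rate  # leaf response rate vs. overall
    if lift >= 2:                      # keep only high-lift segments
        print(f"leaf {node}: n={mask.sum()}, lift={lift:.1f}")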
Train/Test Consistency Assessment
Node-by-node summaries of agreement between train
and test data on both class assignment and rank
ordering of the nodes.
Quickly identify robust, well-performing trees.
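A minimal sketch of the node-by-node agreement check, assuming a scikit-learn tree rather than CART itself (and covering class assignment only, not rank ordering): route train and test rows to the same leaves and compare majority classes.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# Route train and test rows to leaves and compare majority classes.
tr_leaf, te_leaf = clf.apply(X_tr), clf.apply(X_te)
for node in np.unique(tr_leaf):
    te_mask = te_leaf == node
    if te_mask.any():
        maj_tr = y_tr[tr_leaf == node].mean() >= 0.5
        maj_te = y_te[te_mask].mean() >= 0.5
        print(f"node {node}: train/test agree = {maj_tr == maj_te}")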
Modeling Automation
Automatically generate entire collections of trees
exploring different control parameters.
Nineteen automated batteries cover exploration of
multiple splitting rules, five alternative missing value
handling strategies, random selection of alternative
predictor lists, progressively smaller (or larger) training
sample sizes, and much more.
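Salford's battery definitions are not reproduced here; in the same spirit, this sketch refits a tree over a small assumed grid of control parameters and collects held-out scores for comparison.

from itertools import product
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One tree per combination of splitting rule and tree depth.
for rule, depth in product(["gini", "entropy"], [2, 4, 8]):
    clf = DecisionTreeClassifier(criterion=rule, max_depth=depth,
                                 random_state=0).fit(X_tr, y_tr)
    print(rule, depth, round(clf.score(X_te, y_te), 3))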
Predictor Refinement
Includes stepwise backwards predictor elimination
using any of three predictor ranking criteria (lowest
variable importance rank, lowest loss of area under the
ROC curve, highest variable importance rank).
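A sketch of backwards elimination under the "lowest variable importance rank" criterion, using scikit-learn importances as a stand-in for CART's: repeatedly drop the least important predictor and refit.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cols = list(range(X.shape[1]))
while len(cols) > 5:                       # arbitrary stopping size
    clf = DecisionTreeClassifier(random_state=0).fit(X[:, cols], y)
    worst = int(np.argmin(clf.feature_importances_))
    cols.pop(worst)                        # drop the least important predictor
print("surviving predictor indices:", cols)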
How are CART's decision
trees grown?
•CART uses strictly binary, or two-way, splits that divide each
parent node into exactly two child nodes by posing questions with
yes/no answers at each decision node.
•CART searches for questions that split nodes into relatively homogeneous child nodes, such as a group consisting largely of responders, high credit risks, or buyers of sport-utility vehicles.
•As the tree evolves, the nodes become increasingly homogeneous, identifying important segments.
•Other methods, such as CHAID, favor multi-way splits that can
paint visually appealing trees but that can bog models down with
less accurate splits.
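To make the split search concrete, a small self-contained sketch: for one numeric feature, try each threshold and keep the yes/no question whose two children have the lowest weighted Gini impurity (the impurity measure CART uses by default).

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Try every candidate threshold; keep the one whose children
    # have the lowest weighted impurity.
    best_t, best_score = None, float("inf")
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))    # splits at 3.0, yielding two pure children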
Why is CART unique among
decision-tree tools?
CART is based on a decade of research and a proven methodology, assuring stable performance and reliable results.
Knowledge Seeker
• KnowledgeSEEKER is widely used in marketing, sales and risk functions.
• Its flexible, powerful, yet easy-to-use interface enables users to quickly and efficiently develop insights that advance their goals and objectives.
• Data Discovery
• Advanced Visualization
• Decision Trees
See5
• See5/C5.0 is easy to use and does not
presume any special knowledge of Statistics.
• Users of Windows XP/Vista/7 can install and use the 64-bit version.
• On Linux, C5.0 continues to be available in both 32-bit and 64-bit versions.
See5 (Windows 2000/XP/Vista/7):
• Single-computer licence (32-bit)
• Single-computer licence (64-bit)
• Network licence (32-bit and 64-bit)
Enhanced multi-threading
• Additional sections of See5/C5.0 have been
multi-threaded. This can result in speed
improvements when the application has many
discrete attributes, especially with the discrete
value subset option.
Confidence of ruleset predictions
• The way confidence values for ruleset predictions are calculated has changed. This affects the output from the public code for reading and interpreting See5/C5.0 ruleset classifiers, and also impacts results with boosted rulesets.
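The deck does not give See5's formula; purely as a hedged illustration, a common confidence estimate for a rule covering n cases with m errors is the Laplace ratio (n - m + 1) / (n + 2).

# Hypothetical helper, not RuleQuest's code: Laplace-style rule confidence.
def laplace_confidence(covered: int, errors: int) -> float:
    return (covered - errors + 1) / (covered + 2)

print(laplace_confidence(40, 2))   # ~0.93: large, accurate rule
print(laplace_confidence(3, 1))    # 0.60: small rule, weak confidence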
Small changes to pruning algorithms
• For most applications this should not affect
the final classifier; in some cases, the tree or
ruleset will be larger or smaller, but predictive
accuracy should be similar.
• Sample Applications
– Assessing Churn Risk
– Detecting Advertisements on the web
– Identifying Spam
References
http://alice-soft.com/html/prod_alice.htm
http://salford-systems.com/cart.php
http://rulequest.com/see5-info.html
http://angoss.com/analytics_software/KnowledgeSEEKER.php
Data Mining Techniques, Arun K. Pujari