Visualizing Trees and Forests
Simon Urbanek
1 Introduction
2 Individual Trees
The basic principle of all tree-based methods is a recursive partitioning of the
covariate space to separate subgroups that constitute a basis for prediction.
Starting with the full data set, at each step a rule is consulted that specifies
how the data are split into disjoint partitions. This process is repeated
recursively until no rule for further partitioning is defined.
Commonly used classification and regression trees use univariate decision
rules in each partitioning step, that is, the rule specifying which cases fall into
which partition evaluates only one data variable at a time. For continuous
variables the rule usually creates two partitions satisfying xi < s
and xi ≥ s respectively, where s is a constant. Partitions induced by rules using
categorical variables are based on the categories assigned to each partition.
We often refer to a partitioning step as a split and to the value s as the
cut point.
The recursive partitioning process can be described by a tree. The root
node corresponds to the first split and its children to the subsequent splits in
the resulting partitions. The tree is built recursively in the same way as the
partitioning, and terminal nodes (also called leaves) represent the final partitions.
Therefore each inner node corresponds to a partitioning rule and each terminal
node to a final partition.
Each final partition is assigned a prediction value or model. For
classification trees the value is the predicted class, for regression trees it is
the predicted constant, but more complex tree models exist, such as those
featuring linear models in the terminal nodes. In the following we will mostly use
classification trees with binary splits for illustration purposes, but all methods
can be generalized to more complex tree models unless specifically stated
otherwise. We call a tree consisting of rules in its inner nodes, regardless of the
type of prediction in the leaves, a decision tree.
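To make the terminology concrete, here is a minimal sketch in R (the software used later in this chapter): it grows a classification tree with univariate binary splits using the rpart package on R's built-in iris data; both the package and the data are merely illustrative choices, not part of the chapter's original examples.

```r
library(rpart)  # CART-style recursive partitioning

# Grow a classification tree: univariate binary splits in the inner
# nodes, a predicted class in each terminal node (leaf).
fit <- rpart(Species ~ ., data = iris, method = "class")

print(fit)             # one line per node: rule, cases, predicted class
plot(fit); text(fit)   # basic hierarchical view with annotations
```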
Probably the most natural way to visualize a tree model is to display its
hierarchical structure. Let us describe more precisely what it is we want to
visualize. In order to describe the topology of a tree we borrow some
terminology from graph theory. A graph is a set of nodes (sometimes called
vertices) and edges. There, a tree is defined as a connected, acyclic graph.
Topologically, decision trees are a special subset of those, namely connected
directed acyclic graphs (DAGs) with exactly one node of indegree 0 (the root,
which has no parent) and outdegrees other than 1 (i.e. every node has at least
two children or none at all).
In order to fully describe a decision tree, additional information is associated
with each node. For inner nodes this information represents the splitting
rule, for terminal nodes it consists of the prediction. Plots of tree models
attempt to make this information visible in addition to displaying the graph
aspect of the model. Three different ways to visualize the same classification
tree model are shown in Fig. 1.
Fig. 1. Three different visual representations of the same classification tree: splits on eicosenoic, linoleic, palmitoleic and stearic separate the classes Apulia, Calabria, North, Sardinia and Sicily.
The tree model is based on the Italian olive oil dataset (Forina et al., 1983), which records
the composition of Italian olive oils from different regions of Italy. Each
covariate corresponds to the proportion (in 1/10,000th) of a fatty acid (in the
order of concentration): oleic, palmitic, linoleic, stearic, palmitoleic, arachidic,
linolenic and eicosenoic acid. The response variable is categorical and specifies
the region of origin. The goal is to determine how the composition of
olive oils varies across regions of Italy. For illustration purposes we perform a
classification using five regions: Sicily, Calabria, Sardinia, Apulia and North
(the latter consolidating the regions north of Apulia).
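A hedged sketch of how such a model could be fitted in R: it assumes the olive oil data are available as a data frame olives with a factor region and one numeric column per fatty acid. The data frame and column names are assumptions for illustration; the actual data ship with several R packages under varying names.

```r
library(rpart)

# `olives` is an assumed data frame: factor `region` with five levels
# (Apulia, Calabria, North, Sardinia, Sicily) and eight numeric columns
# with the fatty acid proportions in 1/10000th.
fit <- rpart(region ~ oleic + palmitic + linoleic + stearic +
               palmitoleic + arachidic + linolenic + eicosenoic,
             data = olives, method = "class")

plot(fit, uniform = TRUE)  # hierarchical layout with equidistant depths
text(fit, use.n = TRUE)    # annotate splits and per-class case counts
```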
Although the underlying model is the same for all plots in Fig. 1, the visual
representation is different in each plot. Visualization of a tree model based on
its hierarchical structure has to address the following tasks:
• placement of nodes
• visual representation of nodes
• visual representation of edges
• annotation
Fig. 2. Censored zoom of nodes: the bottom plot is a censored zoom (4×) of the top
plot. Nodes that would appear too large are censored at a maximum allowed size
and flagged by a red line.
The top plot shows the node representation without zoom, that is, the size
of the root node corresponds to all data. All subsequent splits partition this
data, and hence the node area, until terminal nodes are reached. If plotted truly
proportionally, the last two leaves split by the stearic variable would be hardly
visible. Therefore a minimal node size is enforced, and the fact that this
representation is not truly proportional is denoted by a red border.
In order to provide a truly proportional comparison of small nodes, we can
enlarge all nodes by a given factor. In the bottom plot a factor of four was
used. Now those small nodes can be distinguished along with their class
proportions, but large nodes would need to be four times as big as in the first
plot, obscuring large portions of the plot and possibly other nodes. Therefore
we also enforce a maximal node size. Again, to denote nodes that are not
shown proportionally due to upper censoring, we use a red line along the top
edge of the node.
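The censoring logic itself is simple; the following sketch (my own illustration, not KLIMT code) clamps zoomed node sizes to a [min, max] range and records which nodes need the red censoring marks.

```r
# Censored zoom: scale node sizes by a zoom factor, then clamp them.
# The flags tell the renderer where to draw the red censoring marks.
# All names and default values are illustrative.
censor_size <- function(size, zoom = 1, min_size = 2, max_size = 100) {
  s <- size * zoom
  data.frame(drawn    = pmin(pmax(s, min_size), max_size),
             low_cens = s < min_size,   # enlarged: flag with red border
             hi_cens  = s > max_size)   # truncated: flag with red top edge
}

censor_size(c(0.5, 10, 400), zoom = 4)
```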
The placement of nodes is a task that has been discussed intensively in the
graph visualization community. For small trees, simple approaches, such as a
bottom-up space partitioning, work well. As trees grow larger, node layout
becomes more challenging. For tree model visualization, however, the associated
information is in most cases more important than differences in local topology,
especially where the structure is imposed by the tree-growing algorithm.
Therefore interactive approaches, allowing the user to explore the tree model
by local magnification while retaining the global context, are recommended for
large tree models.
In the above examples, basic lateral placement is performed by an equidistant
partition of the available space. Only the first plot uses non-equidistant
placement of nodes in the direction of tree depth: the distance of two
nodes in this direction is proportional to the impurity decrease and thus in a
sense to the ‘quality’ of the split. The third plot uses a special placement in that
it is rotated by 90 degrees counter-clockwise relative to the usual representation,
and all terminal nodes are aligned in order to facilitate an easy comparison
of the class proportions.
The visual representation of edges is usually restricted to drawing direct
or orthogonal lines. Nevertheless, a more elaborate representation of edges, such
as polygons whose width is proportional to the number of cases following that
particular path, is another possibility, creating a visual representation of the
‘flow’ of data through the tree.
Annotations are textual or symbolic representations displayed along the
nodes or edges. In Fig. 1 annotations describe predictions and splitting rules.
Although annotations can be useful, they should be used with caution, because
they can easily clutter the plot and thus distract from the key points to be
conveyed.
Overloading plots with information can offset the benefits of the plot, in
particular its ability to provide information at a glance. When the representation
of a node is too large, for example including a list of statistics or additional
plots, it consumes so much space that only very few levels of the tree can be
displayed on a screen. The same applies to a printed version, because
the size of a sheet of paper is still limited. Therefore additional tools
are necessary to keep track of the overall structure in order not to get lost.
Most of these tools, such as zoom, pan, overview windows or toggling of different
labels, are available in an interactive context only. Especially for an analysis,
visualization of additional information is required. There are basically two
possibilities for providing such information:
• Integrate the information in the tree visualization.
• Use external linked graphics.
Direct integration is limited by the spatial constraints posed by the fixed
dimensions of a computer screen or other output medium. Its advantage is its
immediate impact on the viewer and therefore easier usage. It is recommended
to use this kind of visualization for properties that are directly tied to the tree.
It makes less sense to display a histogram of the underlying dataset directly in
a node, because it displays derived information which can be more comfortably
displayed outside the tree, virtually linked to a specific node. It is more sensible
to add information directly related to the tree structure, such as the criterion
used for growing the tree.
External linked graphics are more flexible, because they are not displayed
directly in the tree structure for each node, but are only logically linked to a
specific node. Spatial constraints are less of a problem, because one graphic is
displayed instead of one per node. The disadvantage of linked graphics
is that they must be interpreted more carefully. The viewer has to bear in mind
the logical link used to construct the graphics, as it is not visually attached to
its source (in our case a node).
There is no fixed rule as to what kind of information should be displayed
inside or outside the tree structure. A rule of thumb says that any more
complex graphic should use the external linked approach, whereas less complex
information directly connected with the tree structure should be displayed in
the tree visualization.
Sectioned scatterplots
Splitting rules are formulated in the covariate space; therefore one way to
visualize a tree model is to visualize this space along with the induced partitions.
Fig. 3. Sectioned scatterplot (left) showing the root split and the splits in its children
of a classification tree (right).
[Figure: sectioned scatterplot of the variables palmitoleic and stearic.]
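A sectioned scatterplot can be sketched in base R with an ordinary scatterplot and superimposed partition boundaries; the data frame olives (assumed earlier) and the cut points below, taken from the tree in Fig. 1, are illustrative.

```r
# Sectioned scatterplot sketch: two covariates, points colored by class,
# with the boundaries of the first two splits drawn on top.
plot(olives$eicosenoic, olives$linoleic,
     col = as.integer(olives$region), pch = 19,
     xlab = "eicosenoic", ylab = "linoleic")
abline(v = 6.5)                               # root split on eicosenoic
segments(par("usr")[1], 1053.5, 6.5, 1053.5)  # linoleic split, left partition only
```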
Treemaps
One way of displaying all partitions is to use an area-based plot where each
terminal node is represented by a rectangle. Treemaps belong to this plot category.
The main idea is to partition the available rectangular plot space recursively in
the same way that the tree model partitions the data. Treemaps are therefore
data-driven representations of the model.
The rectangular area of the treemap corresponds to the full dataset. In
the first step this area is partitioned horizontally according to the proportions
of cases passed to each child node. In the next step each such partition
is partitioned vertically corresponding to the case proportions in its children. This
process is repeated recursively with alternating horizontal and vertical
partitioning directions, as illustrated in Fig. 5, until terminal nodes are reached.
Fig. 5. Recursive construction of a treemap in three steps (Step 1 → Step 2 → Step 3).
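The recursion is easy to express in code. The sketch below (an illustration under assumed inputs, not KLIMT's implementation) partitions the unit square according to a nested list of case counts, alternating the direction at each depth.

```r
# Treemap sketch: `node` is either a leaf case count or a list of
# children; the orientation alternates between horizontal and vertical.
treemap_rect <- function(node, x0, y0, x1, y1, horiz = TRUE) {
  if (!is.list(node)) {                # terminal node: draw its rectangle
    rect(x0, y0, x1, y1, col = "grey90")
    return(invisible())
  }
  sizes <- sapply(node, function(ch) sum(unlist(ch)))
  at <- c(0, cumsum(sizes) / sum(sizes))   # relative partition boundaries
  for (i in seq_along(node)) {
    if (horiz)
      treemap_rect(node[[i]], x0 + at[i] * (x1 - x0), y0,
                   x0 + at[i + 1] * (x1 - x0), y1, horiz = FALSE)
    else
      treemap_rect(node[[i]], x0, y0 + at[i] * (y1 - y0),
                   x1, y0 + at[i + 1] * (y1 - y0), horiz = TRUE)
  }
}

plot.new(); plot.window(xlim = 0:1, ylim = 0:1)
treemap_rect(list(list(120, 45), 330), 0, 0, 1, 1)  # toy tree, three leaves
```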
A greedy tree-growing algorithm may quickly split off small homogeneous
subgroups, while leaving a large chunk of cases in one node that is hard to
separate. Such behavior is easily detected in treemaps as large terminal nodes.
Moreover, treemaps are suitable for highlighting or brushing, allowing the
comparison of groups within terminal nodes. A treemap of the model from Fig. 1
with colors stacked by group is shown in Fig. 6.
Fig. 6. Treemap with stacked bars representing response classes. Color coding and
data are the same as in Fig. 3.
It is clearly visible that the tree model is able to split off large homogeneous
groups successfully, but more subsequent splitting is necessary for the nodes
visible in the upper-left part of the plot.
Treemaps as described here are an extension of those used in computer science
for the information visualization of hierarchically stored contents. They are also
related to mosaic plots. More precisely, a mosaic plot is a treemap of a
decomposition tree, that is, a tree whose splits of the same depth use the same
categorical splitting variable and have as many children as there are categories
in the data.
The main advantage of treemaps is very efficient use of display space. They
allow absolute comparison of nodes and subgroup sizes while maintaining
context of the tree model. They scale well with both increasing data set size
and tree model complexity. What they cannot show is information about
splitting criteria and they do not allow direct relative comparison of groups
within nodes. An alternative visualization technique exists for the latter task.
Spineplots of leaves
Another useful plot for tree model visualization is the spineplot of leaves
(SPOL). By not alternating the partitioning direction as in treemaps, but
consistently using horizontal partitioning, we obtain a plot showing all terminal
nodes in one row.
Due to the fixed height, it is possible to visually compare the sizes of the
terminal nodes, which are proportional to the widths of the corresponding bars.
Moreover, relative proportions of groups are easily comparable when using
highlighting or brushing.
A sample spineplot of leaves is shown in Fig. 7. The displayed data and
model are the same as in Fig. 6, as is the color brushing. Each bar corresponds
to a leaf and the width of each bar is proportional to the number of cases
in that particular node.
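A spineplot of leaves reduces to a stacked barplot with proportional widths, as the sketch below shows; the leaf assignment via fit$where follows rpart's convention, while the data are again assumed.

```r
# Spineplot-of-leaves sketch: one bar per terminal node, bar width
# proportional to node size, stacked class proportions within each bar.
spol <- function(leaf, class) {
  counts <- table(class, leaf)              # classes x leaves
  barplot(prop.table(counts, margin = 2),   # fixed height: proportions
          width = colSums(counts),          # bar width = node size
          space = 0.05, legend.text = TRUE)
}

# e.g. for an rpart fit on the assumed olive data:
# spol(factor(fit$where), olives$region)
```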
These methods focus on the visualization of splits, their sequence and the application
of the model to data. One important property of all visualization techniques
presented is their applicability to arbitrary subsets of the data. Although most
illustrations used training data and the corresponding fitted tree model, it is
also feasible to visualize test data instead. Where a view of the training data
highlights the adaptability of the model, a view of test data focuses on stability
and overfitting. Moreover, it is possible to compare both views side by
side.
This leads us to further important aspects of a tree model: the credibility
and quality of the splits and of the entire model. In the next section
we briefly discuss tree model construction and present visualization
methods that incorporate information about split quality into both existing
and new plots.
So far we have discussed methods for visualizing tree models both on their own and
together with the data the models are applied to. There is, however, more information
associated with each node that is waiting to be visualized. In order to understand
tree models better, we need to know more about the process of fitting
them.
Although a tree model is straightforward to interpret and apply, its
construction is not trivial. In theory, we would like to consider all possible tree
models and pick the one that fits the given data best, based on some loss
function. Unfortunately this proves to be infeasible save for trivial examples,
because the computational cost increases exponentially with tree size.
Therefore several other approaches have been suggested for fitting tree models.
The most commonly used algorithm, CART (Classification and Regression
Trees), was introduced by Breiman et al. (1984). It performs a greedy local
optimization as follows: for a given node, consider all possible splits and choose
the one which reduces the impurity of the child nodes most relative to the parent
node. This decrease of impurity (and hence increase of purity) is assessed using an
impurity criterion. The locally optimal split is then used and the search is
performed recursively in each child node.
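A compact sketch of one step of this greedy search for a single continuous covariate, using the Gini index as the impurity criterion (illustrative code, not the CART reference implementation):

```r
# Gini impurity of a class vector.
gini <- function(y) 1 - sum(prop.table(table(y))^2)

# Impurity decrease for every candidate cut point of one covariate:
# Delta I = I(parent) - p_L * I(left) - p_R * I(right).
split_gain <- function(x, y) {
  cuts <- sort(unique(x))
  cuts <- (cuts[-1] + cuts[-length(cuts)]) / 2   # midpoints between values
  gain <- sapply(cuts, function(s) {
    left <- x < s
    gini(y) - mean(left) * gini(y[left]) - mean(!left) * gini(y[!left])
  })
  data.frame(cut = cuts, gain = gain)
}

g <- split_gain(iris$Petal.Length, iris$Species)
g[which.max(g$gain), ]   # the locally optimal cut point
```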
The growing is stopped once one of the stopping rules is met. The most
common stopping rules are a minimal number of cases in a node and a minimal
requirement on the impurity decrease. In practice it is common to relax the
stopping rules and use pruning methods; a discussion of pruning methods,
however, is beyond the scope of this chapter. Nevertheless, visualization can be
useful for pruning, especially in an interactive context where pruning parameters
can be changed on the fly and reflected in various displays.
Measures of impurity can in principle be arbitrary concave functions, but the
commonly used measures are entropy and the Gini index, which have theoretical
foundations (cf. Breiman et al., 1984). It is important to note that this search
looks for a local optimum only. It has no way of ‘looking ahead’ by considering
multiple consecutive splits.
Mountain plots
The basic idea of a mountain plot is to visualize the decrease of impurity
over the entire range of the split variable. This is illustrated on a binary
classification problem in Fig. 8. In this particular example the binary response
denotes whether a patient was able to recover from diagnosed meningitis,
whereas the predictor variable Age refers to the patient’s age at the time
of the diagnosis.
Fig. 8. Stacked dotplots of the split variable (Age) grouped by the target variable
(Recover), along with the corresponding mountain plot showing the impurity
decrease for each cut point. The optimal cut point is denoted by a solid red line,
runner-up splits by dotted red lines.
The top part of the figure shows a stacked dotplot of the split variable
grouped by the binary response. The bottom part of the plot shows a mountain
plot. The value of the empirical impurity measure is constant between data
points and can change only at values taken by the data. The value of the
impurity decrease is by definition zero outside the data range.
In the presented example it is clearly visible that there are three alternative
splits that come very close to the ‘optimal’ cut point chosen by the greedy
algorithm.
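Using the split_gain() helper from the earlier sketch, a rudimentary mountain plot can be drawn as a step function of the impurity decrease with the optimal cut point marked (again an illustration, not the KLIMT rendering):

```r
# Mountain plot sketch: impurity decrease over the whole variable range.
mountain_plot <- function(x, y) {
  g <- split_gain(x, y)
  plot(g$cut, g$gain, type = "s",   # piecewise constant between data values
       xlab = deparse(substitute(x)), ylab = "impurity decrease")
  abline(v = g$cut[which.max(g$gain)], col = "red")   # optimal cut point
}

mountain_plot(iris$Petal.Width, iris$Species)
```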
Fig. 9. Two mountain plots of the variables Rooms and LowStat and the corre-
sponding scatterplots vs the response variable. The optimal splits are denoted by
red lines, means for each partition are represented by gray lines in the scatterplots.
The competition for the best split is not limited to a single variable. Fig. 9
illustrates a competition between two different variables in a regression tree.
The models are based on the Boston housing dataset of Harrison and Rubinfeld (1978).
Although both splits have almost identical impurity decrease maxima, the
data show different patterns. The relationship seen in the left part of the plot
is probably better modeled by a linear model, whereas on the right hand side
we see a change of behavior around the chosen cut point.
By plotting mountain plots of candidate variables on the same scale, we
can assess the stability of a split. If there is a dominating covariate with a clear
optimum, the split will be stable. On the other hand the presence of competing
splits in the range of the optimal split indicates possible instability. Mountain
plots also show which regions of competing variables are in the vicinity of the
optimum, thus allowing domain knowledge to be taken into account.
The name “mountain plot” is derived from the fact that the plots usually
resemble the profile of a mountain range. They are mainly useful for assessing the
quality of a split along with potential competing splits. This information can
be used to interactively influence the tree construction process or to construct
multiple tree models and compare their behavior.
3 Visualizing Forests
So far we have been discussing visualization of individual tree models. We
have shown, however, that there is an inherent volatility in the choice of
splits that may affect the stability of a given model. Therefore it is useful to
grow multiple trees. In the following we will briefly introduce tree ensemble
methods and present visualization methods for forests consisting of multiple
tree models.
There are two main approaches to generating different tree models, namely
changing:
• training data - changes in the training data will produce different models
if the original tree was unstable. Bootstrapping is a useful technique to
assess the variability of the model fitting process.
• splits - allowing locally suboptimal splits creates different partitions and
prevents the greedy algorithm from getting stuck in a local optimum,
which is not necessarily the global optimum.
Model ensemble methods leverage the instability of individual models to
improve prediction accuracy by constructing a predictor as an aggregate of
multiple individual models. Bagging (Breiman, 1996) uses bootstrapping to obtain many
tree models and combines their predictions by aggregation: majority
voting for classification trees and averaging for regression trees. In addition,
random forests (Breiman, 2001) add randomness by choosing candidate split variables from
a different random subset in each node.
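A minimal bagging sketch in R, with the number of trees, the data set and all parameters chosen purely for illustration:

```r
library(rpart)

# Bagging: grow trees on bootstrap samples, aggregate by majority vote.
set.seed(1)
fits <- lapply(1:20, function(i) {
  idx <- sample(nrow(iris), replace = TRUE)   # bootstrap sample
  rpart(Species ~ ., data = iris[idx, ], method = "class")
})

votes <- sapply(fits, function(f)
  as.character(predict(f, iris, type = "class")))   # cases x trees
majority <- apply(votes, 1, function(v) names(which.max(table(v))))
```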
Fig. 10. Left: frequency of the use of individual variables in 20 bootstrapped tree
models. Right: cumulated deviance gain in splits featuring the corresponding variable.
How often each variable is used across the models is shown in
the left plot of Fig. 10. Each bar displays how often the corresponding variable
was used in the models. The most often used variable is UCS (20 times)
and the least often used variable is Mts, which was used just once. Due to the
rather small number of variables to choose from, no variable is omitted
by all models.
Clearly this view is very coarse, because it does not take into account which
role a variable plays in the models. The number of splits can double with
increasing depth, whereas the number of involved cases decreases. Therefore
the fact that a variable is used often does not necessarily mean that it is really
important, especially if it is used mainly in the fringe for small groups. It is
therefore advisable to weight the contribution of each split by a cumulative
statistic such as the decrease of impurity.
The cumulative value of impurity decrease for each variable of the 20
bootstrapped trees is displayed in the right plot of Fig. 10. The variables in
each plot are ordered by the bar height, representing their importance. We
see that UCS is by far the most influential variable, followed by UCH and BNi.
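Counting variable use over a set of bootstrapped rpart trees (such as fits from the earlier bagging sketch) is straightforward, since rpart stores the split variable of every node in fit$frame$var; the barchart mirrors the style of the left plot of Fig. 10.

```r
# Frequency of use: in how many of the bootstrapped models does each
# variable appear in at least one split?
used <- lapply(fits, function(f) unique(as.character(f$frame$var)))
used <- lapply(used, setdiff, y = "<leaf>")   # drop terminal nodes
freq <- sort(table(unlist(used)), decreasing = TRUE)
barplot(freq, las = 2, main = "weight: frequency")
```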
When making inferences from the displayed information, we need to be cautious
and keep the tree properties in mind. Variable masking can heavily
influence the results of such analyses. Given two highly correlated variables,
it is very likely that they produce very similar split results. Therefore the
CART algorithm guided by the bootstrap will pick one of them essentially at random.
Once that decision is made, the other variable is not likely to be used anymore.
If one of the variables is ‘weaker’, it will hardly appear in any model,
even though in the absence of the stronger variable it may still perform best
of all remaining variables.
In order to analyze that behavior, but also to see how different the tree
models are, it is necessary to take both the variable and the individual tree
into account. A two-dimensional weighted fluctuation diagram showing trees
and split variables is shown in Fig. 11. Variables are plotted on the y-axis,
the models on the x-axis. The area of each rectangle is proportional to the
cumulative impurity decrease of all splits using a specific variable in the tree
model. In general, fluctuation diagrams are useful for detecting patterns and
for comparisons in both the x and y directions.
Fig. 11. Fluctuation diagram of trees and variables, displaying cumulated deviance
gain of splits featuring that combination of tree and split variable.
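A fluctuation diagram of this kind can be sketched with base R graphics: given a weight matrix (rows = variables, columns = trees, e.g. filled with cumulated deviance gains), each cell gets a rectangle whose area is proportional to its weight. The matrix in the usage line is random dummy data, purely for illustration.

```r
# Fluctuation diagram sketch: area of each rectangle ~ weight,
# hence side length ~ sqrt(weight).
fluctuation <- function(w) {
  plot.new()
  plot.window(xlim = c(0, ncol(w)), ylim = c(0, nrow(w)))
  s <- sqrt(w / max(w)) / 2                # half side length per cell
  cx <- col(w) - 0.5; cy <- row(w) - 0.5   # cell centers
  rect(cx - s, cy - s, cx + s, cy + s, col = "grey70")
  axis(1, at = seq_len(ncol(w)) - 0.5, labels = colnames(w), las = 2)
  axis(2, at = seq_len(nrow(w)) - 0.5, labels = rownames(w), las = 2)
}

fluctuation(matrix(runif(9 * 20), 9, 20,
                   dimnames = list(paste0("v", 1:9), paste0("t", 1:20))))
```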
The importance and use of variables in splits are just one aspect of tree models
to consider. In Section 2.2 we discussed another way of visualizing trees
which allows an assessment of cut points in the data context: sectioned
scatterplots. Fortunately, sectioned scatterplots can also be used for the visualization
of forests, preferably using semi-transparent partition boundaries.
[Figure: sectioned scatterplot with semi-transparent partition boundaries of multiple bootstrapped trees.]
Trace plots
The aim of a trace plot is to allow the comparison of arbitrarily
many trees with respect to splits, cut points and the hierarchical structure.
This is not possible using any of the visualization methods described so far.
The basis of the trace plot is a rectangular grid consisting of split variables
as columns and node depths as rows. Each cell in this grid represents a possible
tree node. In order to distinguish actual split points, each cell contains a glyph
representing the possible split points. For continuous variables it consists of a
horizontal axis on which a split point is represented by a tick mark. Categorical
variables are shown as boxes corresponding to possible split combinations.
Every pair of adjacent inner nodes is connected by an edge between their split
points.
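A trace plot can be sketched from a flat table of inner nodes; the data frame layout below (columns var, depth, a cut point rescaled to [0, 1] within its cell, and the parent's row index) is an assumed representation for illustration, not a standard format.

```r
# Trace plot sketch: columns = split variables, rows = node depths;
# each inner node is a tick inside its cell, connected to its parent.
trace_plot <- function(splits, vars) {
  plot.new()
  plot.window(xlim = c(0, length(vars)),
              ylim = c(max(splits$depth) + 0.5, 0.5))   # depth grows downwards
  x <- match(splits$var, vars) - 1 + splits$cut   # position inside the cell
  y <- splits$depth
  segments(x[splits$parent], y[splits$parent], x, y,   # NA parents (roots)
           col = rgb(0, 0, 0, 0.2))                    # draw no edge
  points(x, y, pch = "|")
  axis(3, at = seq_along(vars) - 0.5, labels = vars, tick = FALSE)
}
```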
A classification tree and its trace plot are shown in Fig. 13. The root node
features a split on the variable palmitoleic, which is represented by the
rightmost column. Its child nodes use splits on the variables linoleic and oleic,
hence the two edges leading from the root node to the next row of splits.
There are no further inner nodes as children of the linoleic split, therefore the
branch ends there. Analogously, all inner nodes are drawn in the trace plot
until terminal nodes are reached.
It is evident that all splits of the tree can be reconstructed from its
representation in the trace plot, because every cut point is shown in the trace plot.
Equally, it is possible to reconstruct the hierarchical structure of the tree due
to the presence of edges in the trace plot.
Moreover, the trace plot removes an ambiguity known from hierarchical
views: the order of the child nodes is irrelevant for the model, whereas
swapping the left and right children in a hierarchical view produces quite
different hierarchical plots. In a trace plot the order of the child nodes is defined
by the grid and therefore fixed for all trees in the plot.
One important advantage of trace plots is the ability to display multiple
tree models simultaneously, superimposing all models on the same grid. A
trace plot of 100 bootstrapped classification trees is shown in Fig. 14. This
confirms the ability of bootstrapping to produce models that deviate from
certain local optima.
In order to prevent overplotting, we use semi-transparent edges.
Consequently, often used paths are more opaque than infrequently used paths. We
can clearly see that the first split always uses the palmitoleic variable. In the
next step, however, there are several alternatives for the splits. Some patterns
seem to be repeated further down the tree, indicating a rather stable subgroup
that can be reached in several different ways along the tree. In this particular
example we can recognize substructures that affirm the partial stability of the
tree models.
The remaining instability in this particular example mostly concerns the
sequence in which the subgroups are separated. This is partially due
to the fact that we are dealing with a multi-class problem, where the overall
reduction of impurity can be achieved by separating the classes in several
different orders.
[Fig. 13 and Fig. 14: a classification tree of olive oil sub-regions, with splits on palmitoleic, linoleic, oleic, eicosenoic, arachidic and stearic, and the corresponding trace plots.]
4 Conclusion
Tree models are very rich and versatile. Equally rich is the variety of
possible visualization techniques that provide various views of trees, each
shedding light on different properties of the models.
Hierarchical views are the most commonly used graphical representations
and highlight the sequence of splits. They are easy to interpret even by
untrained personnel. Node placement and representation can convey additional
information associated with the model or data. The size of a node can be intuitively
associated with the size of the data passed into that node. Highlighting and
brushing are easily possible in this context, which facilitates interpretation in
conjunction with the available data. Hierarchical views often allow for additional
annotation and supplemental information, such as split quality. Complementary
methods are available for large trees and data, such as censored or context-preserving
local zoom.
A lesser known group of tree model visualization techniques are those based
on the recursive partitioning aspect. A direct view of the partition boundaries in
the observation space can be obtained using sectioned scatterplots. The focus
here lies on the cut points and their relative position in the data space. They
are limited in terms of the number and types of covariates used, but prove to
be useful as a drill-down technique for local analysis of subgroups throughout
the tree model.
Other methods based on the recursive partitioning of the plot space are
treemaps and spineplots of leaves. Both allow a concise view of all terminal
nodes while retaining hints of the splitting sequence. In conjunction with
highlighting and brushing, the main focus here is on the model behavior with
respect to the data points. As such, the plots can be created using training and test
data separately and compared. Treemaps are more suitable for absolute
comparisons and large, complex trees, whereas spineplots of leaves can be used
for the relative comparison of groups within terminal nodes for up to moderately
complex trees.
Tree models can be unstable, that is, small changes in the data can
lead to entirely different trees. In order to analyze the stability of splits it
is possible to visualize the optimality criterion for candidate variables using
mountain plots. Competing splits within a variable become clearly visible and
the comparison of mountain plots of multiple candidate variables allows a
quick assessment of the magnitude and cause for potential instability.
The instability of a tree model can be used to obtain additional insight into
the data and to improve prediction accuracy. Bootstrapping provides a useful
method for the analysis of model variation by creating a whole set of tree
models. Visualization of the use of covariates in the splits as weighted
barcharts with an aggregate impurity criterion as weight allows a quick assessment
of variable importance. Variable masking can be detected using weighted
fluctuation diagrams of variables and trees. This view is also useful for finding
groups of related tree models.
Sectioned scatterplots also allow the visualization of partition boundaries
for multiple trees. The resulting plot can no longer be used for global drill-
down due to the lack of shared subgroups, but it provides a way of analyzing
the ‘fuzziness’ of a cut-point in conjunction with the data.
Finally, trace plots allow us to visualize split rules and the hierarchical
structure of arbitrary many trees in a single view. They are based on a grid of
variables and tree levels (nodes of the same depth) where each cell corresponds
to a candidate split variable, corresponding to a potential tree node. Actually
used cells are connected in the same way as in the hierarchical view, thus
reflecting the full structure of the tree. Multiple trees can be superimposed
on this grid, each leaving its own ‘trace’. The resulting plot shows frequently
used paths, common subgroups and alternate splits.
All plots in this chapter have been produced using the R software for statistical
computing and the interactive software KLIMT for visualization and analysis of
trees and forests. The visualization methods presented in this chapter are suitable
both for the presentation of particular findings and for exploratory work. The
individual techniques complement each other well by providing many different
viewpoints on the models and data. Therefore they can be successfully used
in an interactive framework. Trace plots, for example, represent a very useful
overview which can be linked to individual hierarchical views. Subgroups defined
by cells in the trace plot can be linked to data-based plots, and the edges of the
trace plot to sectioned scatterplots.
The methods presented here were mostly illustrated on classification examples,
but they can equally be used for regression trees and, in most cases, for survival
trees as well. Also, none of the methods described here is limited to binary trees,
even though those represent the most commonly used models. The variety of
tree models and the further development of ensemble methods still leave room
for enhancements and new plots. For exploratory work it is beneficial to have a
big toolbox to choose from; for presentation graphics it is important to be able
to display the ‘key point’ we want to convey.