Classification
The first provides some explanation of how its classifications are reached; the second can be a 'black box' that makes a decision without any explanation. In many applications no explanation is required (no one cares how machines read postal (zip) codes, only that the envelope is correctly sorted) but in others, especially in medicine, some explanation may be necessary to get the methods adopted.
Classification is also central to data mining, although some of data mining is exploratory in the sense of Chapter 11. Hand et al. (2001) and (especially) Hastie et al. (2001) are pertinent introductions.
Some of the methods considered in earlier chapters are widely used for classification: classification trees, logistic regression for two groups and multinomial log-linear models (Section 7.3) for more than two groups.
12.1 Discriminant Analysis

Suppose that we have a set of g classes, and for each case we know the class (assumed correctly). We can then use the class information to help reveal the structure of the data. Let W denote the within-class covariance matrix, that is the covariance matrix of the variables centred on the class mean, and B denote the between-classes covariance matrix, that is, of the predictions by the class means. Let M be the g x p matrix of class means, and G be the n x g matrix of class indicator variables (so g_ij = 1 if and only if case i is assigned to class j). Then the predictions are GM. Let x̄ be the means of the variables over the whole sample. Then the sample covariance matrices are

    W = \frac{(X - GM)^T (X - GM)}{n - g}, \qquad B = \frac{(GM - 1\bar{x})^T (GM - 1\bar{x})}{g - 1}    (12.1)
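A small sketch (not code from the text) of (12.1): it computes W and B directly and compares them with the linear discriminants returned by lda. The use of the iris data here is purely illustrative.

    ## Within- and between-class covariance matrices of (12.1), on the iris data
    library(MASS)
    X  <- as.matrix(iris[, 1:4])                   # n x p data matrix
    cl <- iris$Species                             # class labels
    n  <- nrow(X); g <- nlevels(cl)

    G <- model.matrix(~ cl - 1)                    # n x g class indicator matrix
    M <- apply(X, 2, function(col) tapply(col, cl, mean))   # g x p matrix of class means
    xbar <- colMeans(X)                            # overall means

    W <- crossprod(X - G %*% M) / (n - g)                   # within-class covariance
    B <- crossprod(G %*% M - rep(1, n) %o% xbar) / (g - 1)  # between-classes covariance

    ## The linear discriminants of lda() maximize the ratio of between-class
    ## to within-class variance; compare the scaling with W and B above.
    lda(X, cl)$scaling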
[Figure 12.1 appears here: the data plotted on the first two linear discriminants, with 'first linear discriminant' on the horizontal axis and 'second linear discriminant' on the vertical axis; points are labelled by class as s, c and v.]
Let π_c denote the prior probabilities of the classes, and p(x | c) the densities of the distributions of the observations within each class. Then the posterior distribution of the classes after observing x is

    p(c \mid x) = \frac{\pi_c\, p(x \mid c)}{\sum_d \pi_d\, p(x \mid d)} \propto \pi_c\, p(x \mid c)    (12.2)

and it is fairly simple to show that the allocation rule which makes the smallest expected number of errors chooses the class with maximal p(c | x); this is known as the Bayes rule. (We consider a more general version in Section 12.2.)
Now suppose the distribution for class c is multivariate normal with mean μ_c and covariance Σ_c. Then the Bayes rule minimizes

    Q_c = -2 \log p(x \mid c) - 2 \log \pi_c = (x - \mu_c)\, \Sigma_c^{-1} (x - \mu_c)^T + \log |\Sigma_c| - 2 \log \pi_c + \text{const}    (12.3)

The first term of (12.3) is the squared Mahalanobis distance of x from the class centre; the constant is common to all classes and can be ignored. Differences in Q_c between classes are quadratic functions of x, giving quadratic discriminant analysis. If the classes have a common covariance matrix Σ the quadratic terms cancel, the differences are linear functions of x, and we have linear discriminant analysis; since the class means span at most g - 1 dimensions we can work in at most min(p, g - 1) dimensions. If there are just two classes, there is a single linear discriminant, and the difference in scores between the classes is

    x\, \Sigma^{-1} (\mu_1 - \mu_2)^T + \text{const}
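Written as code, the quadratic score Q_c of (12.3) is only a few lines; this is a sketch, and the parameter values in the example call are made up.

    ## Quadratic discriminant score of (12.3) for one class (constant dropped)
    Qc <- function(x, mu, Sigma, prior) {
      d <- x - mu
      drop(d %*% solve(Sigma, d)) + log(det(Sigma)) - 2 * log(prior)
    }
    ## Allocate x to the class with the smallest score, e.g.
    Qc(c(1, 2), mu = c(0, 0), Sigma = diag(2), prior = 0.5)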
Consider the crabs dataset. Can we construct a rule to predict the sex of a future Leptograpsus crab of unknown colour form (species)? We noted earlier that one of the measurements is taken differently for males and females, so it seemed prudent to omit it from the analysis. To start with, we ignore the differences between the forms. Linear discriminant analysis gives a discriminant that is essentially a contrast of the logged measurements, hence a ratio of measurements and a dimensionally neutral quantity. Six errors are made, all for the blue form.
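A sketch of such a fit follows. The choice of which measurement to drop (RW here) and the use of logged measurements are illustrative assumptions, so the resubstitution errors need not match the six quoted in the text.

    ## LDA for crab sex on logged measurements (one measurement omitted)
    library(MASS)
    lcrabs <- log(crabs[, c("FL", "CL", "CW", "BD")])
    cr.lda <- lda(lcrabs, crabs$sex)
    ## Resubstitution confusion matrix (an optimistic assessment; see Section 12.2)
    table(true = crabs$sex, predicted = predict(cr.lda)$class)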
It does make sense to take the colour forms into account, especially as the within-group distributions look close to joint normality (see Figure 4.13).
[Figure 12.2 appears here: the crabs data plotted as Second LD against First LD, with points labelled B, O, b and o; see the caption below.]
Figure 12.2: Linear discriminants for the crabs data. Males are coded as capitals, females as lower case, colours as the initial letter of blue or orange. The crosses are the group means for a linear discriminant for sex (solid line) and the dashed line is the decision boundary for sex based on four groups.
1 Adopted by R in package .
[Figure 12.3 appears here: the forensic glass data plotted on linear discriminants (axes LD1 and LD2) for two estimates of the within-class covariance matrix; points are labelled by glass type (F, N, V, C, T, H).]
A simple robust estimate can be obtained by selecting those points whose Mahalanobis distance from the initial mean (using the initial covariance estimate) falls well within the 97.5% point under normality, and returning their mean and variance matrix.
An alternative approach is to extend the idea of M-estimation to this setting, fitting a multivariate t distribution for a small number of degrees of freedom. This is implemented in our function cov.trob; the theory behind the algorithm used is given in Kent, Tyler and Vardi (1994) and Ripley (1996). Normally cov.trob is faster than cov.rob, but it lacks the latter's extreme resistance. We can use linear discriminant analysis on more than two classes, and illustrate this with the forensic glass dataset fgl.
Our function lda has an argument method that allows the minimum volume ellipsoid estimate (but without robust estimation of the group centres) or the multivariate t distribution to be used, by setting method = "mve" or method = "t". This makes a considerable difference for the forensic glass data fgl, as Figure 12.3 shows. We use the default number of degrees of freedom for the t.
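A sketch in the spirit of Figure 12.3 (not the book's plotting code) is:

    ## Standard and multivariate-t ("robust") linear discriminant analyses of fgl
    library(MASS)
    fgl.lda  <- lda(type ~ ., data = fgl)                 # moment estimates
    fgl.rlda <- lda(type ~ ., data = fgl, method = "t")   # multivariate t, default nu
    ## Projections onto the first two linear discriminants differ noticeably
    head(predict(fgl.lda,  dimen = 2)$x)
    head(predict(fgl.rlda, dimen = 2)$x)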
12.2 Classification Theory

In the terminology of pattern recognition the given examples together with their classifications are known as the training set, and future cases form the test set. Our primary measure of success is the error (or misclassification) rate. Note that we would obtain (possibly seriously) biased estimates by re-classifying the training set, but that the error rate on a test set randomly chosen from the whole population will be an unbiased estimator.
It may be helpful to know the type of errors made. A confusion matrix gives the number of cases with true class i classified as class j. In some problems some errors are considered to be worse than others, so we assign costs L_ij to allocating a case of class i to class j. Then we will be interested in the average error cost rather than the error rate.
It is fairly easy to show (Ripley, 1996, p. 19) that the average error cost is minimized by the Bayes rule, which is to allocate to the class c minimizing ∑_i L_ic p(i | x), where p(i | x) is the posterior distribution of the classes after observing x. If the costs of all errors are the same, this rule amounts to choosing the class c with the largest posterior probability p(c | x). The minimum average cost is known as the Bayes risk. We can often estimate a lower bound for it by the method of Ripley (1996, pp. 196–7) (see the example on page 347).
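Given a matrix of posterior probabilities, the cost-sensitive Bayes rule is a one-liner. A sketch, with a made-up example (the cost matrix L follows the convention above):

    ## Allocate each case to the class minimizing its expected cost
    bayes.rule <- function(posterior, L) {
      expected.cost <- posterior %*% L                    # n x g matrix of expected costs
      colnames(posterior)[apply(expected.cost, 1, which.min)]
    }
    post <- rbind(c(0.7, 0.2, 0.1), c(0.3, 0.4, 0.3))     # two cases, three classes
    colnames(post) <- c("a", "b", "c")
    L <- 1 - diag(3)                                      # equal costs: reduces to max posterior
    bayes.rule(post, L)                                   # "a" then "b"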
We saw in Section 12.1 how p(c | x) can be computed for normal populations, and how estimating the Bayes rule with equal error costs leads to linear and quadratic discriminant analysis. As our functions lda and qda return posterior probabilities from their predict methods, they can also be used to estimate the Bayes rule for unequal error costs. The posterior probabilities may also be modelled directly: for more than two classes it may be possible to do this using a surrogate log-linear Poisson GLM model (Section 7.3), but using the multinom function in library section nnet will usually be faster and easier. A classification tree estimates p(c | x) directly by a special multiple logistic model, one in which the right-hand side is a single factor specifying which leaf the case will be assigned to by the tree. Again, since the posterior probabilities are given by the predict method it is easy to estimate the Bayes rule for unequal error costs.
To apply the Bayes rule we need to know the posterior probabilities p(c | x). Since these are unknown we use an explicit or implicit parametric family p(c | x; θ). In the methods considered so far we act as if p(c | x; θ̂) were the actual posterior probabilities, where θ̂ is an estimate computed from the training set T, often by maximizing some appropriate likelihood. This is known as the 'plug-in' rule. However, the 'correct' estimate of p(c | x) is (Ripley, 1996, §2.4) to use the predictive estimates

    \tilde p(c \mid x) = p(c \mid x, T) = \int p(c \mid x; \theta)\, p(\theta \mid T)\, d\theta    (12.5)

If we are very sure of our estimate θ̂ there will be little difference between p(c | x; θ̂) and the predictive estimate; otherwise the predictive estimate will normally be less extreme (not as near 0 or 1). The 'plug-in' estimate ignores the uncertainty in the parameter estimate θ̂ which the predictive estimate takes into account.

It is not often possible to perform the integration in (12.5) analytically, but it is possible for linear and quadratic discrimination with appropriate 'vague' priors on θ (Aitchison and Dunsmore, 1975; Geisser, 1993; Ripley, 1996). This estimate is implemented by method = "predictive" of the predict methods for our functions lda and qda. Often the differences are small, especially for linear discrimination, provided there are enough data for a good estimate of the variance matrices. When there are not, Moran and Murphy (1979) argue that considerable improvement can be obtained by using an unbiased estimator of log p(x | c), implemented by the argument method = "debiased".
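As an illustration (a sketch, not the book's code), the three estimates can be compared with qda on the Cushings data used in the figures that follow; the log transformation matches the later analyses.

    ## Plug-in, predictive and debiased posterior probabilities from qda
    library(MASS)
    known <- Cushings$Type != "u"                  # drop the unknown cases
    cush  <- log(as.matrix(Cushings[known, 1:2]))
    tp    <- factor(Cushings$Type[known])
    cush.qda <- qda(cush, tp)
    round(predict(cush.qda, cush, method = "plug-in")$posterior[1:3, ], 3)
    round(predict(cush.qda, cush, method = "predictive")$posterior[1:3, ], 3)
    round(predict(cush.qda, cush, method = "debiased")$posterior[1:3, ], 3)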
[Figure 12.4 appears here: panels labelled LDA and QDA plotting Pregnanetriol against Tetrahydrocortisone (log scales), with points labelled a, b, c and u; the caption follows.]
Figure 12.4: Linear and quadratic discriminant analysis applied to the Cushing’s syndrome
data.
[Figure 12.5 appears here: two further panels of Pregnanetriol against Tetrahydrocortisone with points labelled a, b, c and u; the caption follows.]
Figure 12.5: Further classifiers applied to the Cushing's syndrome data.

The results are shown in Figure 12.5.
12.3 Non-Parametric Rules

There are also classifiers based on non-parametric estimates of the class densities or of the log posterior. Library section class implements the k-nearest neighbour classifier and related methods (Devijver and Kittler, 1982; Ripley, 1996) and learning vector quantization (Kohonen, 1990, 1995). Both are based on finding the k nearest examples in some reference set, and taking a majority vote among the classes of these examples, or, equivalently, estimating the posterior probabilities p(c | x) by the proportions of the classes among the k examples.

The methods differ in their choice of reference set. The k-nearest neighbour methods use the whole training set or an edited subset. Learning vector quantization is similar to K-means in selecting points in the space other than the training set examples to summarize the training set, but unlike K-means it takes the classes of the examples into account.

[Figure 12.6 appears here: 1-NN and 3-NN classification of the Cushing's syndrome data, plotted as Pregnanetriol against Tetrahydrocortisone with points labelled a, b, c and u.]
These methods almost always measure ‘nearest’ by Euclidean distance. For
the Cushing’s syndrome data we use Euclidean distance on the logged covariates,
rather arbitrarily scaling them equally.
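A sketch with knn from library section class (assuming, as above, logged covariates; this is not the book's exact code):

    ## 1-NN and 3-NN predictions for the unknown Cushing's syndrome cases
    library(MASS); library(class)
    known <- Cushings$Type != "u"
    cush  <- log(as.matrix(Cushings[known, 1:2]))
    tp    <- factor(Cushings$Type[known])
    unk   <- log(as.matrix(Cushings[!known, 1:2]))
    knn(train = cush, test = unk, cl = tp, k = 1)
    knn(train = cush, test = unk, cl = tp, k = 3)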
This dataset is too small to try the editing and LVQ methods in library section class.
2 … S environment.
12.4 Neural Networks
[Figure 12.7 appears here: several panels of Pregnanetriol against Tetrahydrocortisone with points labelled a, b, c and u; the caption follows.]
Figure 12.7: Neural networks applied to the Cushing's syndrome data.
The results are shown in Figure 12.7. We see that in all cases there are multiple local maxima of the (penalized) likelihood. Once we have a penalty, the choice of the number of hidden units is often not critical (see Figure 12.7). The spirit of the predictive approach is to average the predicted posterior probabilities over the local maxima.
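A sketch of this idea (not the book's code): fit the same small network several times from random starting weights and average the predicted probabilities. The settings (three hidden units, decay 0.01) are illustrative.

    ## Average nnet class probabilities over several random restarts
    library(MASS); library(nnet)
    known <- Cushings$Type != "u"
    cush  <- data.frame(log(Cushings[known, 1:2]), Type = factor(Cushings$Type[known]))
    set.seed(123)
    fits  <- lapply(1:10, function(i)
      nnet(Type ~ ., data = cush, size = 3, decay = 0.01, maxit = 500, trace = FALSE))
    probs <- Reduce(`+`, lapply(fits, predict, newdata = cush, type = "raw")) / length(fits)
    round(probs[1:3, ], 3)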
[Figure 12.8 appears here: two panels of Pregnanetriol against Tetrahydrocortisone with points labelled a, b, c and u; the caption follows.]
Figure 12.8: Neural networks with three hidden units and weight decay applied to the Cushing's syndrome data.
Note that there are two quite different types of local maxima occurring here, and some local maxima occur several times (up to convergence tolerances). An average of the predicted probabilities over these fits is in the spirit of the predictive approach.
12.5 Support Vector Machines

Support vector machines (SVMs) have been promoted enthusiastically, but with little respect to the selection effects of choosing the test problem and the member of the large class of classifiers to use. The original ideas are in Boser et al. (1992), Cortes and Vapnik (1995) and Vapnik (1995, 1998); the books by Cristianini and Shawe-Taylor (2000) and Hastie et al. (2001, 4.5, 12.2, 12.3) present the underlying theory.
The method for two classes is fairly simple to describe. Suppose first that the classes are linearly separable, so that some hyperplane has all class-one points on one side and all class-two points on the other; logistic regression would find such a hyperplane, but not a unique one. It would be a coincidence for there to be only one such hyperplane, and the support vector machine chooses the one in the middle of the 'gap' between the classes, that is with maximal margin (the distance from the hyperplane to the nearest point). This is a quadratic programming problem that can be solved by standard methods.3 Such a hyperplane has support vectors, data points that are exactly the margin distance away from the hyperplane. It will typically be a very good classifier.

Usually, however, no separating hyperplane will exist. This is tackled in two ways. First, we can allow some points to be on the wrong side of their margin (and for some on the wrong side of the hyperplane) subject to a bound on the total amount by which they do so; the resulting fit is not dissimilar (Hastie et al., 2001, p. 380) to a logistic regression with weight decay. Second, the set of variables can be greatly expanded by taking non-linear functions of the original variables. The claimed advantage of SVMs is that, because the fit is determined only by the support vectors, they can cope well with such a large expanded set of variables.
3 See Section 16.2 for S software for this problem; however, special-purpose software is often used.
4 Code by David Meyer based on C++ code by Chih-Chung Chang and Chih-Jen Lin. A port to S-PLUS is available for machines with a C++ compiler.
The extension to more than two classes is much less elegant, and several ideas have been used. The svm function uses one attributed to Knerr et al. (1990) in which classifiers are built comparing each pair of classes, and the majority vote amongst the resulting classifiers determines the predicted class.
12.6 Forensic Glass Example

The forensic glass dataset fgl has 214 points from six classes with nine measurements on each. As we have seen (Figures 4.17 on page 99, 5.4 on page 116, 11.5 on page 309 and 12.3 on page 337) the types of glass do not form compact well-separated groupings, and the marginal distributions are far from normal. There are some small classes (with 9, 13 and 17 examples), so we cannot use quadratic discriminant analysis.
We assess their performance by 10-fold cross-validation, using the same random partition for all the methods. Logistic regression provides a suitable benchmark (as is often the case), and in this example linear discriminant analysis does equally well.
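A sketch of that benchmark (not the book's code): fix one random partition and reuse it for every method. The fold labels rand defined here are reused by the later sketches.

    ## Fixed 10-fold partition and a multinomial logistic benchmark for fgl
    library(MASS); library(nnet)
    set.seed(101)
    rand <- sample(rep(1:10, length.out = nrow(fgl)))       # fold label for each case
    pred <- factor(rep(NA, nrow(fgl)), levels = levels(fgl$type))
    for (i in 1:10) {
      fit <- multinom(type ~ ., data = fgl[rand != i, ], trace = FALSE, maxit = 1000)
      pred[rand == i] <- predict(fit, fgl[rand == i, ])
    }
    table(true = fgl$type, predicted = pred)                # cross-validated confusion matrix
    mean(pred != fgl$type)                                  # cross-validated error rate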
We can use nearest-neighbour methods to estimate the lower bound on the Bayes risk as about 10% (Ripley, 1996, pp. 196–7). Classification trees can also be grown for this dataset; we need to cross-validate over the choice of tree size, which does vary by group from four to seven.
Neural networks
We wrote some general functions for testing neural network models by V-fold cross-validation. First we rescale the dataset so the inputs have range [0, 1].
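Those general functions are not reproduced here. A compressed sketch of the same idea, rescaling the inputs and choosing the size of the network by cross-validated error over the partition rand defined earlier (the tuning grid and decay value are illustrative), is:

    ## Choose the number of hidden units by 10-fold cross-validation
    library(MASS); library(nnet)
    rescale01 <- function(x) (x - min(x)) / diff(range(x))
    fgl01 <- data.frame(lapply(fgl[, 1:9], rescale01), type = fgl$type)
    set.seed(101)
    rand <- sample(rep(1:10, length.out = nrow(fgl01)))     # same partition as before
    cv.nnet <- function(size, decay) {
      wrong <- 0
      for (i in 1:10) {
        fit <- nnet(type ~ ., data = fgl01[rand != i, ], size = size,
                    decay = decay, maxit = 500, trace = FALSE)
        wrong <- wrong + sum(predict(fit, fgl01[rand == i, ], type = "class") !=
                             fgl01$type[rand == i])
      }
      wrong / nrow(fgl01)
    }
    sapply(c(2, 4, 8), function(s) cv.nnet(size = s, decay = 0.01))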
This code chooses between neural nets on the basis of their cross-validated error rate. An alternative is to use logarithmic scoring, which is equivalent to comparing deviances, and needs only a small change to the cross-validation function.
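For example, a logarithmic score to replace the error count might be sketched as follows (probs is an n x g matrix of predicted probabilities whose columns are in the order of levels(true)):

    ## Logarithmic score: smaller is better, and twice the score is a deviance
    log.score <- function(probs, true)
      -sum(log(pmax(probs[cbind(seq_along(true), as.integer(true))], 1e-6)))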
Support vector machines
The following is faster, but not strictly comparable with the results above, as a
different random partition will be used.
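A sketch of such a quicker assessment, using the built-in cross-validation of svm from package e1071 (which draws its own random partition):

    ## Support vector machine for fgl with 10-fold cross-validation
    library(MASS); library(e1071)
    set.seed(103)
    fgl.svm <- svm(type ~ ., data = fgl, cross = 10)
    summary(fgl.svm)      # reports the accuracy in each of the ten folds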
Learning vector quantization

We set an even prior over the classes as otherwise there are too few representatives of the smaller classes. Our initialization code in lvqinit follows Kohonen's in selecting the number of representatives; in this problem 24 points are selected, four from each class.
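A sketch with the LVQ functions in library section class (the scaling of the variables here is an illustrative choice, not the book's):

    ## Learning vector quantization for fgl with an even prior
    library(MASS); library(class)
    x  <- scale(as.matrix(fgl[, 1:9]))
    cl <- fgl$type
    set.seed(105)
    cd <- lvqinit(x, cl, prior = rep(1, 6)/6)        # initial codebook, even prior
    table(cd$cl)                                     # representatives selected per class
    cd <- olvq1(x, cl, cd)                           # optimized LVQ1 training pass
    table(true = cl, predicted = lvqtest(cd, x))     # resubstitution assessment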
12.7 Calibration Plots

One measure that a suitable model for p(c | x) has been found is that the predicted probabilities are well calibrated; that is, that a fraction of about p of the events we predict with probability p actually occur. Methods for testing calibration of probability forecasts have been developed in connection with weather forecasts (Dawid, 1982, 1986).
For the forensic glass example we are making six probability forecasts for each case, one for each class. To ensure that they are genuine forecasts, we should use the cross-validation procedure. A minor change to the code gives the probability predictions:
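That code is not reproduced here. A minimal sketch of how such a plot can be drawn, assuming a matrix probs of cross-validated predicted probabilities with columns in the order of levels(fgl$type), is:

    ## Calibration plot: observed frequency of events against forecast probability
    calibration.plot <- function(probs, true, breaks = seq(0, 1, 0.1)) {
      event <- outer(as.integer(true), seq_len(ncol(probs)), "==")  # did each forecast event occur?
      bin   <- cut(as.vector(probs), breaks, include.lowest = TRUE)
      obs   <- tapply(as.vector(event), bin, mean)                  # observed frequency per bin
      mid   <- (head(breaks, -1) + tail(breaks, -1)) / 2
      plot(mid, obs, xlim = c(0, 1), ylim = c(0, 1), type = "b",
           xlab = "predicted probability", ylab = "observed frequency")
      abline(0, 1, lty = 2)                                         # ideal line of slope one
    }
    ## e.g. calibration.plot(probs, fgl$type)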
[Calibration plot appears here: the horizontal axis is predicted probability, running from 0 to 1.]
The forecasts are clearly over-confident: the observed frequencies fall well below the ideal line of slope one. Indeed, only 22/64 of the events predicted with probability greater than 0.9 occurred. (The underlying cause is the multimodal nature of some of the underlying class distributions.)
Over-confidence of this sort can also result from the use of plug-in rather than predictive estimates. Then the plot can be used to adjust the probabilities (which may need further adjustment to sum to one for more than two classes).
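One simple way to make such an adjustment (an illustrative approach, not taken from the text) is to regress the event indicator on the forecast probability and renormalize; again probs and the true classes are assumed as above.

    ## Recalibrate forecast probabilities and renormalize each case to sum to one
    recalibrate <- function(probs, true) {
      event <- as.vector(outer(as.integer(true), seq_len(ncol(probs)), "=="))
      fit   <- glm(event ~ as.vector(probs), family = binomial)
      adj   <- matrix(fitted(fit), nrow = nrow(probs))   # adjusted individual forecasts
      sweep(adj, 1, rowSums(adj), "/")                   # rows now sum to one
    }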