
Mathematical Concepts

1.1 Matrices

The theory of linear statistical models that will be developed in this book will require some of the mathematical techniques of calculus, matrix algebra, etc. In this chapter we shall give some of the important mathematical theorems that are necessary to develop this theory. Most of the theorems in this chapter will be given without proof; for some, however, the proof will be supplied.
A matrix A will have elements denoted by a_ij, where i refers to the row and j to the column. If A denotes a matrix, then A' will denote the transpose of A, and A⁻¹ will denote the inverse of A. The symbol |A| will be used to denote the determinant of A. The identity matrix will be denoted by I, and 0 will denote the null matrix, i.e., a matrix whose elements are all zeros. The dimension of a matrix is the number of its rows by the number of its columns. For example, a matrix A of dimension n × m, or an n × m matrix A, will be a matrix A with n rows and m columns. If m = 1, the matrix will be called an n × 1 vector. The rank of the matrix A will sometimes be denoted by ρ(A).
Given the matrices A = (a_ij) and B = (b_ij), the product AB = C = (c_ij) is defined as the matrix C with pqth element equal to

    c_pq = Σ_{s=1}^n a_ps b_sq

For AB to be defined, the number of columns n in A must equal the number of rows in B. For A + B to be defined, A and B must have the same dimension; A + B = C gives c_ij = a_ij + b_ij. If k is a scalar and A is a matrix, then kA means the matrix such that each element is the corresponding element of A multiplied by k.
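These definitions are easy to mirror in software. As a quick numerical check (ours, not the book's, assuming NumPy and arbitrary random test matrices), the element-wise definition of c_pq agrees with the built-in matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # 2 x 3, so AB is defined only if B has 3 rows
B = rng.standard_normal((3, 4))

# c_pq = sum over s of a_ps * b_sq, exactly the definition above
C = np.array([[sum(A[p, s] * B[s, q] for s in range(3)) for q in range(4)]
              for p in range(2)])
print(np.allclose(C, A @ B))      # True
```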

A diagonal matrix D is defined as a square matrix whose off-diagonal elements are all zero; that is, if D = (d_ij), then d_ij = 0 if i ≠ j.

• Theorem 1.1 The transpose of A' equals A; that is, (A')' = A.

• Theorem 1.2 The inverse of A⁻¹ equals A; that is, (A⁻¹)⁻¹ = A.

• Theorem 1.3 The transpose and inverse symbols may be permuted; that is, (A')⁻¹ = (A⁻¹)'.

• Theorem 1.4 (AB)' = B'A'.

• Theorem 1.5 (AB)⁻¹ = B⁻¹A⁻¹ if A and B are each nonsingular.

• Theorem 1.6 A scalar commutes with every matrix; that is, kA = Ak.

• Theorem 1.7 For any matrix A we have IA = AI = A.

• Theorem 1.8 All diagonal matrices of the same dimension are commutative.

• Theorem 1.9 If D_1 and D_2 are diagonal matrices, then the product is diagonal; that is, D = D_1D_2 = D_2D_1, where D is diagonal. The ith diagonal element of D is the product of the ith diagonal element of D_1 and the ith diagonal element of D_2.

• Theorem 1.10 If A is a nonsingular matrix, if X and Y are vectors, and if the equation AX = Y holds, then X = A⁻¹Y.

• Theorem 1.11 The rank of the product AB of the two matrices A and B is less than or equal to the rank of A and is less than or equal to the rank of B.

• Theorem 1.12 The rank of the sum A + B is less than or equal to the rank of A plus the rank of B.

• Theorem 1.13 If A is an n × n matrix and if |A| = 0, then the rank of A is less than n. (|A| denotes the determinant of the matrix A.)

• Theorem 1.14 If the rank of an n × n matrix A is less than n, then the rows of A are not independent; likewise, the columns of A are not independent.

• Theorem 1.15 If the rank of an n × n matrix A is m, then the number of linearly independent rows of A is m; also, the number of linearly independent columns of A is m.

• Theorem 1.16 If A'A = 0, then A = 0.

• Theorem 1.17 The rank of a matrix B is unaltered by multiplication by a nonsingular matrix; that is, if A, B, and C are matrices such that AB and BC exist and if A and C are nonsingular, then ρ(AB) = ρ(BC) = ρ(B).
• Theorem 1.18 If the product AB of two square matrices is 0,
then A = 0 or B = 0 or A and B are both singular.
• Theorem 1.19 If A and B are m × m matrices of rank r and s, respectively, then the rank of AB is greater than or equal to r + s − m.

• Theorem 1.20 The rank of AA' equals the rank of A'A equals the rank of A equals the rank of A'.
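A quick numerical illustration of the rank theorems (our NumPy sketch, not part of the text; with these random draws the 5 × 4 product of rank-3 factors has rank 3):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # rank at most 3 (Theorem 1.11)

rank = np.linalg.matrix_rank
print(rank(A))                                   # 3
print(rank(A.T), rank(A @ A.T), rank(A.T @ A))   # all 3, per Theorem 1.20
```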
1.2 Quadratic Forms

If Y is an n × 1 vector with ith element y_i, and if A is an n × n matrix with ijth element equal to a_ij, then the quadratic form Y'AY is defined as

    Y'AY = Σ_{i=1}^n Σ_{j=1}^n y_i y_j a_ij

The rank of the quadratic form Y'AY is defined as the rank of the matrix A. The quadratic form Y'AY is said to be positive definite if and only if Y'AY > 0 for all vectors Y where Y ≠ 0. A quadratic form Y'AY is said to be positive semidefinite if and only if Y'AY ≥ 0 for all Y, and Y'AY = 0 for some vector Y ≠ 0. The matrix A of a quadratic form Y'AY is said to be positive definite (semidefinite) when the quadratic form is positive definite (semidefinite). If C is an n × n matrix such that C'C = I, then C is said to be an orthogonal matrix, and C' = C⁻¹.

Consider the transformation from the vector Z to the vector Y by the matrix P such that Y = PZ. Then Y'AY = (PZ)'A(PZ) = Z'P'APZ. Thus, by the transformation Y = PZ, the quadratic form Y'AY is transformed into the quadratic form Z'(P'AP)Z.
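A two-line numerical check of this identity (ours, not the book's; any symmetric A, nonsingular P, and vector Z will do):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)); A = (A + A.T) / 2   # a symmetric matrix
P = rng.standard_normal((3, 3))                      # the transformation matrix
Z = rng.standard_normal(3)

Y = P @ Z
print(np.isclose(Y @ A @ Y, Z @ (P.T @ A @ P) @ Z))  # True: Y'AY = Z'(P'AP)Z
```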
• Theorem 1.21 If P is a nonsingular matrix and if A is positive
definite (semidefinite), then P'AP is positive definite (semidefinite).

• Theorem 1.22 A necessary and sufficient condition for the symmetric matrix A to be positive definite is that there exist a nonsingular matrix P such that A = PP'.
• Theorem 1.23 A necessary and sufficient condition that the matrix A be positive definite, where

    A = [a_11  a_12  ···  a_1n]
        [a_21  a_22  ···  a_2n]
        [ ···   ···  ···   ···]
        [a_n1  a_n2  ···  a_nn]

is that the following inequalities hold:

    a_11 > 0,    |a_11  a_12| > 0,    ...,    |A| > 0
                 |a_21  a_22|

that is, that every leading principal minor of A have positive determinant.
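Theorem 1.23 translates directly into a test. A minimal sketch, assuming NumPy (the helper name is_positive_definite is ours, not the book's):

```python
import numpy as np

def is_positive_definite(A):
    """Theorem 1.23: A is positive definite iff every leading
    principal minor determinant is positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

P = np.array([[2.0, 0.0], [1.0, 1.0]])   # nonsingular
print(is_positive_definite(P @ P.T))     # True, consistent with Theorem 1.22
print(is_positive_definite(-P @ P.T))    # False
```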

• Theorem 1.24 If A is an n × m matrix of rank m, where n > m, then A'A is positive definite and AA' is positive semidefinite.
• Theorem 1.25 If A is an n × m matrix of rank k, where k < m and k < n, then A'A and AA' are each positive semidefinite.
• Theorem 1.26 If C is an orthogonal matrix, and if the transformation Y = CZ is made on Y'Y, we get Y'Y = Y'IY = Z'C'ICZ = Z'C'CZ = Z'Z.
In order to develop the theory of quadratic forms, it is necessary to define a characteristic root of a matrix. A characteristic root of a p × p matrix A is a scalar λ such that AX = λX for some vector X ≠ 0. The vector X is called the characteristic vector of the matrix A. It follows that, if λ is a characteristic root of A, then AX − λX = 0 and (A − λI)X = 0. Thus, λ is a scalar such that the above homogeneous set of equations has a nontrivial solution, i.e., a solution other than X = 0. It is known from elementary matrix theory that this implies |A − λI| = 0. Thus the characteristic root of a matrix A could be defined as a scalar λ such that |A − λI| = 0. It is easily seen that |A − λI| is a pth-degree polynomial in λ. This polynomial is called the characteristic polynomial, and its roots are the characteristic roots of the matrix A. We shall now give a few theorems concerning characteristic roots, characteristic vectors, and characteristic polynomials. In this book the elements of a matrix will be real.
• Theorem 1.27 The number of nonzero characteristic roots of a
matrix A is equal to the rank of A.
• Theorem 1.28 The characteristic roots of A are identical with the characteristic roots of CAC⁻¹. If C is an orthogonal matrix, it follows that A and CAC' have identical characteristic roots.

• Theorem 1.29 The characteristic roots of a symmetric matrix are real; i.e., if A = A', the characteristic polynomial |A − λI| = 0 has all real roots.

• Theorem 1.30 The characteristic roots of a positive definite matrix A are positive; the characteristic roots of a positive semidefinite matrix are nonnegative.

• Theorem 1.31 For every symmetric matrix A there exists an orthogonal matrix C such that C'AC = D, where D is a diagonal matrix whose diagonal elements are the characteristic roots of A.
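Theorems 1.29 and 1.31 can be seen concretely with a symmetric eigendecomposition (a NumPy sketch of ours; np.linalg.eigh returns the roots in ascending order):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric, so its roots are real (Theorem 1.29)
roots, C = np.linalg.eigh(A)             # columns of C are characteristic vectors
print(roots)                             # [1. 3.]: both positive, so A is positive definite
print(np.allclose(C.T @ C, np.eye(2)))   # True: C is orthogonal
print(np.round(C.T @ A @ C, 12))         # diag(1, 3): C'AC = D, as in Theorem 1.31
```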

• Theorem 1.32 Let A_1, A_2, ..., A_k be a collection of symmetric m × m matrices. A necessary and sufficient condition that there exist an orthogonal transformation C such that C'A_1C, C'A_2C, ..., C'A_kC are all diagonal is that A_iA_j be symmetric for all i and j. Since all the A_i are symmetric, it follows that A_iA_j is symmetric if and only if A_i and A_j commute.
• Theorem 1.33 Let an m × m matrix C be written

    C = [c_1]
        [c_2]
        [···]
        [c_m]

where c_i is the ith row of C. Thus c_i is the transpose of an m × 1 column vector. The following two conditions are necessary and sufficient for C to be orthogonal:

    (1) c_i c_j' = 0 for all i ≠ j
    (2) c_i c_i' = 1 for all i

That is to say, any two rows of an orthogonal matrix are orthogonal (their inner product is zero), and the inner product of any row with itself is unity.
• Theorem 1.34 Let

    C_1 = [c_1]
          [c_2]
          [···]
          [c_p]

be p rows of an n × n orthogonal matrix; that is to say, let c_1, c_2, ..., c_p be the transposes of p vectors such that c_i c_j' = 0 (i ≠ j) and c_i c_i' = 1 (i, j = 1, 2, ..., p). Then there exist n − p vectors f_i such that c_i f_j' = 0 for all i and j, and such that f_i f_j' = 0 (i ≠ j) and f_i f_i' = 1. Thus the theorem states that, if we are given a p × n matrix C_1 of this kind, there exists a matrix C_2 of dimension (n − p) × n such that

    C = [C_1]
        [C_2]

is orthogonal, where C_1 forms the first p rows of C and C_2 the last n − p rows of C.
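Theorem 1.34 is constructive in practice. One way to complete the missing rows (our sketch, not the book's method: take an orthonormal basis of the orthogonal complement from the SVD):

```python
import numpy as np

# Given p orthonormal rows C1 (p x n), extend them to an n x n orthogonal
# matrix, as Theorem 1.34 guarantees is possible.
C1 = np.array([[3.0, 0.0, 4.0]]) / 5.0    # p = 1 unit row, n = 3
_, _, Vt = np.linalg.svd(C1)
C2 = Vt[1:]                               # rows spanning the orthogonal complement of C1
C = np.vstack([C1, C2])
print(np.allclose(C @ C.T, np.eye(3)))    # True: C is orthogonal
```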

1.3 Determinants

In this section a few of the important theorems on determinants


will be given. It will be assumed that the student knows the definition
of a determinant and knows how to evaluate small ones. In linear
hypothesis applications it is often necessary to solve systems involving
a great many equations. It might at times be necessary to evaluate
large determinants. There are many methods of doing these things
that are adaptable to automatic and semiautomatic computing
machines. These methods will be discussed in detail later. It will
be assumed here that the student knows how to evaluate determinants
by the method of minors or by some other simple method.

• Theorem 1.35 The determinant of a diagonal matrix is equal to the product of the diagonal elements.
• Theorem 1.36 If A and B are n × n matrices, then |AB| = |BA| = |A||B|.
• Theorem 1.37 If A is singular, |A| = 0.

• Theorem 1.38 If C is an orthogonal matrix, then |C| = +1 or |C| = −1.

• Theorem 1.39 If C is an orthogonal matrix, then |C'AC| = |A|.


• Theorem 1.40 The determinant of a positive definite matrix is
positive.
• Theorem 1.41 The determinant of a triangular matrix is equal
to the product of the diagonal elements.
• Theorem 1.42 |D⁻¹| = 1/|D|, if |D| ≠ 0.

• Theorem 1.43 If A is a square matrix such that

    A = [A_11  A_12]
        [A_21  A_22]

where A_11 and A_22 are square matrices, and if A_12 = 0 or A_21 = 0, then |A| = |A_11||A_22|.
• Theorem 1.44 If A_1 and A_2 are symmetric and A_2 is positive definite and if A_1 − A_2 is positive semidefinite (or positive definite), then |A_1| ≥ |A_2|.
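Two of these determinant facts checked numerically (our NumPy sketch with random matrices, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
det = np.linalg.det

print(np.isclose(det(A @ B), det(A) * det(B)))   # Theorem 1.36

T = np.triu(A)                                   # a triangular matrix
print(np.isclose(det(T), np.prod(np.diag(T))))   # Theorem 1.41
```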

1.4 Miscellaneous Theorems on Matrices


In this section we shall discuss some miscellaneous theorems concerning matrices, which we shall use in later chapters.
The trace of a matrix A, which will be written tr(A), is equal to the sum of the diagonal elements of A; that is,

    tr(A) = Σ_{i=1}^n a_ii

• Theorem 1.45 tr(AB) = tr(BA).
Proof: By definition, tr(AB) is equal to Σ_i Σ_k a_ik b_ki. By definition, tr(BA) is equal to Σ_i Σ_k b_ik a_ki. But it is clear that Σ_i Σ_k a_ik b_ki = Σ_i Σ_k b_ik a_ki; therefore, tr(AB) = tr(BA).

• Theorem 1.46 tr(ABC) = tr(CAB) = tr(BCA); that is, the trace of the product of matrices is invariant under any cyclic permutation of the matrices.
Proof: By Theorem 1.45, tr[(AB)C] = tr[C(AB)].

• Theorem 1.47 tr(I) = n, where I is an n × n identity matrix.

• Theorem 1.48 If C is an orthogonal matrix, tr(C'AC) = tr(A).
Proof: By Theorem 1.46, tr(C'AC) = tr(CC'A) = tr(IA) = tr(A).
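A short numerical confirmation of the trace theorems (ours, not the book's; the QR factorization is just a convenient way to manufacture an orthogonal matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
C, _ = np.linalg.qr(rng.standard_normal((4, 4)))         # an orthogonal matrix

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))      # Theorem 1.45
print(np.isclose(np.trace(C.T @ A @ C), np.trace(A)))    # Theorem 1.48
```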

It is sometimes advantageous to break a matrix up into submatrices. This is called partitioning a matrix into submatrices, and a matrix can be partitioned in many ways. For example, A might be partitioned into submatrices as follows:

    A = [A_11  A_12]
        [A_21  A_22]

where A_11 is m_1 × n_1, A_12 is m_1 × n_2, A_21 is m_2 × n_1, and A_22 is m_2 × n_2, and where m_1 + m_2 = m and n_1 + n_2 = n.

The product AB of two matrices can be formed symbolically even if A and B are broken into submatrices. The multiplication proceeds as if the submatrices were single elements of the matrix. However, the dimensions of the matrices and of the submatrices must be such that they will multiply. For example, if B is an n × p matrix such that

    B = [B_11  B_12]
        [B_21  B_22]

where B_ij is an n_i × p_j matrix, then the product AB exists; and the corresponding submatrices will multiply, since A_ij is of dimension m_i × n_j and B_jk is of dimension n_j × p_k. The resulting matrix is as follows:

    AB = [A_11  A_12][B_11  B_12] = [A_11B_11 + A_12B_21   A_11B_12 + A_12B_22]
         [A_21  A_22][B_21  B_22]   [A_21B_11 + A_22B_21   A_21B_12 + A_22B_22]
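The blockwise rule is exact, as this NumPy sketch (ours; the split points are arbitrary conformable choices) confirms:

```python
import numpy as np

rng = np.random.default_rng(5)
A, B = rng.standard_normal((5, 6)), rng.standard_normal((6, 7))

m1, n1, p1 = 2, 3, 4                    # conformable split points
A11, A12, A21, A22 = A[:m1, :n1], A[:m1, n1:], A[m1:, :n1], A[m1:, n1:]
B11, B12, B21, B22 = B[:n1, :p1], B[:n1, p1:], B[n1:, :p1], B[n1:, p1:]

AB = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
               [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
print(np.allclose(AB, A @ B))           # True: the symbolic block product is exact
```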

• Theorem 1.49 If A is a positive definite symmetric matrix such that

    A = [A_11  A_12]
        [A_21  A_22]

and if B is the inverse of A such that

    B = [B_11  B_12]
        [B_21  B_22]

where A_11 and B_11 are each of dimension m × m, etc., then

    A_11⁻¹ = B_11 − B_12B_22⁻¹B_21

Proof: Since B = A⁻¹, then AB = I. Thus,

    [A_11  A_12][B_11  B_12] = [I  0]
    [A_21  A_22][B_21  B_22]   [0  I]

and we get the following two matrix equations:

    A_11B_11 + A_12B_21 = I    and    A_11B_12 + A_12B_22 = 0

Solving the second equation for A_12, we get A_12 = −A_11B_12B_22⁻¹. Substituting this value for A_12 into the first equation, we get

    A_11B_11 − A_11B_12B_22⁻¹B_21 = I

Multiplying by A_11⁻¹ gives the desired result. It is known that B_22⁻¹ and A_11⁻¹ exist, since A and B are positive definite matrices and since A_11 and B_22 are principal minors of A and B, respectively. By Theorem 1.23, the determinant of a principal minor of a positive definite matrix is positive. Similar equations can be derived for A_22⁻¹, B_11⁻¹, and B_22⁻¹.
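A numerical check of Theorem 1.49 (our NumPy sketch; the "+ I" term is just a convenient way to guarantee a positive definite test matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
P = rng.standard_normal((5, 5))
A = P @ P.T + np.eye(5)        # positive definite and symmetric
B = np.linalg.inv(A)

m = 2                          # A11 and B11 are m x m
A11 = A[:m, :m]
B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
print(np.allclose(np.linalg.inv(A11),
                  B11 - B12 @ np.linalg.inv(B22) @ B21))   # True
```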

• Theorem 1.50 Let A be the square matrix such that

    A = [A_11  A_12]
        [A_21  A_22]

If A_22 is nonsingular, then |A| = |A_22||A_11 − A_12A_22⁻¹A_21|.

Proof: The determinant of A can be written as follows:

    |A| = |A_22||A||B|

where

    B = [     I           0    ]
        [−A_22⁻¹A_21   A_22⁻¹]

This is clear since, by Theorem 1.43,

    |B| = |I||A_22⁻¹| = 1/|A_22|

so

    |A_22||A||B| = |A_22||A|(1/|A_22|) = |A|

Replacing A and B by their submatrices, we get

    |A| = |A_22| |[A_11  A_12][     I           0    ]|
                 |[A_21  A_22][−A_22⁻¹A_21   A_22⁻¹]|

The corresponding submatrices are such that they multiply; so

    |A| = |A_22| |A_11 − A_12A_22⁻¹A_21   A_12A_22⁻¹|
                 |          0                  I     |

        = |A_22||A_11 − A_12A_22⁻¹A_21|

as was to be shown.
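Theorem 1.50 is easy to confirm numerically (our NumPy sketch; a random A_22 is nonsingular with probability one):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6))
A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]

lhs = np.linalg.det(A)
rhs = np.linalg.det(A22) * np.linalg.det(A11 - A12 @ np.linalg.inv(A22) @ A21)
print(np.isclose(lhs, rhs))    # True, per Theorem 1.50
```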

Consider the system of equations AX = Y, where A is an n × m matrix, X is an m × 1 vector, and Y is an n × 1 vector. Writing this linear system in detail, we get

    a_11 x_1 + a_12 x_2 + ··· + a_1m x_m = y_1
    a_21 x_1 + a_22 x_2 + ··· + a_2m x_m = y_2
    ·······································
    a_n1 x_1 + a_n2 x_2 + ··· + a_nm x_m = y_n

For a given set of a_ij and y_i (that is, for a given matrix A and a given vector Y), does there exist a set of elements x_j (that is, a vector X) such that the equations AX = Y are satisfied? Three cases must be considered:

1. The equations have no solution. In this case there exists no vector X such that the system of equations is satisfied, and the system is said to be inconsistent.
2. There is just one set of x_j that satisfies the system. In this case, there is said to exist a unique solution.
3. There is more than one vector X that satisfies the system. If more than one such vector exists, then an infinite number of vectors X exist that satisfy the system of equations.

We shall consider two matrices: the coefficient matrix A and the augmented matrix B = (A, Y), which is the matrix A with the vector Y joined to it as the (m + 1)st column; that is to say,

    B = [a_11  a_12  ···  a_1m  y_1]
        [a_21  a_22  ···  a_2m  y_2]
        [ ···                      ]
        [a_n1  a_n2  ···  a_nm  y_n]

We shall now state some important theorems concerning solutions to the system of equations AX = Y.

• Theorem 1.51 A necessary and sufficient condition that the system of equations AX = Y be consistent (have at least one vector X satisfying it) is that the rank of the coefficient matrix A be equal to the rank of the augmented matrix B = (A, Y).

• Theorem 1.52 If ρ(A) = ρ(B) = p, then m − p of the unknowns x_j can be assigned any desired value and the remaining p of the x_j will be uniquely determined. It is essential that the m − p unknowns x_j that are assigned given values be chosen such that the matrix of the coefficients of the remaining p unknowns have rank p.

• Theorem 1.53 If ρ(A) = ρ(B) = m, there is a unique m × 1 vector X that satisfies AX = Y.

As an example, consider the system of equations

    x_1 − x_2 = 6
    2x_1 − 2x_2 = 3

This can be put into matrix form as

    [1  −1][x_1] = [6]
    [2  −2][x_2]   [3]

It can easily be verified that the rank of A is 1 and the rank of the augmented matrix

    [1  −1  6]
    [2  −2  3]

is 2. Therefore, the system of equations is not consistent, and there exist no values x_1 and x_2 that satisfy it. This fact is also easily seen if we multiply the first equation by 2 and subtract it from the second equation. We get 0 = −9, an impossible result.
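The same conclusion follows mechanically from Theorem 1.51 (our NumPy sketch of the rank test on this example):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [2.0, -2.0]])
Y = np.array([[6.0],
              [3.0]])
B = np.hstack([A, Y])               # the augmented matrix (A, Y)

rank = np.linalg.matrix_rank
print(rank(A), rank(B))             # 1 2: the ranks differ, so by Theorem 1.51
                                    # the system AX = Y is inconsistent
```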

1.5 The Derivatives of Matrices and Vectors

We shall now discuss some theorems relating to the differentiation of quadratic forms and bilinear forms. It will sometimes be advantageous in taking derivatives of quadratic and bilinear forms to be able to take derivatives of matrices and vectors.

Let X be a p × 1 vector with elements x_i; let A be a p × 1 vector with elements a_i; and let Z = X'A = A'X (Z is a scalar). The derivative of Z with respect to the vector X, which will be written ∂Z/∂X, will mean the vector whose ith element is ∂Z/∂x_i:

    ∂Z/∂X = (∂Z/∂x_1, ∂Z/∂x_2, ..., ∂Z/∂x_p)'

• Theorem 1.54 If X, A, and Z are as defined above, then ∂Z/∂X = A.
Proof: To find the ith element of the vector ∂Z/∂X, we find

    ∂Z/∂x_i = ∂(Σ_{j=1}^p a_j x_j)/∂x_i = a_i

Thus the ith element of ∂Z/∂X is a_i; so ∂Z/∂X = A.

• Theorem 1.55 Let A be a p × 1 vector, let B be a q × 1 vector, and let X be a p × q matrix whose ijth element equals x_ij. Let

    Z = A'XB = Σ_{m=1}^p Σ_{n=1}^q a_m x_mn b_n

Then ∂Z/∂X = AB'.
Proof: ∂Z/∂X will be a p × q matrix whose ijth element is ∂Z/∂x_ij. Assuming that X is not symmetric and that the elements of X are independent,

    ∂Z/∂x_ij = ∂(Σ_{m=1}^p Σ_{n=1}^q a_m x_mn b_n)/∂x_ij = a_i b_j

Thus the ijth element of ∂Z/∂X is a_i b_j. Therefore, it follows that

    ∂Z/∂X = AB'

• Theorem 1.56 Let X be a p × 1 vector, let A be a p × p symmetric matrix, and let Z = X'AX = Σ_{i=1}^p Σ_{j=1}^p a_ij x_i x_j; then ∂Z/∂A = 2XX' − D(XX'), where D(XX') is a diagonal matrix whose diagonal elements are the diagonal elements of XX'.
Proof: By ∂Z/∂A we shall mean a matrix whose ijth element is ∂Z/∂a_ij. Thus,

    ∂Z/∂a_ij = ∂(Σ_{m=1}^p Σ_{n=1}^p a_mn x_m x_n)/∂a_ij

If i = j, then ∂Z/∂a_ii = x_i². If i ≠ j, then ∂Z/∂a_ij = 2x_i x_j (remembering that a_ij = a_ji). Thus ∂Z/∂A = 2XX' − D(XX').

• Theorem 1.57 Let X be a p × 1 vector and let A be a p × p symmetric matrix such that Z = X'AX; then ∂Z/∂X = 2AX.
Proof: The derivative of the scalar Z with respect to the vector X will mean a p × 1 vector whose ith element is ∂Z/∂x_i.

    ∂Z/∂x_i = ∂(X'AX)/∂x_i = ∂(Σ_{m=1}^p Σ_{n=1}^p a_mn x_m x_n)/∂x_i
            = 2a_ii x_i + 2 Σ_{n≠i} a_in x_n
            = 2 Σ_{n=1}^p a_in x_n

but the ith element of 2AX is also 2 Σ_{n=1}^p a_in x_n.
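Theorem 1.57 can be verified by finite differences (a NumPy sketch of ours, not from the text; the step size and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2     # symmetric, as required
X = rng.standard_normal(4)

# Central differences on Z = X'AX, element by element
eps = 1e-6
grad = np.empty(4)
for i in range(4):
    E = np.zeros(4); E[i] = eps
    grad[i] = ((X + E) @ A @ (X + E) - (X - E) @ A @ (X - E)) / (2 * eps)

print(np.allclose(grad, 2 * A @ X, atol=1e-5))         # True, per Theorem 1.57
```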

1.6 Idempotent Matrices

We shall now prove some theorems concerning a special type of matrix, the idempotent matrix. Since many elementary textbooks on matrix algebra include few theorems on idempotent matrices, and since these theorems will play so important a part in the theory to follow, we shall supply the proofs to the theorems. We shall make extensive use of idempotent matrices in our ensuing work. A square matrix A is a symmetric idempotent matrix if the following two conditions hold:

    (1) A = A'
    (2) A = A²

For brevity we shall omit the word "symmetric." That is to say, when we say a matrix is idempotent, we shall mean symmetric idempotent. We shall make no use whatsoever of idempotent matrices that are not symmetric.

• Theorem 1.58 The characteristic roots of an idempotent matrix are either zero or unity.
Proof: If A is idempotent and if λ is a characteristic root of A, there exists a vector X ≠ 0 such that AX = λX. If we multiply both sides by A, we get

    A²X = λAX = λ²X

But A²X = AX = λX; so we have

    λX = λ²X
    (λ² − λ)X = 0

But X ≠ 0; so λ² − λ must be zero. Thus λ = 0 or λ = 1.

• Theorem 1.59 If A is idempotent and nonsingular, then A = I.
Proof: AA = A. Multiply both sides by A⁻¹.

• Theorem 1.60 If A is idempotent of rank r, there exists an orthogonal matrix P such that P'AP = E_r, where E_r is a diagonal matrix with r diagonal elements equal to unity and the remaining diagonal elements equal to zero.
Proof: This follows immediately from Theorem 1.31.
• Theorem 1.61 All idempotent matrices not of full rank are
positive semidefinite.
Proof: Since A = A'A, the result follows from Theorem 1.25.
This theorem permits us to state that no idempotent matrix
can have negative elements on its diagonal.

• Theorem 1.62 If A is idempotent with elements a_ij and if the ith diagonal element of A is zero, then the elements of the ith row and the ith column of A are all identically zero.
Proof: Since A = A², we get for the ith diagonal element of A

    a_ii = Σ_{j=1}^n a_ij a_ji

But a_ij = a_ji; so

    a_ii = Σ_{j=1}^n a_ij²

But if a_ii = 0, then a_ij = 0 (for j = 1, 2, ..., n); that is, the elements of the ith row of A are all zero. But A = A'; so the elements of the ith column are also all zero.

• Theorem 1.63 If A is idempotent of rank r, then tr(A) = r.
Proof: By Theorem 1.31, there exists an orthogonal matrix P such that P'AP = E_r. But tr(P'AP) = tr(A); thus tr(A) = tr(P'AP) = tr(E_r) = r.
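Theorems 1.58 and 1.63 come together neatly in a projection matrix of the form X(X'X)⁻¹X' (our NumPy sketch; this particular construction is familiar from least squares but is our choice of example, not the book's):

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((6, 2))
A = X @ np.linalg.inv(X.T @ X) @ X.T        # symmetric idempotent, rank 2

print(np.allclose(A, A.T), np.allclose(A, A @ A))          # True True
print(np.round(np.linalg.eigvalsh(A), 10))                 # roots all 0 or 1 (Theorem 1.58)
print(round(np.trace(A), 10), np.linalg.matrix_rank(A))    # 2.0 and 2 (Theorem 1.63)
```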
• Theorem 1.64 If A is an idempotent matrix and B is an idempotent matrix, then AB is idempotent if AB = BA.
Proof: If AB = BA, then (AB)(AB) = (AA)(BB) = AB.

• Theorem 1.65 If A is idempotent and P is orthogonal, P'AP is idempotent.
Proof: (P'AP)(P'AP) = P'A(PP')AP = P'AAP = P'AP.

• Theorem 1.66 If A is idempotent and A + B = I, then B is idempotent and AB = BA = 0.
Proof: We shall first show that B is idempotent. Now B = I − A, and we must show that B² = B. We get

    B² = (I − A)(I − A) = I − IA − AI + A² = I − 2A + A = I − A = B

Thus B² = B. We must now show that AB = BA = 0. We have A + B = I. Multiply on the right by B and obtain AB + B² = B, or AB = B − B² = 0. If we multiply the quantity A + B = I on the right by A, the result BA = 0 follows.

• Theorem 1.67 If A_1, A_2, ..., A_m are p × p idempotent matrices, a necessary and sufficient condition that there exist an orthogonal matrix P such that P'A_1P, P'A_2P, ..., P'A_mP are each diagonal is that A_iA_j = A_jA_i for all i and j.
Proof: This theorem is a very special case of Theorem 1.32. Because of its importance, we have stated it as a separate theorem.

• Theorem 1.68 If A_1, A_2, ..., A_m are p × p symmetric matrices, any two of the following conditions imply the third:

    (1) A_1, A_2, ..., A_m are each idempotent.
    (2) The sum B = Σ_{i=1}^m A_i is idempotent.
    (3) A_iA_j = 0 for all i ≠ j.

Proof: Suppose (1) and (2) are given. Then B = Σ_{i=1}^m A_i is idempotent, and there exists an orthogonal matrix P such that

    P'BP = [I_r  0]
           [0    0]

where we suppose that B is of rank r and I_r is the r × r identity matrix. Thus we have

    P'BP = [I_r  0] = Σ_{i=1}^m P'A_iP
           [0    0]

But the P'A_iP are each idempotent, by Theorem 1.65. By Theorem 1.61, the last p − r diagonal elements of each P'A_iP must be zero. This follows since the diagonal elements of an idempotent matrix are nonnegative; so, if their sum is zero, they must all be identically zero. Also, by Theorem 1.62, the last p − r rows and p − r columns of each P'A_iP must be zero. Thus we may write

    P'A_iP = [B_i  0]
             [0    0]

So, using only the first r rows and first r columns of

    P'BP = Σ_{i=1}^m P'A_iP

we have

    I_r = Σ_{i=1}^m B_i

where the B_i are idempotent. Let us assume that the rank of B_t is r_t. Then there exists an orthogonal matrix C such that

    C'B_tC = [I_rt  0]
             [0     0]

Then

    C'I_rC = I_r = C'B_tC + Σ_{i≠t} C'B_iC

Since C'B_iC is idempotent, by Theorems 1.61 and 1.62 we have

    C'B_iC = [0  0  ]      i = 1, 2, ..., m;  i ≠ t
             [0  K_i]

where K_i is an (r − r_t) × (r − r_t) matrix. Thus we see that

    C'B_tCC'B_iC = 0

which implies

    B_tB_i = 0  and  A_tA_i = 0      i = 1, 2, ..., m;  i ≠ t

Since t was arbitrary, the proof of condition (3) is complete.


Now suppose (1) and (3) are given. Then we have

    B² = (Σ_{i=1}^m A_i)² = Σ_{i=1}^m A_i² + Σ_{i≠j} A_iA_j = Σ_{i=1}^m A_i = B

We have shown that the sum is idempotent, and condition (2) is satisfied.

Finally, suppose (2) and (3) are given. By Theorem 1.67, there exists an orthogonal matrix P such that P'A_1P, P'A_2P, ..., P'A_mP are each diagonal (since A_iA_j = 0 = A_jA_i). Since the sum of diagonal matrices is a diagonal matrix, it also follows that P'BP is diagonal. By condition (3), we know that P'A_iPP'A_jP = 0 for all i ≠ j. But the product of two diagonal matrices P'A_iP and P'A_jP is zero if and only if the nonzero diagonal elements of P'A_iP correspond to zero diagonal elements of P'A_jP. Thus, if the tth diagonal element of P'A_iP is nonzero, the tth diagonal element of P'A_jP must be zero for all j ≠ i. Since P'BP = E_r, the tth diagonal element of P'BP is either 0 or 1; and, since it is the sum of the tth diagonal elements of the P'A_iP, of which at most one is nonzero, the tth diagonal element of each P'A_iP must be 0 or 1 (i = 1, 2, ..., m). Since P'A_iP is diagonal, the characteristic roots of A_i are displayed down the diagonal, and, since these roots are either 0 or 1, A_i is idempotent, and the proof is complete.

It is of special interest to note that, if

    B = Σ_{i=1}^m A_i = I

then condition (2) of Theorem 1.68 is satisfied. In this situation, condition (1) implies condition (3) and vice versa.

• Theorem 1.69 If any two of the three conditions of Theorem 1.68 hold, the rank of B = Σ_{i=1}^m A_i equals the sum of the ranks of the A_i.
Proof: If any two conditions of Theorem 1.68 hold, this implies that there exists an orthogonal matrix P such that the following are true:

    (a) P'BP = [I_r  0]    where the rank of B is r
               [0    0]

    (b) P'A_1P = [I_r1  0  0]    P'A_2P = [0  0     0]    ···
                 [0     0  0]             [0  I_r2  0]
                 [0     0  0]             [0  0     0]

where the rank of A_i is r_i and the identity blocks fall in disjoint positions down the diagonal. Thus the result follows.
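Theorems 1.68 and 1.69 can be seen numerically with complementary projections (our NumPy sketch; A1 and A2 are built from the first two and last three columns of a random orthogonal matrix):

```python
import numpy as np

rng = np.random.default_rng(10)
C, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # an orthogonal matrix
A1 = C[:, :2] @ C[:, :2].T                         # idempotent, rank 2
A2 = C[:, 2:] @ C[:, 2:].T                         # idempotent, rank 3

print(np.allclose(A1 + A2, np.eye(5)))             # the sum is I, hence idempotent
print(np.allclose(A1 @ A2, 0))                     # condition (3): A1 A2 = 0
rank = np.linalg.matrix_rank
print(rank(A1 + A2) == rank(A1) + rank(A2))        # True, per Theorem 1.69: 5 = 2 + 3
```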

1.7 Maxima, Minima, and Jacobians

We shall now state some theorems concerning the maxima and minima of functions.

• Theorem 1.70 If y = f(x_1, x_2, ..., x_n) is a function of n variables and if all partial derivatives ∂y/∂x_i are continuous, then y attains its maxima and minima only at the points where

    ∂y/∂x_1 = ∂y/∂x_2 = ··· = ∂y/∂x_n = 0

In particular, if f(x_1, x_2, ..., x_n) is a quadratic form, then f(x_1, x_2, ..., x_n) is continuous and has continuous derivatives of all orders; so the theorem applies.

• Theorem 1.71 If f(x_1, x_2, ..., x_n) is such that all the first and second partial derivatives are continuous, then at the point where

    ∂f/∂x_1 = ∂f/∂x_2 = ··· = ∂f/∂x_n = 0

the function has
(1) a minimum, if the matrix K, where the ijth element of K is ∂²f/∂x_i∂x_j, is positive definite.
(2) a maximum, if the matrix −K is positive definite.

In the above two theorems on maxima and minima, it must be remembered that the x_i are independent variables. Many times it is desired to maximize or minimize a function f(x_1, x_2, ..., x_n) where the x_i are not independent but are subject to constraints. For example, suppose it is necessary to minimize the function f(x_1, x_2, ..., x_n) subject to the condition h(x_1, x_2, ..., x_n) = 0. Since the x_i are not independent, Theorems 1.70 and 1.71 will not necessarily give the desired result. If the equation h(x_1, x_2, ..., x_n) = 0 could be solved for one of the x_i, then this value of x_i could be substituted into f(x_1, x_2, ..., x_n) and Theorems 1.70 and 1.71 could be applied.
As an example, suppose we want to find the minimum of

    f = x_1² + x_2² − 2x_1 − 6x_2 + 16

Using Theorem 1.70, we get

    ∂f/∂x_1 = 2x_1 − 2 = 0
    ∂f/∂x_2 = 2x_2 − 6 = 0

The solutions yield x_1 = 1, x_2 = 3. The matrix K is given by

    K = [∂²f/∂x_1∂x_1   ∂²f/∂x_1∂x_2] = [2  0]
        [∂²f/∂x_2∂x_1   ∂²f/∂x_2∂x_2]   [0  2]

K is positive definite; so f has a minimum at the point x_1 = 1, x_2 = 3.

If we want to find the minimum of f subject to the condition x_1 + x_2 = 1, we proceed as follows. Substitute x_1 = 1 − x_2 into the function f, and proceed as before. This gives

    f = (1 − x_2)² + x_2² − 2(1 − x_2) − 6x_2 + 16

    ∂f/∂x_2 = −2(1 − x_2) + 2x_2 + 2 − 6 = 0

The solution gives x_2 = 3/2. In this case the matrix K consists of the single term ∂²f/∂x_2² = 4, which is positive definite. Thus f subject to the constraint x_1 + x_2 = 1 attains its minimum at the point x_1 = −1/2, x_2 = 3/2.

If the constraint is a complicated function or if there are many constraining equations, this method may become cumbersome. An alternative is the method of Lagrange multipliers. For example, if we want to minimize f(x_1, x_2, ..., x_n) subject to the constraint h(x_1, x_2, ..., x_n) = 0, we form the equation

    F = f(x_1, x_2, ..., x_n) − λh(x_1, x_2, ..., x_n)

The x_1, x_2, ..., x_n, λ can now be considered n + 1 independent variables. We now state the theorem:

• Theorem 1.72 If f(x_1, x_2, ..., x_n) and the constraint h(x_1, x_2, ..., x_n) = 0 are such that all first partial derivatives are continuous, then the maximum or minimum of f(x_1, x_2, ..., x_n) subject to the constraint h(x_1, x_2, ..., x_n) = 0 can occur only at a point where the derivatives of F = f(x_1, x_2, ..., x_n) − λh(x_1, x_2, ..., x_n) vanish; i.e., where

    ∂F/∂x_1 = ∂F/∂x_2 = ··· = ∂F/∂x_n = ∂F/∂λ = 0

if ∂h/∂x_i ≠ 0 for all i at the point.

Thus we now have n + 1 equations and n + 1 unknowns, and we need not worry about which variables are independent, for we treat all n + 1 as if they were independent variables.
This will be generalized in the following.

• Theorem 1.73 To find the maximum or minimum of the function f(x_1, x_2, ..., x_n) subject to the k constraints h_i(x_1, x_2, ..., x_n) = 0 (i = 1, 2, ..., k), form the function

    F = f(x_1, x_2, ..., x_n) − Σ_{i=1}^k λ_i h_i(x_1, x_2, ..., x_n)

If ∂F/∂x_i (i = 1, 2, ..., n) are continuous, then f, subject to the constraints, can have its maxima and minima only at the points where the following equations are satisfied (if a Jacobian |∂h_i/∂x_j| ≠ 0 at the point):

    ∂F/∂x_1 = ··· = ∂F/∂x_n = ∂F/∂λ_1 = ··· = ∂F/∂λ_k = 0
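For the worked example above, the Lagrange conditions are linear, so they can be solved directly (our NumPy sketch of the resulting 3 × 3 system, not from the text):

```python
import numpy as np

# Minimize f = x1^2 + x2^2 - 2*x1 - 6*x2 + 16 subject to h = x1 + x2 - 1 = 0.
# Setting the partials of F = f - lam*h to zero gives the linear system
#   2*x1 - lam = 2,   2*x2 - lam = 6,   x1 + x2 = 1
M = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
b = np.array([2.0, 6.0, 1.0])
x1, x2, lam = np.linalg.solve(M, b)
print(x1, x2)        # -0.5 1.5, agreeing with the substitution method above
```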

Let g(x_1, x_2, ..., x_n), where −∞ < x_i < ∞ (i = 1, 2, ..., n), represent a frequency-density function. This is equivalent to the following two conditions on the function g(x_1, x_2, ..., x_n):

    (1) g(x_1, x_2, ..., x_n) ≥ 0

    (2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(x_1, x_2, ..., x_n) dx_1 dx_2 ··· dx_n = 1

If we make the transformation x_i = h_i(y_1, y_2, ..., y_n), where i = 1, 2, ..., n, the frequency-density function in terms of the new variables y_1, y_2, ..., y_n is given by

    k(y_1, y_2, ..., y_n) = g(h_1, h_2, ..., h_n)|J|

(we shall assume certain regularity conditions on the transformation equations). The symbol |J| denotes the absolute value of the Jacobian of the transformation. The Jacobian J is the determinant of a matrix K whose ijth element is ∂x_i/∂y_j.
For example, if

    f(x_1, x_2) = (1/π) e^{−x_1² − x_2²},    −∞ < x_i < ∞

is a frequency-density function and if we want to find the corresponding frequency-density function k(y_1, y_2), where

    x_1 = 4y_1 + y_2,    x_2 = 2y_1 − y_2

we have

    K = [∂x_1/∂y_1   ∂x_1/∂y_2] = [4   1]
        [∂x_2/∂y_1   ∂x_2/∂y_2]   [2  −1]

so J = −6 and |J| = 6. So we have

    k(y_1, y_2) = (6/π) e^{−(4y_1+y_2)² − (2y_1−y_2)²}
Thus it is quite clear that the Jacobian will play an important part in the theory of probability distributions.

• Theorem 1.74 If a set of transformations is given by x_i = h_i(y_1, y_2, ..., y_n), where i = 1, 2, ..., n, and the Jacobian J is the determinant of the matrix K whose ijth element is ∂x_i/∂y_j, and if the equations satisfy mild regularity conditions and the solutions for the y_i yield

    y_i = d_i(x_1, x_2, ..., x_n),    i = 1, 2, ..., n

then the Jacobian can also be given by J = 1/|K*| (if J ≠ 0), where the ijth element of K* is ∂y_i/∂x_j.

This theorem might be useful if the equations were such that the ∂x_i/∂y_j were difficult to obtain but the ∂y_i/∂x_j were relatively easy to obtain.
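A quick check of Theorem 1.74 on the linear example above (our NumPy sketch; for a linear transformation the matrix of ∂y_i/∂x_j is simply the inverse of K):

```python
import numpy as np

K = np.array([[4.0, 1.0],          # ijth element is dx_i/dy_j for the example above
              [2.0, -1.0]])
J = np.linalg.det(K)

K_star = np.linalg.inv(K)          # dy_i/dx_j for the inverse (linear) transformation
print(J)                           # -6.0, so |J| = 6 as in the text
print(1 / np.linalg.det(K_star))   # -6.0 again, illustrating Theorem 1.74
```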
