Divergence Measures Based on the Shannon Entropy

Jianhua Lin, Member, IEEE

Abstract - A new class of information-theoretic divergence measures based on the Shannon entropy is introduced. Unlike the well-known Kullback divergences, the new measures do not require the condition of absolute continuity to be satisfied by the probability distributions involved. More importantly, their close relationship with the variational distance and the probability of misclassification error is established in terms of bounds. These bounds are crucial in many applications of divergence measures. The new measures are also well characterized by the properties of nonnegativity, finiteness, semiboundedness, and boundedness.

Index Terms - Divergence, dissimilarity measure, discrimination information, entropy, probability of error bounds.

Manuscript received October 24, 1989; revised April 20, 1990. The author is with the Department of Computer Science, Brandeis University, Waltham, MA 02254. IEEE Log Number 9038865.
I. INTRODUCTION
Many information-theoretic divergence measures between two probability distributions have been introduced and extensively studied [2], [7], [12], [15], [17], [19], [20], [30]. The applications of these measures can be found in the analysis of contingency tables [10], in approximation of probability distributions [6], [16], [21], in signal processing [13], [14], and in pattern recognition [3]-[5]. Among the proposed measures, one of the best known is the I directed divergence [17], [19] or its symmetrized measure, the J divergence. Although the I and J measures have many useful properties, they require that the probability distributions involved satisfy the condition of absolute continuity [17]. Also, there are certain bounds that neither I nor J can provide for the variational distance and the Bayes probability of error [28], [31]. Such bounds are useful in many decision-making applications [3], [5], [11], [14], [31].

In this correspondence, we introduce a new directed divergence that overcomes the previous difficulties. We will show that this new measure preserves most of the desirable properties of I and is in fact closely related to I. Both lower and upper bounds for the new divergence will also be established in terms of the variational distance. A symmetric form of the new directed divergence can be defined in the same way as J is defined in terms of I. The behavior of I, J, and the new divergences will be compared.

Based on Jensen's inequality and the Shannon entropy, an extension of the new measure, the Jensen-Shannon divergence, is derived. One of the salient features of the Jensen-Shannon divergence is that we can assign a different weight to each probability distribution. This makes it particularly suitable for the study of decision problems where the weights could be the prior probabilities. In fact, it provides both lower and upper bounds for the Bayes probability of misclassification error.

Most measures of difference are designed for two probability distributions. For certain applications, such as the study of taxonomy in biology and genetics [24], [25], one is required to measure the overall difference of more than two distributions. The Jensen-Shannon divergence can be generalized to provide such a measure for any finite number of distributions. This is also useful in multiclass decision-making. In fact, the bounds provided by the Jensen-Shannon divergence for the two-class case can be extended to the general case.

The generalized Jensen-Shannon divergence is related to the Jensen difference proposed by Rao [23], [24] in a different context. Rao's objective was to obtain different measures of diversity [24], and the Jensen difference can be defined in terms of information measures other than the Shannon entropy function. No specific detailed discussion was provided for the Jensen difference based on the Shannon entropy.

II. THE KULLBACK I AND J DIVERGENCE MEASURES

Let X be a discrete random variable and let p_1 and p_2 be two probability distributions of X. The I directed divergence [17], [19] is defined as

$$I(p_1, p_2) = \sum_{x \in X} p_1(x) \log \frac{p_1(x)}{p_2(x)}. \qquad (2.1)$$

The logarithmic base 2 is used throughout this correspondence unless otherwise stated. It is well known that I(p_1, p_2) is nonnegative and additive but not symmetric [12], [17]. To obtain a symmetric measure, one can define

$$J(p_1, p_2) = I(p_1, p_2) + I(p_2, p_1), \qquad (2.2)$$

which is called the J divergence [22]. Clearly, the I and J divergences share most of their properties.

It should be noted that I(p_1, p_2) is undefined if p_2(x) = 0 and p_1(x) != 0 for any x in X. This means that distribution p_1 has to be absolutely continuous [17] with respect to distribution p_2 for I(p_1, p_2) to be defined. Similarly, J(p_1, p_2) requires that p_1 and p_2 be absolutely continuous with respect to each other. This is one of the problems with these divergence measures.

Effort [18], [27], [28] has been devoted to finding the relationship (in terms of bounds) between the I directed divergence and the variational distance. The variational distance between two probability distributions is defined as

$$V(p_1, p_2) = \sum_{x \in X} |p_1(x) - p_2(x)|, \qquad (2.3)$$

which is a distance measure satisfying the metric properties. Several lower bounds for I(p_1, p_2) in terms of V(p_1, p_2) have been found, among which the sharpest known is given by

$$I(p_1, p_2) \ge \max\{L_V(V(p_1, p_2)),\ L_T(V(p_1, p_2))\},$$

where

$$L_V(v) = \log \frac{2+v}{2-v} - \frac{2v}{2+v},$$

established by Vajda [28], and

$$L_T(v) = \frac{v^2}{2} + \frac{v^4}{36} + \frac{v^6}{288},$$

derived by Toussaint [27].

However, no general upper bound exists for either I(p_1, p_2) or J(p_1, p_2) in terms of the variational distance [28]. This is another difficulty in using the I directed divergence as a measure of difference between probability distributions [16], [31].
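As an illustrative aside (not part of the correspondence itself), the following Python sketch computes I, J, and V for two discrete distributions given as probability lists, and shows how I(p_1, p_2) blows up as soon as p_2(x) = 0 at a point where p_1(x) > 0; the function names and example distributions are our own.

```python
import math

def I_div(p1, p2):
    """Kullback I directed divergence, eq. (2.1), base-2 logs.
    Returns inf when p2(x) == 0 but p1(x) > 0 (absolute continuity violated)."""
    total = 0.0
    for a, b in zip(p1, p2):
        if a == 0:
            continue                      # 0 * log(0/b) is taken as 0
        if b == 0:
            return float("inf")           # p1 not absolutely continuous w.r.t. p2
        total += a * math.log2(a / b)
    return total

def J_div(p1, p2):
    """Symmetrized J divergence, eq. (2.2)."""
    return I_div(p1, p2) + I_div(p2, p1)

def V_dist(p1, p2):
    """Variational distance, eq. (2.3)."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

p1 = [0.5, 0.3, 0.2]
p2 = [0.4, 0.4, 0.2]
print(I_div(p1, p2), J_div(p1, p2), V_dist(p1, p2))

# A single zero mass point in the second argument breaks I (and hence J):
q = [0.7, 0.3, 0.0]
print(I_div(p1, q))   # inf -- p1 is not absolutely continuous with respect to q
```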
III. A NEW DIRECTED DIVERGENCE MEASURE

In an attempt to overcome the problems of the I and J divergences, we define a new directed divergence between two distributions p_1 and p_2 as

$$K(p_1, p_2) = \sum_{x \in X} p_1(x) \log \frac{2 p_1(x)}{p_1(x) + p_2(x)}. \qquad (3.1)$$

This measure turns out to have numerous desirable properties. From the Shannon inequality [1, p. 37], we know that K(p_1, p_2) >= 0 and K(p_1, p_2) = 0 if and only if p_1 = p_2, which is essential for a measure of difference. It is also clear that K(p_1, p_2) is well defined even when p_2(x) = 0 and p_1(x) != 0 for some x in X, so no condition of absolute continuity is required.
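A minimal sketch of (3.1) in the same illustrative Python style (names ours); note that K stays finite for the pair that made I infinite above, since the reference distribution (p_1 + p_2)/2 is positive wherever p_1 is.

```python
import math

def K_div(p1, p2):
    """New directed divergence K of eq. (3.1), base-2 logs."""
    total = 0.0
    for a, b in zip(p1, p2):
        if a == 0:
            continue                       # 0 * log(...) taken as 0
        total += a * math.log2(2 * a / (a + b))
    return total

p1 = [0.5, 0.3, 0.2]
q  = [0.7, 0.3, 0.0]
print(K_div(p1, q))    # finite, even though q(x) = 0 where p1(x) > 0
print(K_div(q, p1))    # K is not symmetric
```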
From the definitions of K and I, it is easy to see that K(p_1, p_2) can be described in terms of I(p_1, p_2):

$$K(p_1, p_2) = I\!\left(p_1, \frac{p_1 + p_2}{2}\right). \qquad (3.2)$$

Since (p_1(x) + p_2(x))/2 >= sqrt(p_1(x) p_2(x)), it follows that

$$K(p_1, p_2) = \sum_{x \in X} p_1(x) \log \frac{2 p_1(x)}{p_1(x) + p_2(x)} \le \sum_{x \in X} p_1(x) \log \frac{p_1(x)}{\sqrt{p_1(x) p_2(x)}} = \frac{1}{2} \sum_{x \in X} p_1(x) \log \frac{p_1(x)}{p_2(x)} = \frac{1}{2} I(p_1, p_2). \qquad (3.3)$$

K(p_1, p_2) is obviously not a symmetric measure; we can define a symmetric divergence based on K as

$$L(p_1, p_2) = K(p_1, p_2) + K(p_2, p_1). \qquad (3.4)$$

The L divergence is related to the J divergence in the same way as K is related to I. From inequality (3.3), we can easily derive the following relationship:

$$L(p_1, p_2) \le \frac{1}{2} J(p_1, p_2). \qquad (3.5)$$

A graphical comparison of the I, J, K, and L divergences is shown in Fig. 1, in which we assume p_1 = (t, 1-t) and p_2 = (1-t, t), 0 <= t <= 1. I and J have a steeper slope than K and L. It is important to note that I and J approach infinity when t approaches 0 or 1. In contrast, K and L are well defined in the entire range 0 <= t <= 1.

Theorem 2: The following lower bound holds for the K directed divergence:

$$K(p_1, p_2) \ge \max\{L_V(V(p_1, p_2)/2),\ L_T(V(p_1, p_2)/2)\}. \qquad (3.6)$$

Since K(p_1, p_2) = I(p_1, (p_1 + p_2)/2) and V(p_1, (p_1 + p_2)/2) = V(p_1, p_2)/2, (3.6) follows immediately from the lower bound of Section II.

In contrast to the situation for the I and J divergences, upper bounds also exist for the L divergence in terms of the variational distance.

Theorem 3: The variational distance and the L divergence satisfy the inequality

$$L(p_1, p_2) \le V(p_1, p_2). \qquad (3.7)$$
The L divergence can also be expressed in terms of the Shannon entropy function as

$$L(p_1, p_2) = 2 H\!\left(\frac{p_1 + p_2}{2}\right) - H(p_1) - H(p_2). \qquad (3.14)$$
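The identities and bounds (3.3)-(3.7) and (3.14) are easy to probe numerically. The sketch below is our illustration (it assumes strictly positive distributions so that I and J are finite); it computes L both as K(p_1,p_2) + K(p_2,p_1) and via the entropy identity (3.14), and checks L <= J/2 and L <= V on a random pair.

```python
import math, random

def H(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def I_div(p1, p2):
    return sum(a * math.log2(a / b) for a, b in zip(p1, p2) if a > 0)

def K_div(p1, p2):
    return sum(a * math.log2(2 * a / (a + b)) for a, b in zip(p1, p2) if a > 0)

def random_dist(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

p1, p2 = random_dist(5), random_dist(5)
m = [(a + b) / 2 for a, b in zip(p1, p2)]

L_sum = K_div(p1, p2) + K_div(p2, p1)            # eq. (3.4)
L_ent = 2 * H(m) - H(p1) - H(p2)                 # eq. (3.14)
J = I_div(p1, p2) + I_div(p2, p1)                # eq. (2.2)
V = sum(abs(a - b) for a, b in zip(p1, p2))      # eq. (2.3)

assert abs(L_sum - L_ent) < 1e-9
assert L_sum <= J / 2 + 1e-9                     # inequality (3.5)
assert L_sum <= V + 1e-9                         # inequality (3.7)
print(L_sum, J / 2, V)
```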
In a two-class decision problem with prior probabilities pi_1 = P(c_1) and pi_2 = P(c_2) and class-conditional distributions p_1(x) = p(x|c_1) and p_2(x) = p(x|c_2), the conditional entropy of the class variable C given the observation X is

$$H(C|X) = -\sum_{x \in X} p(x) \sum_{c \in C} p(c|x) \log p(c|x), \qquad (4.5)$$

which is the equivocation or the conditional entropy [9]. It is also known that

$$H(C|X) = H(C) + H(X|C) - H(X). \qquad (4.6)$$

Since there are only two classes involved, we have

$$H(C) = H(P(c_1), P(c_2)) = H(\pi_1, \pi_2), \qquad (4.7)$$

and

$$H(X|C) = P(c_1) H(X|c_1) + P(c_2) H(X|c_2) = \pi_1 H(p_1) + \pi_2 H(p_2). \qquad (4.8)$$

Since H(C|X) = \sum_{x \in X} p(x) H(C|x), the Cauchy inequality gives

$$H^2(C|X) \le \Big(\sum_{x \in X} p(x)\Big)\Big(\sum_{x \in X} p(x) H^2(C|x)\Big) = \sum_{x \in X} p(x) H^2(C|x). \qquad (4.11)$$

For any 0 <= t <= 1, it can be shown that

$$\frac{1}{2} H(t, 1-t) \le \sqrt{t(1-t)} \qquad (4.12)$$

holds, as depicted in Fig. 2. A rigorous proof of this inequality is given in the Appendix (Theorem 8). Therefore, inequality (4.11) can be rewritten as

$$H^2(C|X) \le 4 \sum_{x \in X} p(x) P(c_1|x) P(c_2|x).$$
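To make the chain (4.5), (4.11)-(4.12) concrete, the following sketch is our illustration (the priors and class-conditional distributions are invented for the example). It computes H(C|X) directly and verifies H^2(C|X) <= 4 sum_x p(x) P(c_1|x) P(c_2|x); since P(c_1|x)P(c_2|x) <= min_i P(c_i|x), that right-hand side is in turn at most 4 P(e), which ties the inequality to the Bayes error.

```python
import math

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

# Illustrative two-class problem: priors and class-conditional distributions.
pi = [0.4, 0.6]                               # pi_1, pi_2
p_x_given_c = [[0.6, 0.3, 0.1],               # p_1(x) = p(x | c_1)
               [0.1, 0.4, 0.5]]               # p_2(x) = p(x | c_2)

X = range(3)
p_x = [sum(pi[c] * p_x_given_c[c][x] for c in range(2)) for x in X]
post = [[pi[c] * p_x_given_c[c][x] / p_x[x] for c in range(2)] for x in X]

H_C_given_X = sum(p_x[x] * H(post[x]) for x in X)                 # eq. (4.5)
rhs = 4 * sum(p_x[x] * post[x][0] * post[x][1] for x in X)        # bound above
bayes_error = sum(p_x[x] * min(post[x]) for x in X)

assert H_C_given_X ** 2 <= rhs + 1e-9
assert rhs <= 4 * bayes_error + 1e-9
print(H_C_given_X ** 2, rhs, 4 * bayes_error)
```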
In certain applications, such as the study of taxonomy in biology and genetics [24], [25], it might be necessary to measure the overall difference of more than two distributions. The Jensen-Shannon divergence can be generalized to provide such a measure for any finite number of distributions. This is useful for the study of decision problems with more than two classes involved.

Let p_1, p_2, ..., p_n be n probability distributions with weights pi_1, pi_2, ..., pi_n, respectively. The generalized Jensen-Shannon divergence can be defined as

$$JS_\pi(p_1, p_2, \cdots, p_n) = H\!\left(\sum_{i=1}^{n} \pi_i p_i\right) - \sum_{i=1}^{n} \pi_i H(p_i),$$

where pi = (pi_1, pi_2, ..., pi_n). Consider the n-class decision problem in which pi_i = P(c_i) is the prior probability of class c_i, H(pi) = -sum_{i=1}^{n} pi_i log pi_i, and p_i(x) = p(x|c_i), i = 1, 2, ..., n.
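A minimal Python sketch (ours) of the generalized Jensen-Shannon divergence just defined; the distributions and weights are arbitrary illustrative choices, and the value can be checked to lie between 0 and H(pi).

```python
import math

def H(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(dists, weights):
    """Generalized Jensen-Shannon divergence JS_pi(p_1, ..., p_n):
    H(sum_i pi_i * p_i) - sum_i pi_i * H(p_i)."""
    mixture = [sum(w * p[x] for w, p in zip(weights, dists))
               for x in range(len(dists[0]))]
    return H(mixture) - sum(w * H(p) for w, p in zip(weights, dists))

dists = [[0.6, 0.3, 0.1],
         [0.1, 0.4, 0.5],
         [0.3, 0.3, 0.4]]
weights = [0.2, 0.5, 0.3]          # priors pi_i, summing to 1

js = js_divergence(dists, weights)
print(js)
assert 0 <= js <= H(weights) + 1e-9    # 0 <= JS_pi <= H(pi)
```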
Proof: The proof of this inequality is much the same as that of (4.3). □
Theorem 7: For the n-class decision problem, the Bayes probability of error satisfies

$$P(e) \ge \frac{1}{4(n-1)} \big(H(\pi) - JS_\pi(p_1, p_2, \cdots, p_n)\big)^2.$$
Proof: From (4.11) and Theorem 9 in the Appendix, we have

$$H^2(C|X) \le \sum_{x \in X} p(x) \Big(2 \sum_{i=1}^{n-1} \sqrt{P(c_i|x)\,(1 - P(c_i|x))}\Big)^2. \qquad (5.5)$$

By the Cauchy inequality, (5.5) becomes

$$H^2(C|X) \le 4(n-1) \sum_{x \in X} p(x) \sum_{i=1}^{n-1} P(c_i|x)\,(1 - P(c_i|x)). \qquad (5.6)$$

Assume, without loss of generality, that the p(c_i|x) have been reordered in such a way that p(c_n|x) is the largest. Then from

$$\sum_{i=1}^{n-1} P(c_i|x)\,(1 - P(c_i|x)) \le \sum_{i=1}^{n-1} P(c_i|x) = 1 - \max_i\{P(c_i|x)\},$$

we have

$$H^2(C|X) \le 4(n-1) \sum_{x \in X} p(x)\big(1 - \max_i\{P(c_i|x)\}\big) = 4(n-1) P(e),$$

and we immediately obtain the desired result. □

It should be pointed out that the bounds previously presented are in explicit forms and can be computed easily. Implicit lower and upper bounds for the probability of error in terms of the f-divergence can be found in [3]. It would be useful to study the relationship between these bounds, but this will not be done in this correspondence.
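The key inequality in the proof of Theorem 7, H^2(C|X) <= 4(n-1)P(e), can be checked numerically; the sketch below (ours) uses an invented three-class example.

```python
import math

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

# Illustrative n = 3 class problem.
pi = [0.2, 0.5, 0.3]                                # priors pi_i = P(c_i)
p_x_given_c = [[0.6, 0.3, 0.1, 0.0],
               [0.1, 0.4, 0.4, 0.1],
               [0.2, 0.1, 0.3, 0.4]]                # p_i(x) = p(x | c_i)

n, m = len(pi), len(p_x_given_c[0])
p_x = [sum(pi[c] * p_x_given_c[c][x] for c in range(n)) for x in range(m)]
post = [[pi[c] * p_x_given_c[c][x] / p_x[x] for c in range(n)] for x in range(m)]

H_C_given_X = sum(p_x[x] * H(post[x]) for x in range(m))
bayes_error = sum(p_x[x] * (1 - max(post[x])) for x in range(m))

assert H_C_given_X ** 2 <= 4 * (n - 1) * bayes_error + 1e-9
print(H_C_given_X ** 2, 4 * (n - 1) * bayes_error)
```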
VI. CONCLUSION

Based on the Shannon entropy, we were able to give a unified definition and characterization to a class of information-theoretic divergence measures. Some of these measures have appeared earlier in various applications, but their use generally suffered from a lack of theoretical justification. The results presented here not only fill this gap but also provide a theoretical foundation for future applications of these measures. Some of the results, such as those presented in the Appendix, are related to entropy and are useful in their own right.

APPENDIX

Theorem 8: For any 0 <= x <= 1,

$$\frac{1}{2} H(x, 1-x) \le \sqrt{x(1-x)}. \qquad (A.1)$$

Proof: Consider the continuous function f(x) on the closed interval [0, 1]:

$$f(x) = 2\sqrt{x(1-x)} + x \log x + (1-x) \log (1-x). \qquad (A.2)$$

f(x) is twice differentiable in the open interval (0, 1), with

$$f''(x) = \frac{2\sqrt{x(1-x)} - \ln 2}{2(\ln 2)\,\big(x(1-x)\big)^{3/2}}, \qquad (A.3)$$

where ln is the natural logarithm. There are two different real solutions of the equation f''(x) = 0,

$$x_1 = \frac{1 - \sqrt{1-(\ln 2)^2}}{2} \quad \text{and} \quad x_2 = \frac{1 + \sqrt{1-(\ln 2)^2}}{2}.$$

It can be easily shown that 0 < x_1 < 1/2 < x_2 < 1.

From (A.3), it is clear that the function f''(x) is continuous in (0, 1) and the denominator of f''(x) is nonnegative in [0, 1]. Since

$$\lim_{x \to 0^+} \big(2\sqrt{x(1-x)} - \ln 2\big) = -\ln 2,$$

f''(x_1) = 0, and there exists no x in (0, x_1) such that f''(x) = 0, by the continuity of f''(x) it follows that f''(x) < 0 for 0 < x < x_1, and thus the function f(x) is concave in (0, x_1). For x = 1/2 in (x_1, x_2), we obtain

$$2\sqrt{(1/2)(1/2)} - \ln 2 = 1 - \ln 2 > 0,$$

which implies f''(1/2) > 0. Since f''(x_1) = f''(x_2) = 0 and there exists no x in (x_1, x_2) such that f''(x) = 0, we can conclude that f''(x) > 0 for x_1 < x < x_2; f(x) is therefore convex in (x_1, x_2). Similarly, from

$$\lim_{x \to 1^-} \big(2\sqrt{x(1-x)} - \ln 2\big) = -\ln 2,$$

it follows that f''(x) < 0 for x_2 < x < 1. This means that the function f(x) is concave in (x_2, 1). In summary, the function f(x) is concave in both open intervals (0, x_1) and (x_2, 1), and convex in (x_1, x_2); (x_1, f(x_1)) and (x_2, f(x_2)) are the points of inflection of f(x).

Since f'(1/2) = 0 and f(x) is convex in (x_1, x_2), f attains its minimum over [x_1, x_2] at x = 1/2, so

$$f(x) \ge f(1/2) = 0, \qquad \text{for } x \in [x_1, x_2]. \qquad (A.4)$$

Also, since f(x) is continuous in [0, x_1] and concave in (0, x_1), we have

$$f(x) \ge \min\big(f(0), f(x_1)\big) \ge \min\big(f(0), f(1/2)\big) = 0, \qquad \text{for } x \in [0, x_1]. \qquad (A.5)$$

Similarly,

$$f(x) \ge \min\big(f(x_2), f(1)\big) \ge \min\big(f(1/2), f(1)\big) = 0, \qquad \text{for } x \in [x_2, 1]. \qquad (A.6)$$

By combining (A.4), (A.5), and (A.6), we finally obtain

$$f(x) \ge 0, \qquad \text{for } 0 \le x \le 1, \qquad (A.7)$$

from which inequality (A.1) follows immediately. □

Theorem 9: Let q = (q_1, q_2, ..., q_n), 0 <= q_i <= 1, 1 <= i <= n, and sum_{j=1}^{n} q_j = 1. Then

$$\frac{1}{2} H(q) \le \sum_{j=1}^{n-1} \sqrt{q_j(1-q_j)}. \qquad (A.8)$$

Proof: By the recursivity of the entropy function [1, p. 30], we have

$$H(q) = H(q_1, q_2, \cdots, q_{n-2}, q_{n-1} + q_n) + (q_{n-1} + q_n)\, H\!\left(\frac{q_{n-1}}{q_{n-1}+q_n}, \frac{q_n}{q_{n-1}+q_n}\right).$$

By Theorem 8,

$$\frac{1}{2}(q_{n-1} + q_n)\, H\!\left(\frac{q_{n-1}}{q_{n-1}+q_n}, \frac{q_n}{q_{n-1}+q_n}\right) \le \sqrt{q_{n-1} q_n} \le \sqrt{q_{n-1}(1 - q_{n-1})},$$

and the result follows by applying the same argument repeatedly to H(q_1, q_2, ..., q_{n-2}, q_{n-1} + q_n). □
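Theorem 9 (and Theorem 8 as its n = 2 case) can likewise be probed numerically; the following sketch (ours) checks (A.8) on random probability vectors.

```python
import math, random

def H(q):
    return -sum(x * math.log2(x) for x in q if x > 0)

random.seed(0)
for _ in range(1000):
    n = random.randint(2, 6)
    w = [random.random() for _ in range(n)]
    q = [x / sum(w) for x in w]
    lhs = 0.5 * H(q)
    rhs = sum(math.sqrt(q[j] * (1 - q[j])) for j in range(n - 1))   # (A.8)
    assert lhs <= rhs + 1e-9
print("Theorem 9 inequality held on all random samples.")
```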
REFERENCES

[1] J. Aczel and Z. Daroczy, On Measures of Information and Their Characterizations. New York: Academic, 1975.
[2] S. M. Ali and S. D. Silvey, "A general class of coefficients of divergence of one distribution from another," J. Roy. Statist. Soc., Ser. B, vol. 28, pp. 131-142, 1966.
[3] M. Ben Bassat, "f-entropies, probability of error, and feature selection," Inform. Contr., vol. 39, pp. 227-242, 1978.
[4] C. H. Chen, Statistical Pattern Recognition. Rochelle Park, NJ: Hayden Book Co., 1973, ch. 4.
[5] -, "On information and distance measures, error bounds, and feature selection," Inform. Sci., vol. 10, pp. 159-173, 1976.
[6] C. K. Chow and C. N. Liu, "Approximating discrete probability distributions with dependence trees," IEEE Trans. Inform. Theory, vol. IT-14, no. 3, pp. 462-467, May 1968.
[7] I. Csiszár, "Information-type measures of difference of probability distributions and indirect observations," Studia Sci. Math. Hungar., vol. 2, pp. 299-318, 1967.
[8] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[9] S. Guiasu, Information Theory with Applications. New York: McGraw-Hill, 1977.
[10] D. V. Gokhale and S. Kullback, Information in Contingency Tables. New York: Marcel Dekker, 1978.
[11] M. E. Hellman and J. Raviv, "Probability of error, equivocation, and the Chernoff bound," IEEE Trans. Inform. Theory, vol. IT-16, no. 4, pp. 368-372, July 1970.
[12] R. W. Johnson, "Axiomatic characterization of the directed divergences and their linear combinations," IEEE Trans. Inform. Theory, vol. IT-25, no. 6, pp. 709-716, Nov. 1979.
[13] T. T. Kadota and L. A. Shepp, "On the best finite set of linear observables for discriminating two Gaussian signals," IEEE Trans. Inform. Theory, vol. IT-13, no. 2, pp. 278-284, Apr. 1967.
[14] T. Kailath, "The divergence and Bhattacharyya distance measures in signal selection," IEEE Trans. Commun. Technol., vol. COM-15, no. 1, pp. 52-60, Feb. 1967.
[15] J. N. Kapur, "A comparative assessment of various measures of directed divergence," Advances Manag. Stud., vol. 3, no. 1, pp. 1-16, Jan. 1984.
[16] D. Kazakos and T. Cotsidas, "A decision theory approach to the approximation of discrete probability densities," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-2, no. 1, pp. 61-67, Jan. 1980.
[17] S. Kullback, Information Theory and Statistics. New York: Dover, 1968.
[18] -, "A lower bound for discrimination information in terms of variation," IEEE Trans. Inform. Theory, vol. IT-13, pp. 326-327, Jan. 1967.
[19] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Statist., vol. 22, pp. 79-86, 1951.
[20] U. Kumar, V. Kumar, and J. N. Kapur, "Some normalized measures of directed divergence," Int. J. Gen. Syst., vol. 13, pp. 5-16, 1986.
[21] J. Lin and S. K. M. Wong, "Approximation of discrete probability distributions based on a new divergence measure," Congressus Numerantium, vol. 61, pp. 75-80, 1988.
[22] H. Jeffreys, "An invariant form for the prior probability in estimation problems," Proc. Roy. Soc. London, Ser. A, vol. 186, pp. 453-461, 1946.
[23] C. R. Rao and T. K. Nayak, "Cross entropy, dissimilarity measures, and characterizations of quadratic entropy," IEEE Trans. Inform. Theory, vol. IT-31, no. 5, pp. 589-593, Sept. 1985.
[24] C. R. Rao, "Diversity and dissimilarity coefficients: A unified approach," Theoretical Population Biol., vol. 21, pp. 24-43, 1982.
[25] -, "Diversity: Its measurement, decomposition, apportionment and analysis," Sankhya: Indian J. Statist., Ser. A, vol. 44, pt. 1, pp. 1-22, Feb. 1982.
[26] G. T. Toussaint, "On some measures of information and their application to pattern recognition," in Proc. Conf. Measures of Information and Their Applications, Indian Inst. Technol., Bombay, Aug. 1974.
[27] -, "Sharper lower bounds for discrimination information in terms of variation," IEEE Trans. Inform. Theory, vol. IT-21, no. 1, pp. 99-100, Jan. 1975.
[28] I. Vajda, "Note on discrimination information and variation," IEEE Trans. Inform. Theory, vol. IT-16, pp. 771-773, Nov. 1970.
[29] -, "On the f-divergence and singularity of probability measures," Periodica Mathem. Hungarica, vol. 2, pp. 223-234, 1972.
[30] -, Theory of Statistical Inference and Information. Dordrecht-Boston: Kluwer, 1989.
[31] J. W. Van Ness, "Dimensionality and the classification performance with independent coordinates," IEEE Trans. Syst. Man Cybern., vol. SMC-7, pp. 560-564, July 1977.
[32] A. K. C. Wong and M. You, "Entropy and distance of random graphs with application to structural pattern recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-7, no. 5, pp. 599-609, Sept. 1985.