Adaptive Signal Processing, Bernard Widrow, Samuel D. Stearns
Processes
S. Tubaro
July 4, 2011
Contents

1 Energy/Power Spectral Density for Continuous Deterministic Signals

4 Random Processes
1 Energy/Power Spectral Density for Continuous Deterministic Signals
Given a generic signal $x(t)$, its normalized instantaneous power is defined as $|x(t)|^2$. If $x(t)$ is a voltage, $|x(t)|^2$ represents the instantaneous power dissipated by applying $x(t)$ to a resistor of $1\,\Omega$. The normalized energy over an interval of duration $T$ (considering $T$ centered around 0) is defined as $\int_{-T/2}^{T/2} |x(t)|^2\, dt$, while the normalized mean power is given by $\frac{1}{T}\int_{-T/2}^{T/2} |x(t)|^2\, dt$.

Considering $T \to \infty$, for a signal with finite normalized energy we have $E_{x(t)} = \int_{-\infty}^{\infty} |x(t)|^2\, dt$. In this case the mean normalized power is, obviously, null.

A signal has finite mean normalized power if the following limit exists: $\lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} |x(t)|^2\, dt$. In this case the normalized energy does not exist (or, informally, is infinite).

In the following, the terms "normalized" and "mean" with reference to energy and power will be dropped for notational simplicity.
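As a quick numerical sketch of the two definitions (the signal choice here is illustrative, not from the text): $x(t) = e^{-|t|}$ has finite energy $\int e^{-2|t|}\, dt = 1$, so its mean power over a growing window tends to zero.

```python
import numpy as np

# Illustrative sketch: x(t) = exp(-|t|) has finite energy E = 1,
# so its mean power over the window [-T/2, T/2] vanishes as T grows.
dt = 1e-3
t = np.arange(-50.0, 50.0, dt)
x = np.exp(-np.abs(t))
energy = np.sum(np.abs(x)**2) * dt      # approximates E = 1
T = 100.0                               # window length used above
mean_power = energy / T                 # -> 0 as T -> infinity
print(energy, mean_power)
```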
For a finite-energy signal $x(t)$ the following expression of its energy (Parseval theorem) holds:
$$E_{x(t)} = \int_{-\infty}^{\infty} |x(t)|^2\, dt = \int_{-\infty}^{\infty} |X(f)|^2\, df$$
where $X(f)$ is the Fourier transform of $x(t)$: $X(f) = \mathcal{F}(x(t))$. $|X(f)|^2$ is called the Energy Spectral Density of the signal $x(t)$, and it describes how the energy of $x(t)$ is distributed over the various frequencies. Consider a linear system characterized by an impulse response $h(t)$, to which corresponds a transfer function $H(f) = \mathcal{F}(h(t))$ that is 0 everywhere except in a very small frequency interval $\Delta f$ centered around $f_0$, in which it is equal to 1. The Fourier transform of the system output will be $Y(f) = X(f)H(f)$. By the Parseval theorem,
$$E_{y(t)} = \int_{-\infty}^{\infty} |Y(f)|^2\, df = \int_{-\infty}^{\infty} |X(f)|^2 |H(f)|^2\, df = \int_{f_0-\Delta f/2}^{f_0+\Delta f/2} |X(f)|^2\, df \xrightarrow{\ \Delta f \to 0\ } |X(f_0)|^2\, \Delta f$$
Therefore $|X(f_0)|^2$ is the Energy Spectral Density of $x(t)$ at $f = f_0$ (the unique frequency component that is not zeroed by the considered system).
We can write $|X(f)|^2$ as $X(f) \cdot X^*(f) = \mathcal{F}(x(t) \star x^*(-t))$; therefore the Energy Spectral Density of the deterministic signal $x(t)$ can be seen as the Fourier transform of the deterministic autocorrelation of the signal: $R_{x(t)}(\tau) = \int_{-\infty}^{\infty} x^*(t)x(t+\tau)\, dt$.
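The Parseval relation can be verified numerically with a discretized transform; a minimal sketch for a Gaussian pulse (an illustrative signal, with $\int e^{-2t^2}\, dt = \sqrt{\pi/2}$):

```python
import numpy as np

# Numerical sketch of Parseval: energy in time equals the integral
# of the Energy Spectral Density |X(f)|^2 over frequency.
dt = 1e-3
t = np.arange(-5.0, 5.0, dt)
x = np.exp(-t**2)                        # finite-energy Gaussian pulse
E_time = np.sum(np.abs(x)**2) * dt       # ∫|x(t)|^2 dt ≈ sqrt(pi/2)

X = np.fft.fft(x) * dt                   # Riemann-sum approximation of X(f)
df = 1.0 / (len(x) * dt)                 # frequency-grid spacing
E_freq = np.sum(np.abs(X)**2) * df       # ∫|X(f)|^2 df
print(E_time, E_freq)
```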
Given a signal with finite power (characterized by "infinite" energy), we can consider a truncated version of it (between $-T/2$ and $T/2$): $x_T(t)$. This new signal has finite energy, and it is possible to evaluate its Energy Spectral Density $|X_T(f)|^2$, where $X_T(f) = \mathcal{F}(x_T(t))$. We can define the Power Spectral Density of $x_T(t)$ as
$$S_{x_T(t)}(f) \equiv \frac{1}{T}|X_T(f)|^2 = \frac{1}{T}\left|\int_{-T/2}^{T/2} x(t)e^{-j2\pi f t}\, dt\right|^2$$
Also the Power Spectral Density can be seen as the Fourier transform of the autocorrelation function of the considered deterministic signal, but it is necessary to define the autocorrelation in a slightly different way with respect to the previous definition:
$$R_{x(t)}(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x^*(t)x(t+\tau)\, dt$$
This new definition is in any case necessary because the signal now has finite power and not finite energy.
$$x(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\psi(t - nT)$$
The Fourier transform of ψ(t) is a box function in the frequency domain centered in the origin
with 1/T width and T height. Applying the Parseval theorem in its general formulation:
$$\int_{-\infty}^{\infty} x(t)y(t)\, dt = \int_{-\infty}^{\infty} X(-f)Y(f)\, df = \int_{-\infty}^{\infty} X(f)Y(-f)\, df$$
it is possible to write:
$$\int_{-\infty}^{\infty} \psi(t - iT)\,\psi(t - mT)\, dt = \int_{-1/2T}^{1/2T} \left(T e^{-j2\pi f iT}\right)^{*} \left(T e^{-j2\pi f mT}\right) df = T^2 \int_{-1/2T}^{1/2T} e^{-j2\pi f(m-i)T}\, df = T^2 \int_{-1/2T}^{1/2T} \cos\!\big(2\pi f(m-i)T\big)\, df$$
The last integral equals $1/T$ for $m = i$ and 0 otherwise, so the interpolation functions are orthogonal: $\int_{-\infty}^{\infty} \psi(t-iT)\psi(t-mT)\, dt = T\,\delta_{im}$.
The energy of $x(t)$ is equal to the energy of a step-wise signal where each step has value $x(nT)$. When $T\sum_{n=-\infty}^{\infty} |x(nT)|^2$ assumes a finite value, we consider this as the energy of the discrete signal $x(nT)$.
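This relation can be checked numerically by actually building the sinc interpolation (the sample values below are an illustrative, effectively band-limited pulse; grid sizes are arbitrary):

```python
import numpy as np

# Sketch: a signal reconstructed from samples x(nT) with
# psi(t) = sinc(t/T) has energy T * sum |x(nT)|^2.
T = 0.5                                  # assumed sampling period
n = np.arange(-40, 41)
xn = np.exp(-(n * T)**2 / 2)             # samples of a smooth pulse
dt = T / 50
t = np.arange(-30.0, 30.0, dt)
# sinc interpolation: x(t) = sum_n x(nT) sinc((t - nT)/T)
x = np.sum(xn[:, None] * np.sinc((t[None, :] - (n * T)[:, None]) / T), axis=0)
E_cont = np.sum(np.abs(x)**2) * dt       # continuous-time energy
E_disc = T * np.sum(np.abs(xn)**2)       # discrete-signal energy
print(E_cont, E_disc)
```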
When $x(nT)$ has no finite energy, we can try to evaluate its power. At first we evaluate
$$\int_{-t_0}^{t_0} |x(t)|^2\, dt = T \sum_{n=-t_0/T}^{t_0/T} |x(nT)|^2$$
with $t_0 = KT$ where $K$ is very big. This condition on $K$ is necessary to ensure that the area of the squared values of the tails of the sinc functions associated to each sample that fall outside $-t_0$ and $t_0$ is negligible with respect to $\int_{-t_0}^{t_0} |x(t)|^2\, dt$. Obviously it is also true that
$$\frac{1}{2t_0}\int_{-t_0}^{t_0} |x(t)|^2\, dt = \frac{T}{2t_0} \sum_{n=-t_0/T}^{t_0/T} |x(nT)|^2$$
The previous relation may also be written as $\overline{|x(t)|^2} = \overline{|x(nT)|^2}$, which remains valid for $t_0 \to \infty$. Therefore the power of $x(t)$ is equal to the mean of the squared values of the samples used to generate it. The mean of the squared sample values is also defined as the power of the discrete signal $x(nT)$. Note that the power of a sample sequence is independent of the sampling rate.
Considering a discrete deterministic signal $x(nT)$ with finite energy ($T\sum_{n=-\infty}^{\infty}|x(nT)|^2 < \infty$), in coherence with what was shown for continuous signals, we have an Energy Spectral Density (ESD) that is $|X(f)|^2$, where $X(f)$ is the Discrete Time Fourier Transform (DTFT) of $x(nT)$:
$$X(f) = T\sum_{n=-\infty}^{\infty} x(nT)e^{-j2\pi f nT}$$
The ESD can be seen as the DTFT of the autocorrelation function of the signal: $R_{x(nT)}(kT) = T\sum_{n=-\infty}^{\infty} x^*(nT)x((n+k)T) = x(kT) \star x^*(-kT)$. $X(f)$ is a periodic function with period $1/T$. The Inverse DTFT (IDTFT) is therefore defined as
$$x(nT) = \int_{-1/2T}^{1/2T} X(f)e^{j2\pi f nT}\, df$$
The energy of $x(nT)$ can be evaluated by integrating the ESD over the same frequency interval: $\int_{-1/2T}^{1/2T} |X(f)|^2\, df$.

If we work with normalized time/frequency axes ($t_n = t/T$, $f_n = fT$), the scaling factor $T$ disappears in the DTFT definition and the ESD must be integrated between $-1/2$ and $1/2$ to evaluate the total energy; however, it is then necessary to multiply the ESD by $T$ to obtain the correct result if the real sampling period is $T$.
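On the normalized axes ($T = 1$) the discrete Parseval relation can be checked directly by sampling the DTFT over one period (the short sequence below is an arbitrary illustration):

```python
import numpy as np

# Sketch of the DTFT Parseval relation with T = 1 (normalized axes):
# sum |x(n)|^2 equals the integral of the ESD |X(f)|^2 over one period.
x = np.array([1.0, 2.0, 3.0, 2.0, 1.0])           # arbitrary sequence
f = np.linspace(-0.5, 0.5, 2001, endpoint=False)  # one period of X(f)
n = np.arange(len(x))
X = np.exp(-2j * np.pi * f[:, None] * n[None, :]) @ x   # DTFT samples
lhs = np.sum(np.abs(x)**2)                        # energy: 19
rhs = np.sum(np.abs(X)**2) * (f[1] - f[0])        # ∫|X(f)|^2 df
print(lhs, rhs)
```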
4 Random Processes
As is known, a real random variable x is a mapping from the sample space S to the real line ℜ; that is, x : S → ℜ.
A real continuous random process (also known as stochastic process) is a mapping from
the sample space into an ensemble of time functions (known as sample functions). To every
ρ ∈ S there corresponds a function of time (a sample function) x(t; ρ). Often, from the
notation, we drop the ρ variable, and write just x(t). However, the sample space ρ variable
is always there, even if it is not shown explicitly.
For a random process $x(t)$ the first-order distribution function is defined as $F_{x(t)}(a; t) \equiv P\{x(t) \le a\}$, and the corresponding first-order probability density function is
$$f_{x(t)}(a; t) \equiv \frac{dF_{x(t)}(a; t)}{da}$$
These definitions can be generalized to the $n$-th order case. In particular the $n$-th order density function is:
$$f_{x(t)}(a_1, a_2, \ldots, a_n; t_1, t_2, \ldots, t_n) = \frac{\partial^n F_{x(t)}(a_1, a_2, \ldots, a_n; t_1, t_2, \ldots, t_n)}{\partial a_1\, \partial a_2 \cdots \partial a_n}$$
In general a complete statistical description of a random process requires knowledge of the distribution functions of all orders.
A process is said to be stationary if its statistical properties do not change with time; more precisely, $x(t)$ is (strictly) stationary if all its finite-order distribution functions are invariant to a shift of the time origin. A process is said to be wide-sense stationary if:

• the mean $E[x(t)]$ does not depend on $t$;

• the autocorrelation function $R(t, t+\tau) = E[x(t)x(t+\tau)]$ depends only on the time difference $\tau$: $R(t, t+\tau) = R(\tau)$.

The stationarity of a process implies wide-sense stationarity; however, the converse is not true.
A process is said to be ergodic if statistical and time averages of all orders are interchangeable. For these processes the mean, autocorrelation and other statistics can be computed by using any sample function of the process. That is, for the mean and autocorrelation:
$$\mu_{x(t)} = E[x(t)] = \int_{-\infty}^{\infty} a\, f_{x(t)}(a; t)\, da = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t; \rho)\, dt$$
$$R(\tau) = E[x(t)x(t+\tau)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a_1 a_2\, f_{x(t)}(a_1, a_2; t, t+\tau)\, da_1\, da_2 = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t; \rho)\, x(t+\tau; \rho)\, dt$$
For a complex process the autocorrelation is defined as $R(\tau) = E[x^*(t)x(t+\tau)]$, where $x(t)^*$ indicates the process that is the complex conjugate of $x(t)$ (each sample function of $x(t)^*$ is the complex conjugate of the corresponding sample function of $x(t)$). For a complex discrete process the autocorrelation function becomes $R_{x(n)}(k) = E[x^*(n)x(n+k)]$.

For a complex process that is stationary at least in the wide sense it is possible to define the statistical power as $P_{x(t)} = E[x^*(t)x(t)] = R(0)$. The extension of the definition to a complex discrete wide-sense stationary process is straightforward.
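A minimal numerical sketch of these interchangeable averages, assuming an illustrative ergodic process $x(n) = \mu + w(n)$ with $w(n)$ unit-variance white Gaussian noise (so $E[x] = \mu$ and $R(0) = E[x^2] = \mu^2 + 1$):

```python
import numpy as np

# Sketch of ergodicity: time averages over ONE sample function match
# the ensemble averages. Process choice is an assumption for the demo:
# x(n) = mu + w(n), w(n) white Gaussian with unit variance.
rng = np.random.default_rng(0)
mu = 2.0
x = mu + rng.standard_normal(200_000)    # one sample function
time_avg = x.mean()                      # estimates E[x(n)] = 2
r0 = np.mean(x * x)                      # estimates R(0) = mu^2 + 1 = 5
print(time_avg, r0)
```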
The principal properties of the autocorrelation function can be summarized as:
$$R_{x(n)}(k) = R^*_{x(n)}(-k)$$
Due to the wide-sense stationarity of the considered processes, the statistical power can also be seen as defined (in the continuous case) by:
$$P_{x(t)}(\rho) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x^*(t; \rho)\, x(t; \rho)\, dt$$
For each sample function of the process $x(t; \rho)$ we can define the truncated Fourier transform:
$$X_T(f) \equiv \int_{-T/2}^{T/2} x(t)e^{-j2\pi f t}\, dt$$
The corresponding truncated power spectral density is $\frac{1}{T}|X_T(f)|^2$. Since $x(t)$ is a random process, for each $f$, $\frac{1}{T}|X_T(f)|^2$ is a random variable. Let us denote its expectation by
$$S_{x(t),T}(f) \equiv E\left[\frac{1}{T}|X_T(f)|^2\right]$$
A natural definition of the power spectral density of the process is therefore
$$S_{x(t)}(f) = \lim_{T\to\infty} S_{x(t),T}(f)$$
The Wiener-Khintchine theorem asserts that the limit indicated in the previous formula exists for all $f$ and its value is
$$S_{x(t)}(f) = \int_{-\infty}^{\infty} R_{x(t)}(\tau)e^{-j2\pi f \tau}\, d\tau$$
The only required condition is that the Fourier transform of the autocorrelation function exists.
Proof:
$$E\left[|X_T(f)|^2\right] = E\left[\left|\int_{-T/2}^{T/2} x(t)e^{-j2\pi f t}\, dt\right|^2\right] = E\left[\int_{-T/2}^{T/2}\int_{-T/2}^{T/2} x(t)\, x(\tau)^{*}\, e^{-j2\pi f(t-\tau)}\, dt\, d\tau\right]$$
$$= \int_{-T/2}^{T/2}\int_{-T/2}^{T/2} E\left[x(t)\, x(\tau)^{*}\right] e^{-j2\pi f(t-\tau)}\, dt\, d\tau = \int_{-T/2}^{T/2}\int_{-T/2}^{T/2} R_{x(t)}(t-\tau)\, e^{-j2\pi f(t-\tau)}\, dt\, d\tau$$
[Figure 1: Integration domain of $\int_{-T/2}^{T/2}\int_{-T/2-\tau}^{T/2-\tau} f(\alpha)\, d\alpha\, d\tau$]
With the change of variable $\alpha = t - \tau$ it is possible to write:
$$\int_{-T/2}^{T/2}\int_{-T/2-\tau}^{T/2-\tau} f(\alpha)\, d\alpha\, d\tau = \int_{-T}^{0}\int_{-T/2-\alpha}^{T/2} f(\alpha)\, d\tau\, d\alpha + \int_{0}^{T}\int_{-T/2}^{T/2-\alpha} f(\alpha)\, d\tau\, d\alpha$$
$$= \int_{-T}^{0} f(\alpha)\int_{-T/2-\alpha}^{T/2} d\tau\, d\alpha + \int_{0}^{T} f(\alpha)\int_{-T/2}^{T/2-\alpha} d\tau\, d\alpha = \int_{-T}^{0} (T+\alpha)f(\alpha)\, d\alpha + \int_{0}^{T} (T-\alpha)f(\alpha)\, d\alpha$$
$$= \int_{-T}^{T} (T - |\alpha|)f(\alpha)\, d\alpha$$
This result is obvious looking at figure 1 and remembering that $f(\alpha) = \text{const}$ along horizontal strips.
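The reduction of the double integral over the square to a single integral with triangular weight can be verified numerically for an arbitrary test function (all numbers below are illustrative):

```python
import numpy as np

# Numerical check: for any f,
#   ∫∫ over [-T/2,T/2]^2 of f(t - tau) dt dtau = ∫_{-T}^{T} (T-|a|) f(a) da.
T, d = 2.0, 0.01
f = lambda a: np.exp(-a**2) * np.cos(a)  # arbitrary smooth test function
t = np.arange(-T/2, T/2, d) + d/2        # midpoint integration grid
tt, uu = np.meshgrid(t, t)
lhs = np.sum(f(tt - uu)) * d * d         # double integral over the square
a = np.arange(-T, T, d) + d/2
rhs = np.sum((T - np.abs(a)) * f(a)) * d # triangular-weighted single integral
print(lhs, rhs)
```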
Therefore
$$E\left[|X_T(f)|^2\right] = \int_{-T/2}^{T/2}\int_{-T/2}^{T/2} R_{x(t)}(t-\tau)\, e^{-j2\pi f(t-\tau)}\, dt\, d\tau = \int_{-T}^{T} (T - |\alpha|)\, R_{x(t)}(\alpha)\, e^{-j2\pi f \alpha}\, d\alpha$$
so that
$$S_{x(t),T}(f) = \frac{1}{T}\, E\left[|X_T(f)|^2\right] = \int_{-\infty}^{\infty} R_{x(t),T}(\tau)\, e^{-j2\pi f \tau}\, d\tau$$
where we have defined
$$R_{x(t),T}(\tau) = \begin{cases} \left(1 - \dfrac{|\tau|}{T}\right)R_{x(t)}(\tau) & |\tau| < T \\ 0 & |\tau| \ge T \end{cases}$$
From an intuitive point of view, considering the previous formula it is therefore easy to accept that
$$S_{x(t)}(f) = \lim_{T\to\infty} S_{x(t),T}(f) = \lim_{T\to\infty} E\left[\frac{1}{T}|X_T(f)|^2\right] = \lim_{T\to\infty}\int_{-T}^{T}\left(1 - \frac{|\tau|}{T}\right)R_{x(t)}(\tau)\, e^{-j2\pi f \tau}\, d\tau = \int_{-\infty}^{\infty} R_{x(t)}(\tau)\, e^{-j2\pi f \tau}\, d\tau$$
To justify the exchange of limit and integral (dominated convergence), if we take the $f_n$'s as the complex-valued functions $R_{x(t),T}(\tau)e^{-j2\pi f \tau}$, then the corresponding limit is $f = R_{x(t)}(\tau)e^{-j2\pi f \tau}$. Moreover, we can take the integrable function to be $g = |R_{x(t)}(\tau)|$. Then from the definition of $R_{x(t),T}(\tau)$ it is straightforward to see that $|R_{x(t),T}(\tau)e^{-j2\pi f \tau}| \le g$ for every $T$, so the limit can be brought inside the integral.
For each sample function of the process $x(n; \rho)$ we can define the truncated Fourier transform:
$$X_N(e^{j2\pi f}) \equiv \sum_{n=-N}^{N} x(n)e^{-j2\pi f n}$$
The corresponding truncated power spectral density is $\frac{1}{2N+1}|X_N(e^{j2\pi f})|^2$. Since $x(n)$ is a random process, for each $e^{j2\pi f}$, $\frac{1}{2N+1}|X_N(e^{j2\pi f})|^2$ is a random variable. Let us denote its expectation by
$$S_{x(n),N}(e^{j2\pi f}) \equiv E\left[\frac{1}{2N+1}\left|X_N(e^{j2\pi f})\right|^2\right]$$
and also in this case a natural definition of the power spectral density of the process is therefore
$$S_{x(n)}(e^{j2\pi f}) = \lim_{N\to\infty} S_{x(n),N}(e^{j2\pi f})$$
The Wiener-Khintchine theorem asserts that the limit indicated in the previous formula exists for all $e^{j2\pi f}$ and its value, for a discrete process, is
$$S_{x(n)}(e^{j2\pi f}) = \sum_{k=-\infty}^{\infty} R_{x(n)}(k)e^{-j2\pi f k}$$
The only required condition is that the Fourier transform of the autocorrelation function exists.
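The theorem can be checked numerically on a simple process (the MA(1) model below is an illustrative assumption, not from the text): for $x(n) = w(n) + a\,w(n-1)$ with $w$ unit-variance white noise, $R(0) = 1 + a^2$, $R(\pm 1) = a$ and $R(k) = 0$ otherwise, so $S(e^{j2\pi f}) = 1 + a^2 + 2a\cos(2\pi f)$. Averaging the truncated PSD over many realizations approaches this DTFT of the autocorrelation:

```python
import numpy as np

# Sketch of the discrete Wiener-Khintchine theorem: averaged truncated
# PSDs of an MA(1) process (an assumed example) converge to the DTFT
# of its autocorrelation sequence.
rng = np.random.default_rng(1)
a, N, trials = 0.5, 256, 2000
S_hat = np.zeros(N)
for _ in range(trials):
    w = rng.standard_normal(N + 1)       # white noise, unit variance
    x = w[1:] + a * w[:-1]               # x(n) = w(n) + a*w(n-1)
    S_hat += np.abs(np.fft.fft(x))**2 / N
S_hat /= trials                          # estimate of E[(1/N)|X_N|^2]
f = np.fft.fftfreq(N)                    # normalized frequency grid
S_theory = (1 + a**2) + 2 * a * np.cos(2 * np.pi * f)
print(np.max(np.abs(S_hat - S_theory)))
```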
Proof:
$$S_{x(n)}(e^{j2\pi f}) = \lim_{N\to\infty} E\left[\frac{1}{2N+1}\left|X_N(e^{j2\pi f})\right|^2\right]$$
$$= \lim_{N\to\infty} \frac{1}{2N+1}\, E\left[\left(\sum_{n=-N}^{N} x(n)e^{-j2\pi f n}\right)\left(\sum_{m=-N}^{N} x(m)e^{-j2\pi f m}\right)^{*}\right]$$
$$= \lim_{N\to\infty} \frac{1}{2N+1}\, E\left[\sum_{n=-N}^{N}\sum_{m=-N}^{N} x(n)\, x(m)^{*}\, e^{-j2\pi f(n-m)}\right]$$
$$= \lim_{N\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N}\sum_{m=-N}^{N} E\left[x(n)\, x(m)^{*}\right] e^{-j2\pi f(n-m)}$$
$$= \lim_{N\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N}\sum_{m=-N}^{N} R_x(n-m)\, e^{-j2\pi f(n-m)}$$
$$= \lim_{N\to\infty} \lim_{M\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N}\sum_{m=-M}^{M} R_x(n-m)\, e^{-j2\pi f(n-m)}$$
$$\stackrel{k=n-m}{=} \lim_{N\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N}\left(\sum_{k=-\infty}^{\infty} R_x(k)\, e^{-j2\pi f k}\right)$$
$$= \left(\sum_{k=-\infty}^{\infty} R_x(k)\, e^{-j2\pi f k}\right) \lim_{N\to\infty} \frac{1}{2N+1}\sum_{n=-N}^{N} 1 = \sum_{k=-\infty}^{\infty} R_x(k)\, e^{-j2\pi f k}$$
Finally, it is
$$P_{x(n)} = R_{x(n)}(0) = \int_{-1/2}^{1/2} S_{x(n)}(e^{j2\pi f})\, df$$
In all the previous discussions on the PSD associated to discrete processes it has been assumed that the sampling rate of $x(n)$ was unitary. In the case that the sampling time is $T$ (and not 1), the previous relations hold when working with normalized time and frequency axes ($t_n = t/T$, $f_n = fT$).

On the basis of what was shown in section 5, it is immediate to understand how the PSD of a WSS process (continuous or discrete) is modified when the process is applied as input to an LTI system:
$$S_{y(t)}(f) = |H(f)|^2\, S_{x(t)}(f)$$
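A numerical sketch of the input-output PSD relation $S_y(f) = |H(f)|^2 S_x(f)$ for an LTI system, using unit-variance white noise ($S_x(f) = 1$) and an assumed FIR impulse response (the taps are illustrative):

```python
import numpy as np

# Sketch: white noise through an FIR filter h gives an output PSD
# |H(f)|^2 * sigma^2, estimated here by averaged truncated PSDs.
rng = np.random.default_rng(2)
h = np.array([0.5, 1.0, 0.5])            # assumed impulse response
N, trials = 512, 2000
Sy_hat = np.zeros(N)
for _ in range(trials):
    x = rng.standard_normal(N + len(h) - 1)   # white input, S_x = 1
    y = np.convolve(x, h, mode='valid')       # stationary length-N output
    Sy_hat += np.abs(np.fft.fft(y))**2 / N
Sy_hat /= trials                         # estimate of S_y(f)
H = np.fft.fft(h, N)                     # frequency response on DFT grid
Sy_theory = np.abs(H)**2                 # |H(f)|^2 * S_x(f) with S_x = 1
print(np.max(np.abs(Sy_hat - Sy_theory)))
```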