aasp15book
Adaptive and Array Signal Processing by Technischen Universität München is licensed under the
Creative Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/.
Contact: Josef.A.Nossek@tum.de
Editor: Univ.-Prof. Dr. techn. Josef A. Nossek,
Institute for Circuit Theory and Signal Processing, Technische Universität München
Internal reference number: TUM-LNS-TR-15-05
Print: Fachschaft Elektrotechnik und Informationstechnik e.V., München
Contents
2. Mathematical Background
2.1 Gradients
2.2 Differentiation with respect to a Complex Vector
2.3 Quadratic Optimization with Linear Constraints
2.4 Stochastic Processes
2.4.1 Characterization: Mean, Autocorrelation, Autocovariance, Variance
2.4.1.1 Time averages (averages along the process)
2.4.2 Correlation Matrix
2.5 Linear Equations
2.6 Eigenfilter (Generalized Matched Filter)
3. Adaptive Filters
3.1 Linear Optimum Filtering (Wiener Filters)
3.2 Spatial Filtering
3.2.1 Minimum Variance Distortionless Response (MVDR) Beamforming
3.2.2 Generalized Sidelobe Canceller (GSC)
3.3 Iterative Solution of the Normal Equation
3.4 Least Mean Square Algorithm (LMS)
5. Signal Reconstruction
5.1 LS Solution
5.2 MVDR Solution
5.3 MMSE Solution
5.4 MF Solution
6. Downlink Beamforming
6.1 First Step
6.2 Second Step
Appendix
A1 Subspaces of a Matrix
Bibliography
1. Introduction and Motivation
A filter is an electronic circuit used to process signals, with the goal of removing undesired com-
ponents in the signal and/or enhancing desired components in the signal. Technically speaking,
filtering refers to the extraction of information about a desired quantity at time t by using data
measured up to and including time t [1]. A filter can be classified according to its characteristics
as either:
• passive or active,
• analog or digital,
• discrete-time (sampled) or continuous-time,
• linear or non-linear and
• infinite impulse response (IIR) or finite impulse response (FIR).
In this course we will focus on digital filters which are discrete-time by nature. A digital filter
works by performing discrete mathematical operations on a sampled version of the signal.
We can further classify a digital filter as either a fixed filter or an adaptive filter. As the name
implies, the parameters (coefficients) of a fixed filter cannot be adapted depending on the input
signal. This is a viable approach if the statistics of the signal to be processed are known a priori.
However, when this information is not known completely, an optimum fixed filter cannot be de-
signed beforehand. In this scenario of unknown statistics, an adaptive filter can be employed since
the filter adapts to the given environment, being able to filter a desired signal from the input signal
with unknown statistics. The use of an adaptive filter offers an attractive solution to the filtering
problem as it usually provides a significant improvement in performance over the use of a fixed
filter designed by conventional methods [1]. At each iteration, the parameters of the filter are up-
dated according to the filter parameters in the previous iteration, the input signal and some further
information depending on the specific filter. If the scenario is stationary, successive iterations of
such an algorithm converge to an optimum solution. In a non-stationary scenario, the algorithm is
to some extent able to track the time variations of the statistics of the input signal, under the
assumption that the variations are sufficiently slow. As the name of this course implies, we will be
specifically dealing with adaptive digital filters.
In general, an adaptive filter can be represented as depicted in Fig. 1, where u[n], y[n], d[n]
and e[n] represent the discrete-time input signal, the discrete-time output signal, the desired signal
and the error, respectively. The error e[n] denotes the difference between the actual output and the
desired output. In addition, we have that w[n] is the weight vector which holds the set of the filter
parameters that can be adapted to drive the error to zero. Furthermore, we have that the current
state of the adaptive filter is denoted by x[n] and is stored in the memory of the filter. These signals
can be real or complex valued scalars or vectors. However, the vectors y[n], d[n] and e[n] must
have the same dimensions in order that the summation shown in Fig. 1 is consistent. The signals
can be summarized as follows:
• u[n] = input signal of the adaptive filter
• y[n] = output signal of the adaptive filter
• d[n] = desired or reference signal for the output
• e[n] = error between actual output and desired signal,
with w[n] as the set of filter coefficients at time instant n.
Notation: The notation that will be employed in this script will be the following. The vectors
will be represented as small boldface letters; meanwhile, the matrices will be given by capital
boldface letters. Since the signals are discrete-time we have that n is an integer.
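[Figure 1: general adaptive filter. The adaptive filter maps the input u to the output y; the error e = d − y is fed to the adaptation algorithm, which updates w[n].]
[Figure: adaptive filter operating together with a plant; block labels: system input, plant, system output, delay, adaptive filter with signals u, y, d, e.]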
1.1.3 Prediction
Let us now turn to another application of adaptive filters. Assume that we are
interested in the best prediction of a given sample based on previous samples. Prediction
is possible if the time samples are correlated. The predicted signal is
the system output 2 and the prediction error is the system output 1 shown in Fig. 1.4. An
example where prediction is used is speech encoding in GSM.
[Figure 1.4: adaptive filter used for prediction; block labels: random signal, delay, adaptive filter with signals u, y, d, e, system output 1, system output 2.]
lation of the interference is optimized in some sense. In Fig. 1.5, we depict a general example
for interference cancellation. The plant is now the path, through which the interference leaks into
the primary signal of interest. The adaptive filter, if appropriately adjusted, compensates for this
interference. The interference can also be an echo in a long-distance telephone connection. Such
a situation is depicted in Fig. 1.6, where the boxes marked N are balancing impedances. Fig. 1.7 gives
more details, with an adaptive filter for echo compensation on the side of speaker A. Fig. 1.8 shows
another form of interference cancellation, where the interference may be acoustic noise, which
contaminates a voice signal picked up by a microphone (primary sensor). If we are able to pick up
the noise only with a reference sensor, we could cancel the noise component in the voice signal.
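[Figure 1.5: interference cancellation; block labels: primary signal, plant, interference signal, adaptive filter with signals u, y, d, e, system output 1.]
Fig. 1.6. Long-distance telephone circuit [block labels: speaker A, speaker B, hybrids with balancing impedances N, line 2, echo of A's speech]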
1.1 Application Areas 5
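[Figure 1.7: echo compensation on the side of speaker A; block labels: adaptive filter, hybrid, speaker B.]
[Figure 1.8: noise cancellation; block labels: signal source, primary sensor, noise source, reference sensor, adaptive filter, estimate of noise, output.]
[Figure: digital transmission chain; block labels: input binary data, pulse generator, transmit filter, medium, noise, receive filter, adaptive equalization device, decision device, output binary data; transmitter, plant/channel, receiver.]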
[Figure: shift-register filter with output y[n], desired signal d[n] and error e[n].]
Fig. 1.10. Temporal Adaptive Equalization
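First, we collect the signals present at the inputs of the various stages of the shift register into
an M-dimensional vector:
\[
\mathbf{u}[n] = \begin{bmatrix} u[n] \\ u[n-1] \\ \vdots \\ u[n-M+1] \end{bmatrix} \in \mathbb{C}^{M}, \tag{1.1}
\]
whereas the filter parameters are stacked in
\[
\mathbf{w} = \begin{bmatrix} w_0^* \\ w_1^* \\ \vdots \\ w_{M-1}^* \end{bmatrix} \in \mathbb{C}^{M}. \tag{1.2}
\]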
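where $\mathbf{U}^H \in \mathbb{C}^{(N+1)\times M}$ with $\mathbf{U} = [\mathbf{u}[n], \mathbf{u}[n+1], \ldots, \mathbf{u}[n+N]]$. In addition, we have that
$e[n] = d[n] - y[n]$ and that $e^*[n] = d^*[n] - y^*[n]$. Let us denote by $\mathbf{e}[n]$ a collection of N + 1
error samples
\[
\mathbf{e}[n] = \begin{bmatrix} e[n] \\ e[n+1] \\ \vdots \\ e[n+N] \end{bmatrix} \in \mathbb{C}^{N+1}. \tag{1.6}
\]
Furthermore, let us collect N + 1 samples of the desired signal
\[
\mathbf{d}[n] = \begin{bmatrix} d[n] \\ d[n+1] \\ \vdots \\ d[n+N] \end{bmatrix} \in \mathbb{C}^{N+1}. \tag{1.7}
\]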
The problem to be solved can be posed in the following way:
Depending on the value of N +1 (the number of equations, the number of input vectors) and M
(the number of degrees of freedom in our weight vector), we have the following three possibilities:
• N + 1 = M: If N + 1 is equal to M, we can make the error zero, i.e. $\|\mathbf{e}[n]\|_2^2 = 0$ and
$\mathbf{d}[n] = \mathbf{y}[n]$, and hence, $\mathbf{d}^*[n] = \mathbf{U}^H \mathbf{w}$.
• N + 1 < M: If N + 1 is less than M, we can still make the error zero like in the previous case,
i.e. $\|\mathbf{e}[n]\|_2^2 = 0$. However, if (N + 1) < M, the solution $\mathbf{w}$ is not uniquely determined since
we have fewer equations than unknowns, and we can impose additional restrictions on $\mathbf{w}$ (e.g.
minimum norm $\|\mathbf{w}\|$) to arrive at a specific solution. For M = N + 1 and $\operatorname{rank}\{\mathbf{U}^H\} = M$ we
have a unique solution already.
• N + 1 > M: For the usual case of N + 1 > M, we have an overdetermined system of linear
equations and in general, we will not have zero error! Nevertheless, we aim at minimizing
$\|\mathbf{e}[n]\|_2^2$!
This amounts to computing the derivative of J(w) given in (1.9) and setting it to zero to find
the $\mathbf{w}_{\mathrm{opt}}$ which minimizes J(w).
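[Figure: uniform linear array (ULA) with five antennas spaced by ∆. A wavefront impinging at angle θ reaches the antennas with delays 0, τ, 2τ, 3τ, 4τ; the antenna signals u_0[n], ..., u_4[n] are weighted by w_0^*, ..., w_4^* and summed to give the output y[n]. An adaptive control algorithm adjusts the weights.]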
where in the last equality we have assumed that the distance between adjacent antennas is half a
wavelength of the impinging wavefront, i.e. $\Delta = \lambda/2$ (in this course, unless otherwise stated, we will usually make this assumption when we have a ULA). Using (1.13), the electric phase angle is
\[
\phi = \omega_c \tau = \pi \sin\theta. \tag{1.14}
\]
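with
\[
\mathbf{U}^H = \begin{bmatrix} \mathbf{u}^H[n] \\ \mathbf{u}^H[n+1] \\ \vdots \\ \mathbf{u}^H[n+N] \end{bmatrix} \in \mathbb{C}^{(N+1)\times M}. \tag{1.20}
\]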
At the moment we are lacking a desired signal d[n], which we can use to drive the weight vector
to an optimal solution wopt !
Assume, we have several planar wavefronts (far-field approximation) incident on the sensor
array arriving at angles θ1 , θ2 , . . . , θd . Let the wavefront with angle of arrival (AoA) θd be the
desired one and consider all the other wavefronts as interferers which should be suppressed. θd
is the only AoA, which we know a priori. Let us first calculate the contribution of this desired
wavefront to the beamformer output. To this end, we assume that the received signal impinging on
the first antenna of the array is d[n]. Therefore, we have that the desired signal impinging on the
k-th antenna of the ULA at time n is denoted by dk [n] and for k = 0, . . . , M − 1, given by
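\[
\begin{aligned}
d_0[n] &= d[n] \\
d_1[n] &= d[n-\tau] \approx d[n]\, e^{-j\phi_d} \\
d_2[n] &= d[n-2\tau] \approx d[n]\, e^{-j2\phi_d} \\
&\;\;\vdots \\
d_{M-1}[n] &= d[n-(M-1)\tau] \approx d[n]\, e^{-j(M-1)\phi_d}
\end{aligned}
\]
where the approximation ≈ comes from the narrowband assumption, i.e. the envelope of the signal
of our wavefront is approximately constant for several multiples of τ.
We put this array measurement of our desired signal into vector form
\[
\mathbf{d}[n] = \begin{bmatrix} d_0[n] \\ d_1[n] \\ \vdots \\ d_{M-1}[n] \end{bmatrix}
\approx \begin{bmatrix} 1 \\ e^{-j\phi_d} \\ e^{-j2\phi_d} \\ \vdots \\ e^{-j(M-1)\phi_d} \end{bmatrix} \cdot d[n]
= \mathbf{a}(\theta_d) \cdot d[n], \tag{1.21}
\]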
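The steering vector in (1.21) can be illustrated with a minimal sketch, assuming a ULA with half-wavelength spacing so that φ = π sin θ as in (1.14). The function name `steering_vector`, the angle of 30° and the sample value are illustrative choices, not part of the original text.

```python
import cmath
import math

def steering_vector(theta, num_antennas):
    """Return a(theta) = [1, e^{-j*phi}, ..., e^{-j*(M-1)*phi}] with phi = pi*sin(theta).

    Assumes a ULA with antenna spacing of half a wavelength.
    """
    phi = math.pi * math.sin(theta)
    return [cmath.exp(-1j * k * phi) for k in range(num_antennas)]

# Hypothetical example: M = 4 antennas, wavefront arriving at 30 degrees,
# so phi = pi/2 and the entries are 1, -j, -1, j (up to rounding).
a = steering_vector(math.radians(30.0), 4)

# Narrowband array snapshot of the desired signal: d[n] ~ a(theta_d) * d[n]
d_sample = 0.5 + 0.5j
d_vector = [a_k * d_sample for a_k in a]
```

Every entry of the steering vector has unit magnitude, so under the narrowband assumption the array only changes the phase of the desired signal from antenna to antenna, not its amplitude.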
1.4 Multichannel Adaptive Beamforming 13
i.e. we require the beamformer output due to the desired wavefront to be d[n]. This means that
wH · a(θd ) = 1. Besides this constraint we would like to minimize the output power of the
beamformer, since this will minimize the interference power at the output.
The total output signal is $y[n] = \mathbf{w}^H[n] \cdot \mathbf{u}[n]$. Assuming D impinging wavefronts on the array,
we have that
\[
\mathbf{u}[n] = \sum_{i=1}^{D} \mathbf{u}_i[n]
\]
is the sum of the D impinging wavefronts, where our desired signal
is $u_d[n] = d[n]$. For each impinging wavefront i = 1, . . . , D the received signal of the ULA is
\[
\mathbf{u}_i[n] = \begin{bmatrix} 1 \\ e^{-j\phi_i} \\ e^{-j2\phi_i} \\ \vdots \\ e^{-j(M-1)\phi_i} \end{bmatrix} \cdot u_i[n] = \mathbf{a}(\theta_i) \cdot u_i[n]. \tag{1.23}
\]
The problem at hand is a quadratic minimization subject to a linear constraint, i.e. a linearly
constrained least squares (LCLS) problem! In the next chapter, we will see how we can solve such a
problem.
2. Mathematical Background
As we have seen in Sections 1.3 and 1.4, we need to find extreme points of a real valued scalar
cost function, which in general is a function of a complex vector. This optimization can be an
unconstrained or a constrained one. In order to find extreme points, we have to compute derivatives
with respect to complex vectors.
2.1 Gradients
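Let us start with computing derivatives with respect to vectors in the field of reals
\[
\mathbf{x} \in \mathbb{R}^n, \qquad f(\mathbf{x}) \in \mathbb{R}, \qquad f : \mathbb{R}^n \to \mathbb{R},
\]
\[
\frac{df(\mathbf{x})}{d\mathbf{x}} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix} = \operatorname{grad} f(\mathbf{x}) = \nabla_{\mathbf{x}} f(\mathbf{x}),
\]
\[
df = \left( \frac{df(\mathbf{x})}{d\mathbf{x}} \right)^{T} \cdot d\mathbf{x}.
\]
If f(x) is linear
\[
f(\mathbf{x}) = \mathbf{a}^T \mathbf{x} = \mathbf{x}^T \mathbf{a},
\]
where $\mathbf{a} \in \mathbb{R}^n$, then we have
\[
\frac{df}{d\mathbf{x}} = \mathbf{a},
\]
since
\[
f(\mathbf{x}) = \sum_{k=1}^{n} a_k \cdot x_k, \qquad \frac{\partial f(\mathbf{x})}{\partial x_k} = a_k.
\]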
2.2 Differentiation with respect to a Complex Vector 15
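If f(x) is quadratic
\[
f(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x},
\]
where $\mathbf{A} \in \mathbb{R}^{n\times n}$, we have
\[
\begin{aligned}
\frac{df}{d\mathbf{x}} &= \left. \frac{d}{d\mathbf{x}} \mathbf{x}^T \mathbf{v} \,\right|_{\mathbf{v} = \mathbf{A}\mathbf{x} = \text{const}} + \left. \frac{d}{d\mathbf{x}} \mathbf{u}^T \mathbf{x} \,\right|_{\mathbf{u}^T = \mathbf{x}^T \mathbf{A} = \text{const}} \\
&= \frac{d}{d\mathbf{x}} \mathbf{x}^T \mathbf{v} + \frac{d}{d\mathbf{x}} \mathbf{x}^T \mathbf{u} \\
&= \mathbf{v} + \mathbf{u} \\
&= \left( \mathbf{A} + \mathbf{A}^T \right) \mathbf{x} \\
&= 2\mathbf{A}\mathbf{x} \quad \text{if } \mathbf{A} = \mathbf{A}^T.
\end{aligned}
\]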
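The result $df/d\mathbf{x} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$ can be checked numerically. The following is a minimal sketch that compares the analytic gradient with central finite differences; the matrix, the point x and the helper names are hypothetical choices made only for this illustration.

```python
def quad_form(A, x):
    """f(x) = x^T A x for a real matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_gradient(A, x):
    """Gradient (A + A^T) x derived above."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numeric_gradient(A, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return grad

# Deliberately non-symmetric A, so that 2*A*x would be the wrong answer here.
A = [[2.0, 1.0], [-3.0, 4.0]]
x = [1.0, 2.0]
g_analytic = analytic_gradient(A, x)   # (A + A^T) x = [0, 14]
g_numeric = numeric_gradient(A, x)
```

Using a non-symmetric A makes the role of the symmetric part visible: only for $\mathbf{A} = \mathbf{A}^T$ does the gradient simplify to $2\mathbf{A}\mathbf{x}$.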
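\[
\mathbf{z} = \mathbf{x} + j\mathbf{y}, \qquad \mathbf{z} \in \mathbb{C}^n, \qquad \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, \qquad \mathbf{z}^* = \mathbf{x} - j\mathbf{y}
\]
\[
h(\mathbf{z}) : \mathbb{C}^n \to \mathbb{C}, \qquad
f(\mathbf{x}, \mathbf{y}) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{C}, \qquad
g(\mathbf{z}, \mathbf{z}^*) : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}.
\]
\[
h(\mathbf{z}) = \|\mathbf{z}\|_2^2, \qquad
f(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T \mathbf{x} + \mathbf{y}^T \mathbf{y}, \qquad
g(\mathbf{z}, \mathbf{z}^*) = (\mathbf{z}^*)^T \mathbf{z}
\]
\[
h(\mathbf{z}) = f(\mathbf{x}, \mathbf{y}) = g(\mathbf{z}, \mathbf{z}^*).
\]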
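The definition of a gradient enables us to compute the increment of the function due to an
increment of the vector argument:
\[
\begin{aligned}
dh &= \left( \frac{dh}{d\mathbf{z}} \right)^{T} \cdot d\mathbf{z} \\
df &= \left( \frac{\partial f}{\partial \mathbf{x}} \right)^{T} \cdot d\mathbf{x} + \left( \frac{\partial f}{\partial \mathbf{y}} \right)^{T} \cdot d\mathbf{y} \\
dg &= \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} \cdot d\mathbf{z} + \left( \frac{\partial g}{\partial \mathbf{z}^*} \right)^{T} \cdot d\mathbf{z}^*.
\end{aligned}
\]
Let us assume that $f, g \in \mathbb{C}$ are differentiable, therefore, $\frac{\partial f}{\partial \mathbf{x}}$, $\frac{\partial f}{\partial \mathbf{y}}$ and $\frac{\partial g}{\partial \mathbf{z}}$, $\frac{\partial g}{\partial \mathbf{z}^*}$ exist and thus, df
and dg can be computed and df = dg!
The derivative $\frac{dh}{d\mathbf{z}}$ exists only if h(z) is analytic. That means that assuming $h = h_R + jh_I$ ($\mathrm{Re}\{h\} = h_R$ and
$\mathrm{Im}\{h\} = h_I$), then the Cauchy-Riemann equations must hold:
\[
\frac{\partial h_R}{\partial \mathbf{x}} = \frac{\partial h_I}{\partial \mathbf{y}} \quad \text{and} \quad
\frac{\partial h_R}{\partial \mathbf{y}} = -\frac{\partial h_I}{\partial \mathbf{x}}.
\]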
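Many functions encountered in signal processing are not analytic, e.g. $\mathbf{w}^H \mathbf{R} \mathbf{w}$. In fact, cost
functions are real valued and, thus, not analytic. Therefore we have to drop the possibility to work
with h. It is easy to show that
\[
\left.
\begin{aligned}
\frac{\partial f}{\partial \mathbf{x}} &= \frac{\partial g}{\partial \mathbf{z}} + \frac{\partial g}{\partial \mathbf{z}^*} \\
\frac{\partial f}{\partial \mathbf{y}} &= j \left( \frac{\partial g}{\partial \mathbf{z}} - \frac{\partial g}{\partial \mathbf{z}^*} \right)
\end{aligned}
\right\}
\;\Leftrightarrow\;
\left\{
\begin{aligned}
\frac{\partial g}{\partial \mathbf{z}} &= \frac{1}{2} \left( \frac{\partial f}{\partial \mathbf{x}} - j \frac{\partial f}{\partial \mathbf{y}} \right) \\
\frac{\partial g}{\partial \mathbf{z}^*} &= \frac{1}{2} \left( \frac{\partial f}{\partial \mathbf{x}} + j \frac{\partial f}{\partial \mathbf{y}} \right).
\end{aligned}
\right.
\]
Since f is real valued, both $\frac{\partial f}{\partial \mathbf{x}}$ and $\frac{\partial f}{\partial \mathbf{y}}$ are also real valued and, therefore,
\[
\left( \frac{\partial g}{\partial \mathbf{z}} \right)^{*} = \frac{\partial g}{\partial \mathbf{z}^*}
\]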
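\[
\begin{aligned}
dg &= \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} d\mathbf{z} + \left( \frac{\partial g}{\partial \mathbf{z}^*} \right)^{T} d\mathbf{z}^* \\
&= \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} d\mathbf{z} + \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{H} d\mathbf{z}^* \\
&= \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} d\mathbf{z} + \left( \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} d\mathbf{z} \right)^{*} \\
&= 2\,\mathrm{Re}\left\{ \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} d\mathbf{z} \right\} \\
&= 2\,\mathrm{Re}\left\{ \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{H} d\mathbf{z}^* \right\}.
\end{aligned}
\]
Additionally, note that
\[
dg \overset{!}{=} df,
\]
since
\[
\begin{aligned}
dg &= \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} d\mathbf{z} + \left( \frac{\partial g}{\partial \mathbf{z}^*} \right)^{T} d\mathbf{z}^* \\
&= \left( \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} + \left( \frac{\partial g}{\partial \mathbf{z}^*} \right)^{T} \right) d\mathbf{x} + j \left( \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{T} - \left( \frac{\partial g}{\partial \mathbf{z}^*} \right)^{T} \right) d\mathbf{y} \\
&= \left( \frac{\partial f}{\partial \mathbf{x}} \right)^{T} d\mathbf{x} + \left( \frac{\partial f}{\partial \mathbf{y}} \right)^{T} d\mathbf{y}.
\end{aligned}
\]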
2.3 Quadratic Optimization with Linear Constraints 17
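Therefore, only one derivative, either $\frac{\partial g}{\partial \mathbf{z}}$ or $\frac{\partial g}{\partial \mathbf{z}^*}$, must be computed to have the full gradient
information! To compute stationary points of f or g, it is sufficient to set $\frac{\partial g}{\partial \mathbf{z}^*} = \mathbf{0}$. The direction
of the steepest descent (gradient descent) is given by −dg. To this end, we need to find for which
direction dz, with a given length ‖dz‖, we will obtain a maximal dg:
\[
dg = 2\,\mathrm{Re}\left\{ \left( \frac{\partial g}{\partial \mathbf{z}} \right)^{H} d\mathbf{z}^* \right\} \overset{!}{=} \max.
\]
From this and based on the Cauchy-Schwarz inequality, we conclude that the steepest descent will
be achieved if $\Delta\mathbf{z} = -\mu \frac{\partial g}{\partial \mathbf{z}^*}$ or $\Delta\mathbf{z}^* = -\mu \frac{\partial g}{\partial \mathbf{z}}$, with $\mu \in \mathbb{R}_+$.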
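The steepest descent update $\Delta\mathbf{z} = -\mu\, \partial g / \partial \mathbf{z}^*$ can be sketched as follows. The cost $g(\mathbf{z}, \mathbf{z}^*) = \|\mathbf{z} - \mathbf{c}\|_2^2$, the target vector c, the step size and the iteration count are assumptions chosen for this illustration only; for this cost, $\partial g / \partial \mathbf{z}^* = \mathbf{z} - \mathbf{c}$.

```python
def grad_conj(z, c):
    """Derivative of g = ||z - c||^2 with respect to z*, i.e. z - c (example cost)."""
    return [z_k - c_k for z_k, c_k in zip(z, c)]

def steepest_descent(z0, c, mu=0.25, iterations=60):
    """Iterate z <- z - mu * dg/dz* starting from z0."""
    z = list(z0)
    for _ in range(iterations):
        gradient = grad_conj(z, c)
        z = [z_k - mu * g_k for z_k, g_k in zip(z, gradient)]
    return z

c = [1 + 2j, -0.5j]                      # hypothetical minimizer of the cost
z_final = steepest_descent([0j, 0j], c)  # should approach c
```

With this quadratic cost the error shrinks by the factor (1 − μ) in every iteration, so any 0 < μ < 2 converges here; for other cost functions the admissible step size depends on the curvature of the cost.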
respectively.
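Choosing as a simple example the cost function
\[
f(\mathbf{w}) = \mathbf{w}^H \mathbf{w},
\]
we have
\[
\frac{\partial f}{\partial \mathbf{w}^*} = \mathbf{w}
\]
and
\[
\frac{\partial L}{\partial \mathbf{w}^*} = \mathbf{0},
\]
from where we have
\[
\mathbf{w} = -\mathbf{S}\boldsymbol{\lambda}
\]
and
\[
-\mathbf{S}^H \mathbf{S} \boldsymbol{\lambda} - \mathbf{g} = \mathbf{0}.
\]
Computing λ
\[
\boldsymbol{\lambda} = -(\mathbf{S}^H \mathbf{S})^{-1} \mathbf{g},
\]
and plugging this in the previous equation we then have
\[
\mathbf{w} = \mathbf{S} (\mathbf{S}^H \mathbf{S})^{-1} \mathbf{g} = (\mathbf{S}^H)^{+} \mathbf{g},
\]
where $(\mathbf{S}^H)^{+}$ is the so-called pseudo-inverse of $\mathbf{S}^H$, which will be discussed later on.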
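A small numerical sketch of this minimum-norm solution follows. It assumes that the linear constraint has the form $\mathbf{S}^H \mathbf{w} = \mathbf{g}$, which is what the two equations $\mathbf{w} = -\mathbf{S}\boldsymbol{\lambda}$ and $-\mathbf{S}^H\mathbf{S}\boldsymbol{\lambda} - \mathbf{g} = \mathbf{0}$ imply; the matrix S, the vector g and the helper functions are hypothetical.

```python
def hermitian(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    """Closed-form inverse, sufficient for two constraints."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

# Three unknowns, two constraints: S is 3 x 2 and g is 2 x 1.
S = [[1 + 0j, 1 + 0j], [1 + 0j, 1j], [1 + 0j, -1 + 0j]]
g = [[1 + 0j], [0j]]

SH = hermitian(S)
# w = S (S^H S)^{-1} g
w = matmul(S, matmul(inverse_2x2(matmul(SH, S)), g))

# The constraint S^H w = g should hold up to rounding.
constraint_residual = [row[0] - g_k[0] for row, g_k in zip(matmul(SH, w), g)]
```

For this example the solution is w = [3/8 + j/8, 1/4, 3/8 − j/8]ᵀ, which satisfies both constraints exactly.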
µ[n] = µ
r[n, n − k] = r[k]
c[n, n − k] = c[k],
2.4 Stochastic Processes 19
respectively. The autocovariance function can be expressed by the autocorrelation function and the
mean value function:
where r[0] and c[0] are the mean square value and the variance, respectively. A summary of these
general definitions is shown in Table 2.1.
\[
E[\hat{\mu}[n]] = \mu,
\]
\[
E\left[ |\mu - \hat{\mu}[n]|^2 \right] = 0
\]
\[
E\left[ |r[k] - \hat{r}(k, N)|^2 \right] = 0,
\]
mean ergodic and correlation ergodic, in the mean square error sense, respectively. A summary
with the definitions for the time averages is shown in Table 2.2.
The stochastic processes which we will work with are stationary, ergodic and zero-mean.
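\[
\mathbf{R} = E\left[ \mathbf{u}[n] \cdot \mathbf{u}^H[n] \right] = E\left[ \begin{bmatrix} u[n] \\ u[n-1] \\ \vdots \\ u[n-M+1] \end{bmatrix} \cdot \left[ u^*[n],\, u^*[n-1],\, \cdots,\, u^*[n-M+1] \right] \right]
\]
\[
\mathbf{R} = \begin{bmatrix}
r[0] & r[1] & \cdots & r[M-1] \\
r[-1] & r[0] & \cdots & r[M-2] \\
\vdots & \vdots & \ddots & \vdots \\
r[-M+1] & r[-M+2] & \cdots & r[0]
\end{bmatrix} \in \mathbb{C}^{M\times M},
\]
which is a matrix that is both Toeplitz and Hermitian. A matrix is Toeplitz if the entries along
the diagonals are the same. A matrix is Hermitian if $\mathbf{R} = \mathbf{R}^H = (\mathbf{R}^T)^*$, which leads to having
$r[-k] = r^*[k]$. In addition, R is nonnegative definite, which means that $\mathbf{x}^H \mathbf{R} \mathbf{x} \ge 0 \;\forall \mathbf{x}$. This is
shown in the following. Let us denote
\[
y = \mathbf{x}^H \cdot \mathbf{u}[n], \qquad y^* = \mathbf{u}^H[n] \cdot \mathbf{x},
\]
resulting in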
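\[
r[k] = \begin{cases} |\alpha|^2 + \sigma_\nu^2 & k = 0 \\ |\alpha|^2 e^{j\omega k T} & k \neq 0. \end{cases}
\]
Thus, the correlation matrix is
\[
\mathbf{R} = |\alpha|^2 \cdot \begin{bmatrix}
1 + \frac{1}{\mathrm{SNR}} & e^{j\omega T} & \cdots & e^{j\omega T (M-1)} \\
e^{-j\omega T} & 1 + \frac{1}{\mathrm{SNR}} & \cdots & e^{j\omega T (M-2)} \\
\vdots & \vdots & \ddots & \vdots \\
e^{-j\omega T (M-1)} & e^{-j\omega T (M-2)} & \cdots & 1 + \frac{1}{\mathrm{SNR}}
\end{bmatrix},
\]
where $\mathrm{SNR} = \frac{|\alpha|^2}{\sigma_\nu^2}$.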
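The structure of this correlation matrix can be made concrete with a short sketch that builds R from r[k] and checks the Hermitian, Toeplitz and nonnegative definite properties. The numerical values for |α|², σ_ν², ωT and M, as well as the function names, are illustrative assumptions.

```python
import cmath

def sinusoid_correlation_matrix(M, alpha_power, noise_power, omega_T):
    """M x M correlation matrix of a complex sinusoid in white noise.

    Entry (i, j) is r[j - i], with r[0] = |alpha|^2 + sigma_nu^2 and
    r[k] = |alpha|^2 * exp(j*omega*k*T) for k != 0.
    """
    def r(k):
        value = alpha_power * cmath.exp(1j * omega_T * k)
        return value + noise_power if k == 0 else value
    return [[r(j - i) for j in range(M)] for i in range(M)]

def quadratic_form(R, x):
    """x^H R x, which is real and nonnegative for a correlation matrix."""
    n = len(x)
    return sum(x[i].conjugate() * R[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical values: |alpha|^2 = 2, sigma_nu^2 = 0.5 (SNR = 4), omega*T = 0.3
R = sinusoid_correlation_matrix(3, 2.0, 0.5, 0.3)
test_vector = [1 + 0j, -1j, 0.5 + 0j]
power = quadratic_form(R, test_vector)
```

The diagonal entries equal |α|²(1 + 1/SNR) = 2.5 in this example, and the off-diagonal entries depend only on the index difference j − i, which is the Toeplitz property.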
The columnspace or image or span(A) is the vector space spanned by the column vectors of A
Now we will consider three different cases concerning the size of A. Let us start with the most
familiar, but not necessarily most important case, i.e. with a square full rank matrix : r = m = n,
i.e. the number of equations is equal to the number of unknowns and all equations are linearly
independent.
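[Diagram: the m × n matrix A multiplied by the n-vector x equals the m-vector b.]
\[
\mathbf{x} = \mathbf{A}^{-1} \mathbf{b}.
\]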
This does not mean, that we should compute the solution by calculating the inverse A−1 . There are
many more algorithms available to solve such a set of linear equations like Gaussian Elimination,
LU-, LR-, QR-, QL-decomposition or conjugate gradient (CG) descent, which depending on the
specific setting should be employed.
2.5 Linear Equations 23
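Next let us assume that m > n, i.e. more equations than unknowns (overdetermined system of
equations), but still r = n, i.e. a full rank tall matrix.
[Diagram: the tall m × n matrix A multiplied by the n-vector x equals the m-vector b.]
In this case we can premultiply from the left with $\mathbf{A}^H$ getting
\[
\mathbf{A}^H \mathbf{A} \mathbf{x} = \mathbf{A}^H \mathbf{b},
\]
where $\mathbf{A}^H \mathbf{A} \in \mathbb{C}^{n\times n}$ now is a full rank square matrix, which has a unique standard inverse
leading to the following solution
\[
\mathbf{x} = \left( \mathbf{A}^H \mathbf{A} \right)^{-1} \mathbf{A}^H \mathbf{b} = \mathbf{A}^{+} \mathbf{b} = \mathbf{x}_{\mathrm{LS}},
\]
which is the so-called least squares solution, as we shall see later. This solution is unique and can
be computed with QR-, QL-decompositions or CG-descent. Anyway, we have to be aware that
\[
\mathbf{A} \mathbf{x}_{\mathrm{LS}} \neq \mathbf{b}.
\]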
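As a minimal sketch of the least squares solution, the following code solves the normal equations $\mathbf{A}^H\mathbf{A}\mathbf{x} = \mathbf{A}^H\mathbf{b}$ for a tall matrix with two columns, so that the 2 × 2 inverse can be written in closed form. The data and the function name `least_squares` are hypothetical; for larger problems one of the decompositions named above would be the usual choice.

```python
def least_squares(A, b):
    """x_LS = (A^H A)^{-1} A^H b for a full rank tall matrix A with two columns."""
    m = len(A)
    # Gram matrix A^H A (2 x 2) and right-hand side A^H b
    gram = [[sum(A[k][i].conjugate() * A[k][j] for k in range(m)) for j in range(2)]
            for i in range(2)]
    rhs = [sum(A[k][i].conjugate() * b[k] for k in range(m)) for i in range(2)]
    det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
    return [(gram[1][1] * rhs[0] - gram[0][1] * rhs[1]) / det,
            (gram[0][0] * rhs[1] - gram[1][0] * rhs[0]) / det]

# m = 3 equations, n = 2 unknowns; b is not in the column space of A.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 2.0]

x_ls = least_squares(A, b)   # [7/6, 1/2] for this data
residual = [b[k] - sum(A[k][j] * x_ls[j] for j in range(2)) for k in range(3)]
# A^H * residual vanishes: the residual is orthogonal to the columns of A.
orthogonality = [sum(A[k][i].conjugate() * residual[k] for k in range(3)) for i in range(2)]
```

The residual itself is not zero, which illustrates $\mathbf{A}\mathbf{x}_{\mathrm{LS}} \neq \mathbf{b}$, but it is orthogonal to the column space of A.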
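Looking finally at the case m < n (underdetermined system of equations) and still A being a
full rank flat matrix, m = r:
[Diagram: the flat m × n matrix A multiplied by the n-vector x equals the m-vector b.]
\[
\dim(\operatorname{span}\mathbf{A}) = m, \qquad \text{if } \mathbf{b} \in \operatorname{span}\mathbf{A} \;\Rightarrow\; \exists\, \mathbf{x} : \mathbf{A}\mathbf{x} = \mathbf{b},
\]
which is not unique!
Since there is no unique solution but a manifold of solutions, we can choose one of these from
the (n − r)-dimensional solution space, e.g. by minimizing the norm of the solution vector: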
Before we proceed let us look at a specific example discussing the three aforementioned different
cases. Assume noise-free measurements taken from a ULA with m sensors exposed to n
impinging planar wavefronts. If the signal from the i-th impinging wavefront is given by $x_i$ then
the measurement is
\[
\mathbf{b} = \sum_{i=1}^{n} x_i \cdot \mathbf{a}_i
\]
\[
\mathbf{b} = \mathbf{A}\mathbf{x},
\]
where $\mathbf{x} = [x_1, x_2, \cdots, x_n]^T$ and where A is the array steering matrix
\[
\mathbf{b} = \mathbf{A}\mathbf{x} + \mathbf{n},
\]
where b in general in none of these cases is in the column space of A, and therefore $\mathbf{b} \approx \mathbf{A}\mathbf{x}$!
Let us first have a closer look at m > n = r, i.e. all wavefronts have distinct DoA’s and we
have more antenna elements than wavefronts. Therefore we have
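\[
\boldsymbol{\Sigma} = \operatorname{diag}\{\sigma_i\}_{i=1}^{\min(m,n)} \in (\mathbb{R}_+ \cup \{0\})^{m\times n}
\]
with
\[
\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > \sigma_{r+1} = \ldots = \sigma_{\min(m,n)} = 0.
\]
The following two pictures try to visualize the situation.
[Two diagrams: block partitioning of the SVD $\mathbf{A} = [\mathbf{U}_s\; \mathbf{U}_o] \cdot \boldsymbol{\Sigma} \cdot [\mathbf{V}_s^H ; \mathbf{V}_o^H]$ with diagonal blocks $\boldsymbol{\Sigma}_s$ and $\boldsymbol{\Sigma}_o$ and block sizes r, m − r and n − r, one for a flat matrix (m < n) and one for a tall matrix (m > n).]
\[
\mathbf{A}^{+} = \mathbf{V} \boldsymbol{\Sigma}^{+} \mathbf{U}^H \quad \text{with} \quad \boldsymbol{\Sigma}^{+} = \begin{bmatrix} \boldsymbol{\Sigma}_s^{-1} & \mathbf{0} \end{bmatrix} \in \mathbb{R}^{n \times m}.
\]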
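For r < n the situation is a little bit more involved. For m > n > r, Σ reads as follows
\[
\boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \in \mathbb{R}^{m \times n},
\]
where the column blocks have widths r and n − r.
\[
\mathbf{A}^{+} = \mathbf{V} \boldsymbol{\Sigma}^{+} \mathbf{U}^H \quad \text{with} \quad \boldsymbol{\Sigma}^{+} = \begin{bmatrix} \boldsymbol{\Sigma}_s^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}.
\]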
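The obtained x with $\mathbf{x} = \mathbf{A}^{+}\mathbf{b}$ is the solution vector with the smallest norm. We will show this
by asking the question how to choose $\boldsymbol{\Sigma}^{+}$ in order to minimize $\|\mathbf{n}\|_2^2$, $\mathbf{n} = \mathbf{A}\mathbf{x} - \mathbf{b}$:
\[
\frac{\partial \|\mathbf{n}\|_2^2}{\partial \mathbf{x}^*} = \mathbf{V} \boldsymbol{\Sigma}^T \boldsymbol{\Sigma} \mathbf{V}^H \mathbf{x} - \mathbf{V} \boldsymbol{\Sigma}^T \mathbf{U}^H \mathbf{b} = \mathbf{0}.
\]
For $\mathbf{x} = \mathbf{A}^{+}\mathbf{b} = \mathbf{V} \boldsymbol{\Sigma}^{+} \mathbf{U}^H \mathbf{b}$ the above derivative should be zero. Let us first look at $\boldsymbol{\Sigma}^T \boldsymbol{\Sigma}$:
\[
\begin{bmatrix} \boldsymbol{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}^{T} \cdot \begin{bmatrix} \boldsymbol{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\Sigma}_s^2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} = \boldsymbol{\Sigma}_{S0}^2.
\]
Therefore we have
\[
\begin{aligned}
\left. \frac{\partial \|\mathbf{n}\|_2^2}{\partial \mathbf{x}^*} \right|_{\mathbf{x} = \mathbf{V}\boldsymbol{\Sigma}^{+}\mathbf{U}^H\mathbf{b}}
&= \mathbf{V} \boldsymbol{\Sigma}_{S0}^2 \underbrace{\mathbf{V}^H \mathbf{V}}_{\mathbf{1}_n} \boldsymbol{\Sigma}^{+} \mathbf{U}^H \mathbf{b} - \mathbf{V} \boldsymbol{\Sigma}^T \mathbf{U}^H \mathbf{b} \\
&= \mathbf{V} \left( \boldsymbol{\Sigma}_{S0}^2 \boldsymbol{\Sigma}^{+} - \boldsymbol{\Sigma}^T \right) \mathbf{U}^H \mathbf{b} = \mathbf{0}.
\end{aligned}
\]
Thus
\[
\left. \frac{\partial \|\mathbf{n}\|_2^2}{\partial \mathbf{x}^*} \right|_{\mathbf{x} = \mathbf{V}\boldsymbol{\Sigma}^{+}\mathbf{U}^H\mathbf{b}} = \mathbf{0}
\]
implies that
\[
\boldsymbol{\Sigma}_{S0}^2 \cdot \boldsymbol{\Sigma}^{+} = \boldsymbol{\Sigma}^T
\]
\[
\begin{bmatrix} \boldsymbol{\Sigma}_s^2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \cdot \begin{bmatrix} \boldsymbol{\Sigma}_s^{-1} & \mathbf{0} \\ \mathbf{0} & * \end{bmatrix} = \begin{bmatrix} \boldsymbol{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}.
\]
For the don't care entry "*" we can choose whatever we want, the derivative will be zero.
Therefore, let us use this degree of freedom to minimize the norm of the solution x!
\[
\boldsymbol{\Sigma}^{+} = \begin{bmatrix} \boldsymbol{\Sigma}_s^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}.
\]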
Now we have a solution for computing generalized inverses, no matter how large m, n and r
are. We simply invert only the nonzero singular values to obtain Σ+ and leave the zeroes as they
are.
Given A and its SVD we now discuss the four fundamental subspaces of a matrix. We already
had the column space of A:
Us , the first r columns of U are a unitary basis of S, which is also called signal subspace. Next
we have the nullspace or kernel of A:
Vs , the first r columns of V are a unitary basis of Sl . Finally, we have the nullspace of AH (left
nullspace of A):
Nl = ker(AH ) = {x|AH x = 0} dim(Nl ) = m − r.
Uo , the last m − r columns of U are a unitary basis of Nl , which is also called the noise subspace!
The four fundamental subspaces are summarized in Table 2.3.
In signal processing we will mainly deal with the signal and the noise subspace. It is interesting
to note, that the SVD is not unique, but the four fundamental subspaces, for which the SVD gives
unitary basis vectors, are unique. This will be shown in the following calculation:
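\[
= \begin{bmatrix} \mathbf{U}_s & \mathbf{U}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{1}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{U}_s^H \\ \mathbf{U}_o^H \end{bmatrix} = \mathbf{U}_s \mathbf{U}_s^H.
\]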
Projectors are:
• Hermitian matrices: P = PH
• Idempotent matrices: P = P2
• Rank deficient matrices: rank(P) < m.
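A change of the basis vectors for a given subspace, $\mathbf{U}_s' = \mathbf{U}_s \mathbf{Q}$ with Q unitary, does not change the
projector
\[
\mathbf{P}_S' = \mathbf{U}_s' \mathbf{U}_s'^{H} = \mathbf{U}_s \underbrace{\mathbf{Q}\mathbf{Q}^H}_{\mathbf{1}_r} \mathbf{U}_s^H = \mathbf{U}_s \mathbf{U}_s^H = \mathbf{P}_S!
\]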
For tracking subspaces it is advantageous to work with projectors and not with basis vectors of
the (slowly) changing subspace.
Now we can look at the connection between SVD and the well known eigenvalue decomposi-
tion (EVD), which exists only for square matrices
(Attention: not every square matrix is diagonalizable. This leads to the so called Jordan forms,
which we will not discuss here.)
The above transform from A to Λ (and vice versa) is called similarity transform. Similarity
transforms leave the eigenvalues and the trace of a matrix invariant. If A is a normal matrix, then
Q is a unitary matrix.
A is a normal matrix ⇔ AH A = AAH .
It is easy to see that hermitian matrices are normal. If A is hermitian and positive semidefinite,
then we have
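\[
\mathbf{A} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^H, \qquad \boldsymbol{\Lambda} = \operatorname{diag}\{\lambda_i\}_{i=1}^{m}, \qquad \lambda_i \in \mathbb{R}_+ \cup \{0\}.
\]
Additionally, if we arrange the eigenvalues such that $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_m$, then the EVD and SVD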
of A are identical. Covariance matrices are always hermitian and positive semidefinite.
The column vectors q of Q are the so called eigenvectors of A:
Aq = λq.
This equation tells us that there are vectors q which, when multiplied by A, do not change direction
in space but are only scaled by λ. To determine those vectors q, we have to solve
(A − λ1)q = 0,
which has nontrivial solutions only, if det(A − λ1) = 0, which determine the eigenvalues λ.
Assume that A is an estimated correlation matrix R = UUH , which is positive semidefinite
and hermitian. We will now show, that the eigenvalues of such a matrix are nonnegative.
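\[
\begin{aligned}
\mathbf{R}\mathbf{q}_i &= \lambda_i \mathbf{q}_i \\
\mathbf{q}_i^H \mathbf{R} \mathbf{q}_i &= \lambda_i \mathbf{q}_i^H \mathbf{q}_i \\
&= \lambda_i \|\mathbf{q}_i\|_2^2,
\end{aligned}
\]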
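If all eigenvalues of such a matrix are distinct, then all eigenvectors will be perpendicular to
each other, i.e. $\mathbf{q}_i^H \mathbf{q}_j = 0 \;\forall i \neq j$.
\[
\begin{aligned}
\mathbf{R}\mathbf{q}_i &= \lambda_i \mathbf{q}_i \\
\mathbf{q}_j^H \mathbf{R} \mathbf{q}_i &= \lambda_i \mathbf{q}_j^H \mathbf{q}_i.
\end{aligned}
\]
Similarly we have
\[
\begin{aligned}
\mathbf{R}\mathbf{q}_j &= \lambda_j \mathbf{q}_j \\
\mathbf{q}_j^H \mathbf{R} &= \lambda_j \mathbf{q}_j^H \\
\mathbf{q}_j^H \mathbf{R} \mathbf{q}_i &= \lambda_j \mathbf{q}_j^H \mathbf{q}_i.
\end{aligned}
\]
Subtracting the two expressions for $\mathbf{q}_j^H \mathbf{R} \mathbf{q}_i$ gives $(\lambda_i - \lambda_j)\, \mathbf{q}_j^H \mathbf{q}_i = 0$ and, since $\lambda_i \neq \lambda_j$,
\[
0 = \mathbf{q}_j^H \mathbf{q}_i,
\]
which means that $\mathbf{q}_j$ and $\mathbf{q}_i$ are perpendicular!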
By simply scaling all eigenvectors to unit norm, the matrix Q = [q1 , q2 , . . . , qm ] is unitary!
Moreover, we have
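\[
\begin{aligned}
\operatorname{tr}(\mathbf{R}) &= \operatorname{tr}(\mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^H) \\
&= \operatorname{tr}(\mathbf{Q}^H \mathbf{Q} \boldsymbol{\Lambda}) \\
&= \operatorname{tr}(\boldsymbol{\Lambda}) \\
&= \sum_{i=1}^{m} \lambda_i,
\end{aligned}
\]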
where tr(R) is the trace of R, i.e. the sum of the diagonal elements of R.
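The Rayleigh quotient of R is $\dfrac{\mathbf{x}^H \mathbf{R} \mathbf{x}}{\mathbf{x}^H \mathbf{x}}$.
\[
\lambda_{\max} = \max_{\substack{\mathbf{x} \in \mathbb{C}^m \\ \mathbf{x} \neq \mathbf{0}}} \frac{\mathbf{x}^H \mathbf{R} \mathbf{x}}{\mathbf{x}^H \mathbf{x}}
\]
and
\[
\lambda_{\min} = \min_{\substack{\mathbf{x} \in \mathbb{C}^m \\ \mathbf{x} \neq \mathbf{0}}} \frac{\mathbf{x}^H \mathbf{R} \mathbf{x}}{\mathbf{x}^H \mathbf{x}}.
\]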
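The bounds on the Rayleigh quotient can be illustrated with a short sketch for a 2 × 2 Hermitian matrix, whose eigenvalues are available in closed form. The matrix R, the random seed and the number of trial vectors are hypothetical choices for this illustration.

```python
import math
import random

def rayleigh_quotient(R, x):
    """x^H R x / x^H x for a Hermitian matrix R given as a list of rows."""
    n = len(x)
    numerator = sum(x[i].conjugate() * R[i][j] * x[j] for i in range(n) for j in range(n))
    denominator = sum(abs(x_k) ** 2 for x_k in x)
    return numerator.real / denominator

# Hermitian 2 x 2 example matrix with eigenvalues 3 and 1.
R = [[2.0 + 0j, 1j], [-1j, 2.0 + 0j]]
mean = (R[0][0] + R[1][1]).real / 2
radius = math.sqrt(((R[0][0] - R[1][1]).real / 2) ** 2 + abs(R[0][1]) ** 2)
lam_max, lam_min = mean + radius, mean - radius

rng = random.Random(0)
quotients = [
    rayleigh_quotient(R, [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)])
    for _ in range(1000)
]

# The bounds are attained by the eigenvectors [j, 1] and [j, -1].
q_at_max = rayleigh_quotient(R, [1j, 1 + 0j])
q_at_min = rayleigh_quotient(R, [1j, -1 + 0j])
```

All random trial vectors give a quotient between the two eigenvalues, and the extreme values are reached only along the corresponding eigenvectors.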
[Figure: block diagram of a linear filter with additive noise ν[n] at its input.]
Fig. 2.1. Linear Filtering
where we have that the signal u[n] is uncorrelated with the noise ν[n]:
E [u[n]ν ∗ [m]] = 0 ∀ n, m.
From Figure 2.1, we have that the output y[n] can be expressed as
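\[
\begin{aligned}
P &= E\left[ y[n] y^*[n] \right] \\
&= E\left[ \mathbf{w}^H (\mathbf{u}[n] + \boldsymbol{\nu}[n]) (\mathbf{u}^H[n] + \boldsymbol{\nu}^H[n]) \mathbf{w} \right] \\
&= E\left[ \mathbf{w}^H \left( \mathbf{u}[n]\mathbf{u}^H[n] + \mathbf{u}[n]\boldsymbol{\nu}^H[n] + \boldsymbol{\nu}[n]\mathbf{u}^H[n] + \boldsymbol{\nu}[n]\boldsymbol{\nu}^H[n] \right) \mathbf{w} \right] \\
&= \mathbf{w}^H \left( E\left[ \mathbf{u}[n]\mathbf{u}^H[n] \right] + E\left[ \boldsymbol{\nu}[n]\boldsymbol{\nu}^H[n] \right] \right) \mathbf{w} \\
&= \mathbf{w}^H \mathbf{R} \mathbf{w} + \mathbf{w}^H \sigma^2 \mathbf{w} \\
&= P_S + P_N,
\end{aligned}
\]
where $P_S$ and $P_N$ are the power of the signal and the power of the noise, respectively. Thus, we
can express the signal to noise ratio (SNR) at the output as
\[
\mathrm{SNR} = \frac{P_S}{P_N} = \frac{\mathbf{w}^H \mathbf{R} \mathbf{w}}{\sigma^2 \|\mathbf{w}\|_2^2}.
\]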
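Let us now find the $\mathbf{w}_{\mathrm{opt}}$ that maximizes the SNR. To this end, let us differentiate the SNR
expression with respect to $\mathbf{w}^*$ and set it to zero
\[
\frac{\partial}{\partial \mathbf{w}^*} \left( \frac{\mathbf{w}^H \mathbf{R} \mathbf{w}}{\mathbf{w}^H \mathbf{w}} \right) = \frac{\mathbf{R}\mathbf{w} \cdot \mathbf{w}^H \mathbf{w} - \mathbf{w}^H \mathbf{R} \mathbf{w} \cdot \mathbf{w}}{(\mathbf{w}^H \mathbf{w})^2} = \mathbf{0}.
\]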
2.6 Eigenfilter (Generalized Matched Filter) 33
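\[
\begin{aligned}
\mathbf{R}\mathbf{w} &= \frac{\mathbf{w}^H \mathbf{R} \mathbf{w}}{\mathbf{w}^H \mathbf{w}} \cdot \mathbf{w} \\
&= \lambda \cdot \mathbf{q},
\end{aligned}
\]
where $\mathbf{w}$ (= $\mathbf{q}$) is an eigenvector and $\frac{\mathbf{w}^H \mathbf{R} \mathbf{w}}{\mathbf{w}^H \mathbf{w}}$ (= λ) is the corresponding eigenvalue of the matrix
R. Thus, the SNR is maximized for
\[
\mathbf{w}_{\mathrm{opt}} = \mathbf{q}_{\max},
\]
where $\mathbf{q}_{\max}$ is the eigenvector corresponding to the largest eigenvalue $\lambda_{\max}$ of R, i.e.
\[
\mathrm{SNR}_{\max} = \frac{\lambda_{\max}(\mathbf{R})}{\sigma^2}.
\]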
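A minimal sketch of the eigenfilter follows. It finds the principal eigenvector by power iteration, which is a technique swapped in here for illustration and is not prescribed by the text; the matrix R, the noise variance and the function names are likewise assumptions.

```python
import math

def matvec(R, x):
    """Matrix-vector product R x."""
    return [sum(R[i][j] * x[j] for j in range(len(x))) for i in range(len(R))]

def inner(x, y):
    """Inner product x^H y."""
    return sum(x_k.conjugate() * y_k for x_k, y_k in zip(x, y))

def output_snr(R, w, noise_var):
    """SNR = w^H R w / (sigma^2 * ||w||^2)."""
    return inner(w, matvec(R, w)).real / (noise_var * inner(w, w).real)

def principal_eigenvector(R, iterations=100):
    """Power iteration: repeatedly apply R and renormalize."""
    q = [1.0 + 0j] + [0j] * (len(R) - 1)
    for _ in range(iterations):
        q = matvec(R, q)
        norm = math.sqrt(inner(q, q).real)
        q = [q_k / norm for q_k in q]
    return q

R = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1
noise_var = 0.5

w_opt = principal_eigenvector(R)
snr_opt = output_snr(R, w_opt, noise_var)          # lambda_max / sigma^2 = 6
snr_other = output_snr(R, [1.0, 0.0], noise_var)   # a non-optimal filter: 4
```

Any other weight vector, such as the single-tap filter used for comparison, gives a smaller output SNR than the principal eigenvector.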
3. Adaptive Filters
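[Figure: transmission model. The symbols s[n] pass through the channel h[n] (in vector-matrix form: h, H) to give x[n]; noise ν[n] is added to give u[n], which is filtered by w*[n] to produce y[n] = d̂[n].]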
3.1 Linear Optimum Filtering (Wiener Filters) 35
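where
\[
\delta[n] = \begin{cases} 1 & n = 0 \\ 0 & \text{else.} \end{cases}
\]
In addition we have
\[
\mathbf{h} = \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_K \end{bmatrix} \in \mathbb{C}^{K+1}, \qquad
\mathbf{s}[n] = \begin{bmatrix} s[n] \\ s[n-1] \\ \vdots \\ s[n-N+1] \end{bmatrix} \in \mathbb{C}^{N},
\]
and
\[
x[n] = s[n] \star h[n] = \sum_{k=0}^{K} h_k \cdot s[n-k],
\]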
and is called convolutional matrix, which has Toeplitz structure. With H we can write
x[n] = H · s[n],
and
u[n] = x[n] + ν[n] = Hs[n] + ν[n],
with u[n], ν[n] ∈ CM . The output y[n] is obtained by convolving u[n] with w∗ [n]
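and arrive at
\[
\frac{\partial J}{\partial \mathbf{w}^*} = -\mathbf{p} + \mathbf{R}\mathbf{w} = \mathbf{0} \;\rightarrow\; \mathbf{w}_{\mathrm{opt}} = \mathbf{R}^{-1}\mathbf{p}.
\]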
The minimum error computes to
where it has been assumed that R is positive definite, which means that $\mathbf{R}^{-1}$ exists. It is Hermitian
anyway. The canonical form of the error surface J(w) can be described as
where λk > 0.
Now assume h[n] and therefore h and H are known to the receiver (e.g. through the process of
channel estimation). Let us rewrite
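\[
\begin{aligned}
E\left[ \mathbf{u}[n]\mathbf{u}^H[n] \right] &= \mathbf{R}_{uu} \in \mathbb{C}^{M\times M} \\
E\left[ \mathbf{s}[n]\mathbf{s}^H[n] \right] &= \mathbf{R}_{ss} \in \mathbb{C}^{N\times N} \\
E\left[ \boldsymbol{\nu}[n]\boldsymbol{\nu}^H[n] \right] &= \mathbf{R}_{\nu\nu} \in \mathbb{C}^{M\times M}
\end{aligned}
\]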
All three covariance matrices are Hermitian and positive definite. Therefore, all three have a
standard inverse.
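Since $\mathbf{u}[n] = \mathbf{H}\mathbf{s}[n] + \boldsymbol{\nu}[n]$, then
\[
\begin{aligned}
\mathbf{R}_{uu} &= E\left[ \mathbf{u}[n]\mathbf{u}^H[n] \right] = E\left[ (\mathbf{H}\mathbf{s}[n] + \boldsymbol{\nu}[n])(\mathbf{H}\mathbf{s}[n] + \boldsymbol{\nu}[n])^H \right] \\
&= \mathbf{H}\mathbf{R}_{ss}\mathbf{H}^H + \mathbf{H} \cdot \underbrace{E\left[ \mathbf{s}[n]\boldsymbol{\nu}^H[n] \right]}_{\mathbf{0}} + \underbrace{E\left[ \boldsymbol{\nu}[n]\mathbf{s}^H[n] \right]}_{\mathbf{0}} \cdot \mathbf{H}^H + \mathbf{R}_{\nu\nu} \\
&= \mathbf{H}\mathbf{R}_{ss}\mathbf{H}^H + \mathbf{R}_{\nu\nu},
\end{aligned}
\]
and
\[
\begin{aligned}
\mathbf{p}_{ud} &= E\left[ \mathbf{u}[n] d^*[n] \right] \\
&= E\left[ (\mathbf{H}\mathbf{s}[n] + \boldsymbol{\nu}[n]) d^*[n] \right] \\
&= \mathbf{H} E\left[ \mathbf{s}[n] s^*[n-l] \right] + \underbrace{E\left[ \boldsymbol{\nu}[n] s^*[n-l] \right]}_{\mathbf{0}} \\
&= \mathbf{H} E\left[ \mathbf{s}[n]\mathbf{s}^H[n] \right] \cdot \mathbf{e}_{l+1} \\
&= \mathbf{H}\mathbf{R}_{ss}\mathbf{e}_{l+1},
\end{aligned}
\]
where $\mathbf{e}_{l+1}$ is a vector with all entries equal to zero except the (l + 1)-th entry
\[
\mathbf{e}_{l+1} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}
\begin{matrix} \leftarrow 1 \\ \leftarrow 2 \\ \\ \leftarrow l+1 \\ \\ \leftarrow N \end{matrix}
\]
Therefore,
\[
\begin{aligned}
\mathbf{w}_{\mathrm{opt}} &= \mathbf{R}^{-1}\mathbf{p} \\
&= \mathbf{R}_{uu}^{-1}\mathbf{p}_{ud} \\
&= \left( \mathbf{H}\mathbf{R}_{ss}\mathbf{H}^H + \mathbf{R}_{\nu\nu} \right)^{-1} \mathbf{H}\mathbf{R}_{ss}\mathbf{e}_{l+1}.
\end{aligned}
\]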
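The expression for $\mathbf{w}_{\mathrm{opt}}$ can be evaluated for a small example. The sketch below assumes white symbols and white noise ($\mathbf{R}_{ss} = \sigma_d^2 \mathbf{1}_N$, $\mathbf{R}_{\nu\nu} = \sigma_\nu^2 \mathbf{1}_M$), a two-tap channel, M = 2 filter taps and delay l = 1; all numerical values and variable names are hypothetical.

```python
# Two-tap channel h = [h0, h1]; convolution matrix H is M x N = 2 x 3.
h = [1.0, 0.5]
H = [[h[0], h[1], 0.0],
     [0.0, h[0], h[1]]]
M, N = 2, 3
sigma_d2, sigma_nu2 = 1.0, 0.1   # symbol power and noise power (assumed)
l = 1                            # desired signal d[n] = s[n - l]

# R_uu = sigma_d^2 * H H^H + sigma_nu^2 * 1_M
R_uu = [[sigma_d2 * sum(H[i][k] * H[j][k].conjugate() for k in range(N))
         + (sigma_nu2 if i == j else 0.0) for j in range(M)] for i in range(M)]
# p_ud = sigma_d^2 * H e_{l+1}, i.e. the (l+1)-th column of H scaled by sigma_d^2
p_ud = [sigma_d2 * H[i][l] for i in range(M)]

# Solve the 2 x 2 system R_uu w = p_ud in closed form (Cramer's rule).
det = R_uu[0][0] * R_uu[1][1] - R_uu[0][1] * R_uu[1][0]
w_opt = [(R_uu[1][1] * p_ud[0] - R_uu[0][1] * p_ud[1]) / det,
         (R_uu[0][0] * p_ud[1] - R_uu[1][0] * p_ud[0]) / det]

# Gradient -p + R w should vanish at the optimum.
gradient = [sum(R_uu[i][j] * w_opt[j] for j in range(M)) - p_ud[i] for i in range(M)]
```

For longer filters the closed-form 2 × 2 solve would be replaced by a general linear solver; the structure of $\mathbf{R}_{uu}$ and $\mathbf{p}_{ud}$ stays the same.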
where A and C must be square and full rank, and B and D must be of appropriate sizes, we can
rewrite the optimum filter vector:
H
−1
wopt = eTl+1 Rss HH HRss HH + Rνν
−1 H −1
T H −1 −1 −1 H −1
= el+1 Rss H Rνν − Rνν H Rss + H Rνν H H Rνν
−1
= eTl+1 Rss HH R−1 νν − H H −1
R νν H R −1
ss + H H −1
R νν H H H −1
R νν
−1
= eTl+1 Rss 1N − HH R−1 −1
νν H Rss + H Rνν H
H −1
HH R−1νν
−1 H −1
= eTl+1 Rss R−1 H −1
ss + H Rνν H − H Rνν H
H −1
R−1 H −1
ss + H Rνν H H Rνν
−1
= eTl+1 R−1 H −1
ss + H Rνν H HH R−1νν ,
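\[
\mathbf{R}_{ss} = \sigma_d^2 \mathbf{1}_N, \qquad \mathbf{R}_{\nu\nu} = \sigma_\nu^2 \mathbf{1}_M
\]
and from using $\mathbf{w}_{\mathrm{opt}} = \mathbf{R}^{-1}\mathbf{p}$ we had also that the optimum filter vector can be expressed as
\[
\mathbf{w}_{\mathrm{opt}}^H = \mathbf{e}_{l+1}^T \mathbf{H}^H \left( \frac{\sigma_\nu^2}{\sigma_d^2} \mathbf{1}_M + \mathbf{H}\mathbf{H}^H \right)^{-1}.
\]
For very large SNR ($\frac{\sigma_\nu^2}{\sigma_d^2} \to 0$) the parenthesized matrices in the last two expressions will
converge to either $(\mathbf{H}^H\mathbf{H})^{-1}$ or $(\mathbf{H}\mathbf{H}^H)^{-1}$, respectively. Since H is a flat matrix, $\mathbf{H}^H\mathbf{H}$ is not full
rank and therefore does not have a standard inverse. But $\mathbf{H}\mathbf{H}^H$ is full rank and has a standard
inverse and we get
\[
\lim_{\mathrm{SNR}\to\infty} \mathbf{w}_{\mathrm{opt}}^H = \mathbf{e}_{l+1}^T \mathbf{H}^H \left( \mathbf{H}\mathbf{H}^H \right)^{-1} = \mathbf{e}_{l+1}^T \mathbf{H}^{+}.
\]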
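We could equally well compute a generalized inverse of $\mathbf{H}^H\mathbf{H}$ making use of the SVD of
$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$:
\[
\begin{aligned}
\left( \mathbf{H}^H \mathbf{H} \right)^{+} \mathbf{H}^H &= \left( \mathbf{V}\boldsymbol{\Sigma}^T\mathbf{U}^H \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H \right)^{+} \mathbf{V}\boldsymbol{\Sigma}^T\mathbf{U}^H \\
&= \left( \mathbf{V}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\mathbf{V}^H \right)^{+} \mathbf{V}\boldsymbol{\Sigma}^T\mathbf{U}^H \\
&= \mathbf{V} \left( \boldsymbol{\Sigma}^T\boldsymbol{\Sigma} \right)^{+} \boldsymbol{\Sigma}^T \mathbf{U}^H \\
&= \mathbf{V}\boldsymbol{\Sigma}^{+}\mathbf{U}^H \\
&= \mathbf{H}^{+}.
\end{aligned}
\]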
u[n] = Hs[n]
u[n + 1] = Hs[n + 1]
···
u[n + N − 1] = Hs[n + N − 1].
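Using the specific transmit vectors above, we get the following receive matrix:
\[
\mathbf{u}[n] = \begin{bmatrix} h_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
\mathbf{u}[n+1] = \begin{bmatrix} h_1 \\ h_0 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad
\mathbf{u}[n+N-1] = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ h_K \end{bmatrix},
\]
where the rows of each vector are indexed 1, 2, . . . , M, and
\[
\mathbf{U} = \begin{bmatrix} \mathbf{u}^T[n] \\ \mathbf{u}^T[n+1] \\ \mathbf{u}^T[n+2] \\ \vdots \\ \mathbf{u}^T[n+N-1] \end{bmatrix}
= \begin{bmatrix}
h_0 & 0 & 0 & \cdots & 0 \\
h_1 & h_0 & 0 & \cdots & 0 \\
h_2 & h_1 & h_0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
h_K & h_{K-1} & h_{K-2} & \cdots & h_0 \\
0 & h_K & h_{K-1} & \cdots & h_1 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & h_{K-1} \\
0 & 0 & 0 & \cdots & h_K
\end{bmatrix} \in \mathbb{C}^{N\times M}
\]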
That is, we try to enforce zero intersymbol interference (ISI), but do not have enough degrees of
freedom to do so. Therefore we resort to a least squares solution.
We could also drop some of the equations enforcing the (N − 1) zeros to have only M equations
enforcing (M − 1) zeros and a one at delay l. This is equivalent to neglecting the ISI contribution
from (N − M) leading and/or lagging transmit symbols and can be expressed by a reduced receive
matrix
\[
\mathbf{U}_r^T = \mathbf{H}_r \in \mathbb{C}^{M\times M}
\]
by dropping the $N_{\mathrm{post}}$ first and $N_{\mathrm{pre}}$ last rows, such that
\[
N_r = N - (N_{\mathrm{pre}} + N_{\mathrm{post}}) = M
\]
holds. Then $\mathbf{H}_r$ is a square full rank matrix and
\[
\mathbf{w}^H \mathbf{H}_r = \mathbf{e}_{l+1}^T
\]
can be exactly fulfilled with the unique solution
\[
\mathbf{w}_{\mathrm{ZF}}^H = \mathbf{e}_{l+1}^T \mathbf{H}_r^{-1},
\]
which zeroes out the ISI from adjacent symbols within the $N_r$-window and, therefore, is called
zero forcing solution.
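On the other hand, if $\frac{\sigma_\nu^2}{\sigma_d^2} \gg \max\{\lambda_i\}_{i=1}^{N}$ (low SNR region) is fulfilled, we get
\[
\mathbf{w}_{\mathrm{opt}}^H = \mathbf{e}_{l+1}^T \cdot \mathbf{H}^H \cdot \frac{\sigma_d^2}{\sigma_\nu^2}.
\]
We will show in the following that this $\mathbf{w}_{\mathrm{opt}}$ is the so-called matched filter. Replace first $\mathbf{H}^H \cdot \frac{\sigma_d^2}{\sigma_\nu^2}$
with G with
\[
\mathbf{G} = \begin{bmatrix} \mathbf{g}_1^H \\ \mathbf{g}_2^H \\ \vdots \\ \mathbf{g}_N^H \end{bmatrix} \in \mathbb{C}^{N\times M}, \qquad
\mathbf{e}_{l+1}^T \mathbf{G} = \mathbf{g}_{l+1}^H
\]
and
\[
y[n] = \left( \mathbf{g}_{l+1}^H \mathbf{H} \mathbf{s}[n] + \mathbf{g}_{l+1}^H \boldsymbol{\nu}[n] \right).
\]
The last term is the noise contribution with a variance
\[
E\left[ |\mathbf{g}_{l+1}^H \boldsymbol{\nu}[n]|^2 \right] = \mathbf{g}_{l+1}^H \mathbf{R}_{\nu\nu} \mathbf{g}_{l+1} = \sigma_\nu^2 \|\mathbf{g}_{l+1}\|_2^2.
\]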
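The first term contains the desired signal contribution, stemming from s[n − l] = d[n], and
intersymbol interference from previous and subsequent symbols. Let us focus on the desired part,
which is
\[
\mathbf{g}_{l+1}^H \mathbf{H} \mathbf{e}_{l+1} s[n-l] = \mathbf{g}_{l+1}^H \mathbf{H} \mathbf{e}_{l+1} d[n],
\]
the variance of which is
\[
E\left[ \mathbf{g}_{l+1}^H \mathbf{H} \mathbf{e}_{l+1} d[n] d^*[n] \mathbf{e}_{l+1}^T \mathbf{H}^H \mathbf{g}_{l+1} \right] = \mathbf{g}_{l+1}^H \mathbf{H} \mathbf{e}_{l+1} \mathbf{e}_{l+1}^T \mathbf{H}^H \mathbf{g}_{l+1} \cdot \underbrace{E\left[ d[n] d^*[n] \right]}_{\sigma_d^2}.
\]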
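where
\[
\boldsymbol{\Pi} = \begin{bmatrix}
0 & \cdots & 0 & 1 \\
0 & \cdots & 1 & 0 \\
\vdots & \iddots & \vdots & \vdots \\
1 & \cdots & 0 & 0
\end{bmatrix}
\]
and $\boldsymbol{\Pi} \cdot \boldsymbol{\Pi} = \mathbf{1}$. With this, the variance of the desired signal becomes
\[
(\mathbf{g}_{l+1}^H \boldsymbol{\Pi} \mathbf{h})(\mathbf{h}^H \boldsymbol{\Pi} \mathbf{g}_{l+1}) \sigma_d^2 = |\mathbf{g}_{l+1}^H \boldsymbol{\Pi} \mathbf{h}|^2 \sigma_d^2.
\]
The ratio of the desired signal variance and the noise variance at the filter output is
\[
\mathrm{SNR} = \frac{|\mathbf{g}_{l+1}^H \boldsymbol{\Pi} \mathbf{h}|^2}{\|\mathbf{g}_{l+1}\|_2^2} \cdot \frac{\sigma_d^2}{\sigma_\nu^2},
\]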
Thus, we see that the MMSE solution converges in the high SNR regime to the ZF solution and
in the low SNR regime to the MF solution.
3.2 Spatial Filtering 43
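\[
\begin{aligned}
y[n] &= \mathbf{w}^H \cdot \mathbf{u}[n] \\
&= \mathbf{w}^H \cdot \begin{bmatrix} u[n] \\ u[n-1] \\ u[n-2] \\ \vdots \\ u[n-M+1] \end{bmatrix} \\
&= \mathbf{w}^H \cdot \begin{bmatrix} 1 \\ e^{-j\omega_0 T} \\ e^{-j\omega_0 2T} \\ \vdots \\ e^{-j\omega_0 (M-1) T} \end{bmatrix} \cdot e^{j\omega_0 T n} \\
&= e^{j\omega_0 T n} \cdot \mathbf{w}^H \cdot \mathbf{a}(\phi_0),
\end{aligned}
\]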
with $\phi_0 = \omega_0 T$. Note that $\mathbf{a}(\phi_0)$ is a Vandermonde vector, identical to the array steering vector introduced in Section 1.4 when dealing with spatial filtering, i.e. with beamforming. Our aim is now to design the filter weight vector $\mathbf{w}$ such that this complex harmonic with frequency $\omega_0$ passes unattenuated, while any other frequency component of the input signal is attenuated as much as possible. Assume now that $u[n]$ contains, besides our desired component at $\omega_0$, many other spectral components. We therefore minimize the variance of the filter output while making sure that the desired component is not suppressed:
$$\mathbf{w}_{\mathrm{opt}} = \operatorname*{argmin}_{\mathbf{w}} E\left[\mathbf{w}^H \mathbf{u}[n]\mathbf{u}^H[n]\mathbf{w}\right] = \operatorname*{argmin}_{\mathbf{w}} \mathbf{w}^H \mathbf{R}\mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^H \mathbf{a}(\phi_0) = g^*,$$
$g^*$ being the complex gain of our filter at $\omega_0$. The problem therefore belongs to the class we have dealt with in Section 2.3 (Quadratic Optimization with Linear Constraints).
Since we have only one constraint, the corresponding Lagrangian function reads
Finally we have
$$\mathbf{w} = \frac{g\,\mathbf{R}^{-1}\mathbf{a}(\phi_0)}{\mathbf{a}^H(\phi_0)\mathbf{R}^{-1}\mathbf{a}(\phi_0)}.$$
If we interpret $\phi_0 = 2\pi \frac{\Delta}{\lambda} \sin\theta_0$, then this is the so-called linearly constrained minimum variance (LCMV) beamformer. If we set $g = 1$, then we have the minimum variance distortionless response (MVDR) beamformer.
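The minimum variance attained by the MVDR beamformer is
$$J_{\min} = \mathbf{w}_{\mathrm{opt}}^H \mathbf{R}\mathbf{w}_{\mathrm{opt}} = \frac{1}{\mathbf{a}^H(\phi_0)\mathbf{R}^{-1}\mathbf{a}(\phi_0)}.$$
Evaluated as a function of $\phi$, the same expression gives
$$S_{\mathrm{MVDR}}(\phi) = \frac{1}{\mathbf{a}^H(\phi)\mathbf{R}^{-1}\mathbf{a}(\phi)}, \quad \phi \in [-\pi, \pi].$$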
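As a small numerical illustration of the closed-form weights and the minimum variance, the following sketch evaluates both for a two-tap filter. The matrix `R`, the value `phi0 = 0.3` and all helper names are assumptions made for this example only; they are not taken from the text.

```python
import cmath

def steering(phi, M):
    """Vandermonde vector a(phi) = [1, e^{-j phi}, ..., e^{-j (M-1) phi}]^T."""
    return [cmath.exp(-1j * phi * m) for m in range(M)]

def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(aik * xk for aik, xk in zip(row, x)) for row in A]

def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def mvdr_weights(R, a, g=1.0):
    """w = g R^{-1} a / (a^H R^{-1} a); also returns the denominator."""
    Ri_a = matvec(inv2(R), a)
    denom = inner(a, Ri_a)
    return [g * v / denom for v in Ri_a], denom

# Assumed example data: a Hermitian, positive definite 2x2 correlation matrix.
R = [[2.0 + 0j, 0.5 - 0.5j], [0.5 + 0.5j, 1.0 + 0j]]
a = steering(0.3, 2)
w, denom = mvdr_weights(R, a)

gain = inner(w, a)                # w^H a(phi_0): equals g^* = 1 (distortionless)
J_min = inner(w, matvec(R, w))    # w^H R w: equals 1 / (a^H R^{-1} a)
```

The explicit 2x2 inverse keeps the sketch dependency-free; for larger `M` a linear solver would replace `inv2`.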
and $\mathbf{g}$ is the vector of antenna gains in the directions of arrival (or departure) $\theta_k$ for $k = 0, \ldots, K-1$. Using zeros and ones as entries of this vector means that signals impinging from the corresponding directions are either suppressed or preserved. Let us augment the columns of $\mathbf{C}$ with some orthogonal columns to obtain a square matrix
$$\mathbf{U} = [\,\overbrace{\mathbf{C}}^{K}\ \ \overbrace{\mathbf{C}_a}^{M-K}\,] \in \mathbb{C}^{M \times M}, \qquad \mathbf{C}^H \mathbf{C}_a = \mathbf{0}.$$
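These additional column vectors, which are collected in $\mathbf{C}_a$, are a basis for the orthogonal complement of the space spanned by the columns of $\mathbf{C}$, i.e. $\operatorname{image}(\mathbf{C})$, with $\dim(\operatorname{image}(\mathbf{C})) = K$. Since $\mathbf{U}$ is full rank, its column vectors are a basis for the $M$-dimensional vector space and we can write
$$\mathbf{C}^H \cdot \mathbf{w} = \mathbf{C}^H \mathbf{C}\mathbf{v} - \underbrace{\mathbf{C}^H \mathbf{C}_a}_{\mathbf{0}} \mathbf{w}_a = \mathbf{C}^H \mathbf{C}\mathbf{v} = \mathbf{g}.$$
Therefore, we have
$$\mathbf{v} = (\mathbf{C}^H \mathbf{C})^{-1}\mathbf{g}$$
and
$$\mathbf{w}_q = \mathbf{C}\mathbf{v} = \mathbf{C}(\mathbf{C}^H \mathbf{C})^{-1}\mathbf{g}$$
is the so-called quiescent weight vector, and finally we have
$$\mathbf{w} = \mathbf{w}_q - \mathbf{C}_a \mathbf{w}_a,$$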
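[Block diagram: the input $\mathbf{u}[n]$ (dimension $M$) feeds an upper branch $\mathbf{w}_q^H$ with output $d[n]$ and a lower branch $\mathbf{C}_a^H$ (size $(M-K) \times M$) with output $\mathbf{x}[n]$ (dimension $M-K$), which is filtered by $\mathbf{w}_a^H$; the combined output is $y[n]$.]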
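$$\mathbf{w}_{a,\mathrm{opt}} = \mathbf{R}_{xx}^{-1}\mathbf{p}_x = (\mathbf{C}_a^H \mathbf{R}\mathbf{C}_a)^{-1}\mathbf{C}_a^H \mathbf{R}\mathbf{w}_q$$
and
$$\mathbf{w}_{\mathrm{opt}} = \left(\mathbf{1} - \mathbf{C}_a(\mathbf{C}_a^H \mathbf{R}\mathbf{C}_a)^{-1}\mathbf{C}_a^H \mathbf{R}\right)\mathbf{C}(\mathbf{C}^H \mathbf{C})^{-1}\mathbf{g}.$$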
Of course, this quadratic optimization problem with multiple linear constraints could be solved
with the Lagrangian approach, which we have used in the single constraint case (LCMV). The
solution reads
$$\mathbf{w}_{\mathrm{opt}} = \mathbf{R}^{-1}\mathbf{C}(\mathbf{C}^H \mathbf{R}^{-1}\mathbf{C})^{-1}\mathbf{g},$$
which is identical to the aforementioned one.
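Remark: Although not obvious, the identity can be shown by solving an equivalent problem formulated in a transformed variable $\mathbf{z}$ with
$$\mathbf{w} = \mathbf{A}\mathbf{z}$$
and
$$\mathbf{A} = \mathbf{V}\mathbf{\Lambda}^{-\frac{1}{2}},$$
where $\mathbf{V}$ and $\mathbf{\Lambda}$ can be obtained from an EVD of $\mathbf{R} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^H$. The approach here with $\mathbf{w}_q$ and $\mathbf{w}_a$, although more involved, gives more insight into the structure of the solution.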
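The claimed identity between the GSC form and the Lagrangian form can also be checked numerically. The sketch below does this for the smallest non-trivial case, $M = 2$ and $K = 1$; the matrix `R`, the constraint column `c` and the gain `g` are assumed toy values, not data from the text.

```python
def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(aik * xk for aik, xk in zip(row, x)) for row in A]

def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed toy data (M = 2, K = 1).
R = [[2.0 + 0j, 0.5 - 0.5j], [0.5 + 0.5j, 1.0 + 0j]]   # Hermitian, positive definite
c = [1.0 + 0j, 0.6 - 0.8j]                               # the single column of C
g = 1.0 + 0j                                             # desired gain
c_a = [c[1].conjugate(), -c[0].conjugate()]              # C_a, chosen so that C^H C_a = 0

# GSC: quiescent vector plus adaptive part in the orthogonal complement.
w_q = [ci * g / inner(c, c) for ci in c]                        # C (C^H C)^{-1} g
w_a = inner(c_a, matvec(R, w_q)) / inner(c_a, matvec(R, c_a))   # (C_a^H R C_a)^{-1} C_a^H R w_q
w_gsc = [wq - ca * w_a for wq, ca in zip(w_q, c_a)]             # w_q - C_a w_a

# Lagrangian solution R^{-1} C (C^H R^{-1} C)^{-1} g.
Ri_c = matvec(inv2(R), c)
w_lag = [v * g / inner(c, Ri_c) for v in Ri_c]

max_diff = max(abs(x - y) for x, y in zip(w_gsc, w_lag))
```

Both weight vectors agree to machine precision, and `inner(c, w_gsc)` reproduces the constraint $\mathbf{C}^H \mathbf{w} = \mathbf{g}$.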
$$\frac{\partial J(\mathbf{w}, \mathbf{w}^*)}{\partial \mathbf{w}^*} = -\mathbf{p} + \mathbf{R}\mathbf{w}$$
points in the direction of steepest ascent of the cost function. The minimum of $J$ has already been computed previously as
$$J_{\min} = J(\mathbf{w}_{\mathrm{opt}}, \mathbf{w}_{\mathrm{opt}}^*) = \sigma_d^2 - \mathbf{p}^H \mathbf{R}^{-1}\mathbf{p}. \tag{3.1}$$
Starting from an arbitrary initial value $\mathbf{w}[0]$ for the filter vector, we try to approach this minimum by incrementing $\mathbf{w}[0]$ with a step in the direction of steepest descent, i.e. in the direction of the negative gradient.
With that, we arrive at
This last equation can be viewed as a linear discrete-time state-space system, where w[n] is the
state vector with constant excitation µp. Let us first transform this system to a homogeneous one
by shifting the fixed point to the origin
Next we will diagonalize the homogeneous discrete-time state-space system by first computing the EVD of $\mathbf{R} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^H$ and
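Fig. 3.3. Parabolic error surface (axes $w_1$, $w_2$ and $J(\mathbf{w}, \mathbf{w}^*)$; marked are $J_{\min}$, $w_{1,\mathrm{opt}}$, $w_{2,\mathrm{opt}}$ and the negative gradient $-\frac{\partial J}{\partial \mathbf{w}^*}\big|_{\mathbf{w}=\mathbf{w}[0]}$ at $\mathbf{w}[0]$)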
$$|1 - \mu\lambda_k| < 1 \;\Rightarrow\; 0 < \mu < \frac{2}{\lambda_k}$$
must hold and therefore $0 < \mu < \frac{2}{\lambda_{\max}}$.
This is not a very useful criterion, because it would necessitate an EVD of $\mathbf{R}$. Recalling that the motivation for the iterative solution was to reduce the complexity of solving the normal equation, it is not a good idea to perform an even more involved EVD instead. But we can make use of the following relation:
$$\lambda_{\max} \le \sum_{k=1}^{M} \lambda_k = \operatorname{tr}(\mathbf{R})$$
and come up with a more stringent but simple to compute upper limit for the stepsize guaranteeing convergence
$$0 < \mu < \frac{2}{\operatorname{tr}(\mathbf{R})} = \frac{2}{M\, r(0)}.$$
3.3 Iterative Solution of the Normal Equation 49
$$\operatorname{tr}(\mathbf{R}) = 4, \qquad \lambda_1 = 3,\ \lambda_2 = 1, \qquad \mathbf{w}^T[0] = [2, 8], \qquad \mathbf{w}_{\mathrm{opt}}^T = [2, 1]$$
We have chosen to stop the iterations if the norm of the gradient falls below $10^{-5}$.
Fig. 3.4 shows the trajectory of $\mathbf{w}[n]$ for a small stepsize $\mu = 0.1$. The upper bound for $\mu$ is $\frac{2}{\lambda_{\max}} = 0.667$ and the more stringent but easier to compute upper bound is $\frac{2}{\operatorname{tr}(\mathbf{R})} = 0.5$. We see that it takes 70 steps to converge from the initial value to the optimum one within the given residual error. Let us next choose a more aggressive stepsize, $\mu = 0.6$, still below the true upper bound. Now we need only 38 steps to converge (Fig. 3.5). In Fig. 3.6, we have the trajectory for $\mu = 0.5$ and we need only 13 steps.
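The experiment can be re-run with a few lines of code. The text gives only the trace and the eigenvalues of $\mathbf{R}$, so the sketch below assumes $\mathbf{R} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, which has eigenvalues 3 and 1 and trace 4, and sets $\mathbf{p} = \mathbf{R}\mathbf{w}_{\mathrm{opt}}$. It also assumes that the stopping test is applied to the squared norm $\|\mathbf{p} - \mathbf{R}\mathbf{w}[n]\|_2^2$. The function name is hypothetical.

```python
def steepest_descent(R, p, w0, mu, tol=1e-5, max_iter=10_000):
    """Iterate w[n+1] = w[n] + mu * (p - R w[n]) until ||p - R w[n]||_2^2 < tol.

    Returns the final weight vector and the number of update steps taken.
    """
    w = list(w0)
    for n in range(max_iter):
        # r = p - R w is the negative gradient of the cost function
        r = [p[i] - sum(R[i][k] * w[k] for k in range(len(w))) for i in range(len(w))]
        if sum(abs(ri) ** 2 for ri in r) < tol:
            return w, n
        w = [wi + mu * ri for wi, ri in zip(w, r)]
    raise RuntimeError("no convergence; check 0 < mu < 2 / lambda_max")

# Assumed correlation matrix (eigenvalues 3 and 1, trace 4); not stated in the text.
R = [[2.0, 1.0], [1.0, 2.0]]
w_opt = [2.0, 1.0]
p = [sum(R[i][k] * w_opt[k] for k in range(2)) for i in range(2)]   # p = R w_opt
w0 = [2.0, 8.0]

steps = {mu: steepest_descent(R, p, w0, mu)[1] for mu in (0.1, 0.6, 0.5)}
```

Under these assumptions the loop needs 70, 38 and 13 steps for $\mu = 0.1$, $0.6$ and $0.5$, matching the counts quoted above. The stepsize $\mu = 0.5$ is fastest here because $|1 - \mu\lambda_k| = 0.5$ for both eigenvalues.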
[Contour plot of the error surface over $w_1$ and $w_2$, both axes from −8 to 8]
Fig. 3.4. Small Stepsize (µ = 0.1): Trajectory from w[0] toward the solution. The number of iterations required for convergence is 70.
[Contour plot of the error surface over $w_1$ and $w_2$, both axes from −8 to 8]
Fig. 3.5. Stepsize chosen close to the Upper Bound (µ = 0.6): Trajectory from w[0] toward the solution. Notice the oscillations due to the choice of the stepsize. The number of iterations required for convergence is 38.
[Contour plot of the error surface over $w_1$ and $w_2$, both axes from −8 to 8]
Fig. 3.6. Moderate stepsize (µ = 0.5): Trajectory from w[0] toward the solution. The number of iterations required for convergence is 13, which is smaller than the previous two examples.
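Now we try to optimize the step size for every iteration step. For this we calculate the error function at step $n$.
$$\begin{aligned}
J[n+1] &= J[n] - \Delta\mathbf{w}^H[n]\mathbf{p} - \mathbf{p}^H \Delta\mathbf{w}[n] + \Delta\mathbf{w}^H[n]\mathbf{R}\Delta\mathbf{w}[n] + \mathbf{w}^H[n]\mathbf{R}\Delta\mathbf{w}[n] + \Delta\mathbf{w}^H[n]\mathbf{R}\mathbf{w}[n] \\
&= J[n] - \mu^*(\mathbf{p}^H - \mathbf{w}^H[n]\mathbf{R})\mathbf{p} - \mathbf{p}^H \mu(\mathbf{p} - \mathbf{R}\mathbf{w}[n]) + \mu^*(\mathbf{p}^H - \mathbf{w}^H[n]\mathbf{R})\mathbf{R}\mu(\mathbf{p} - \mathbf{R}\mathbf{w}[n]) \\
&\quad + \mathbf{w}^H[n]\mathbf{R}\mu(\mathbf{p} - \mathbf{R}\mathbf{w}[n]) + \mu^*(\mathbf{p}^H - \mathbf{w}^H[n]\mathbf{R})\mathbf{R}\mathbf{w}[n].
\end{aligned}$$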
$\mathbf{r}[n]$ and $\|\mathbf{r}[n]\|_2^2 = \mathbf{r}^H[n]\mathbf{r}[n]$ have to be computed anyway, because they are the negative gradient and the stopping criterion. The extra burden for optimizing the stepsize therefore is the quadratic form in the denominator. Fig. 3.7 shows the trajectory for this case with 8 iteration steps.
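A sketch of the iteration with a per-step optimum stepsize follows. The closed form of the stepsize is not visible in the surviving text, so the code assumes the standard exact line-search value $\mu_{\mathrm{opt}}[n] = \frac{\mathbf{r}^H[n]\mathbf{r}[n]}{\mathbf{r}^H[n]\mathbf{R}\mathbf{r}[n]}$ with $\mathbf{r}[n] = \mathbf{p} - \mathbf{R}\mathbf{w}[n]$, which minimizes $J[n+1]$ for real $\mu$ and has the quadratic form in its denominator. The matrix $\mathbf{R}$ is the same assumed example matrix (eigenvalues 3 and 1) as before.

```python
def optimum_step_descent(R, p, w0, tol=1e-5, max_iter=1000):
    """Steepest descent with mu[n] = r^H r / (r^H R r) (assumed line-search rule)."""
    w = list(w0)
    for n in range(max_iter):
        r = [p[i] - sum(R[i][k] * w[k] for k in range(len(w))) for i in range(len(w))]
        r_norm2 = sum(abs(ri) ** 2 for ri in r)        # r^H r, also the stopping test
        if r_norm2 < tol:
            return w, n
        Rr = [sum(R[i][k] * r[k] for k in range(len(r))) for i in range(len(r))]
        quad = sum(ri.conjugate() * v for ri, v in zip(r, Rr)).real   # r^H R r
        mu = r_norm2 / quad
        w = [wi + mu * ri for wi, ri in zip(w, r)]
    raise RuntimeError("no convergence")

# Assumed example data: R with eigenvalues 3 and 1, p = R w_opt for w_opt = [2, 1].
R = [[2.0, 1.0], [1.0, 2.0]]
p = [5.0, 4.0]
w, n_iter = optimum_step_descent(R, p, [2.0, 8.0])
```

With these assumed values the loop stops after 8 update steps, in line with the count given for Fig. 3.7.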
[Contour plot of the error surface over $w_1$ and $w_2$, both axes from −8 to 8]
Fig. 3.7. Optimum stepsize (µopt [n]): Trajectory from w[0] toward the solution. The number of iterations required for convergence is 8.
Finally we see in Fig. 3.8 a situation with a larger spread of eigenvalues ($\lambda_{\max} = 3$, $\lambda_{\min} = 0.1$). The ellipses of constant error become slimmer and the number of iterations increases.
[Contour plot of the error surface over $w_1$ and $w_2$, both axes from −8 to 8]
Fig. 3.8. Moderate Stepsize (µ = 0.5): Trajectory from w[0] toward the solution with a large spread of eigenvalues (λmax = 3, λmin = 0.1). The number of iterations required for convergence is 99.
3.4 Least Mean Square Algorithm (LMS)
This shows that we simply have to multiply the current input vector $\mathbf{u}[n]$ with the current complex conjugate error $e^*[n] = (d[n] - \mathbf{w}^H[n]\mathbf{u}[n])^* = d^*[n] - y^*[n]$ and use this as the update for the next step. Fig. 3.9 shows a block diagram implementing both the filtering or equalization process computing the current $y[n]$ and the adaptation or updating of the filter vector $\mathbf{w}[n]$.
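The update described here can be sketched as follows. The code assumes the usual LMS form $\mathbf{w}[n+1] = \mathbf{w}[n] + \mu\,\mathbf{u}[n]\,e^*[n]$ with a fixed stepsize $\mu$, and tests it on a made-up, noise-free identification problem; `w_true`, the stepsize and the signal lengths are illustrative assumptions.

```python
import random

def lms(u_samples, d_samples, M, mu):
    """LMS sketch: w[n+1] = w[n] + mu * u[n] * conj(e[n]), e[n] = d[n] - w^H[n] u[n]."""
    w = [0j] * M
    for n in range(M - 1, len(u_samples)):
        u = [u_samples[n - k] for k in range(M)]                # tapped delay line
        y = sum(wk.conjugate() * uk for wk, uk in zip(w, u))    # y[n] = w^H[n] u[n]
        e = d_samples[n] - y                                    # error e[n]
        w = [wk + mu * uk * e.conjugate() for wk, uk in zip(w, u)]
    return w

# Toy problem with assumed values: d[n] = w_true^H u[n], white complex input, no noise.
random.seed(0)
w_true = [0.5 + 0.2j, -0.3j, 0.1 + 0j]
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3000)]
d = [0j] * len(x)
for n in range(2, len(x)):
    d[n] = sum(wk.conjugate() * x[n - k] for k, wk in enumerate(w_true))

w_hat = lms(x, d, M=3, mu=0.02)
err = max(abs(a - b) for a, b in zip(w_hat, w_true))
```

Because the desired signal is noise-free, `w_hat` converges to `w_true`; with observation noise the weights would only fluctuate around it.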
[Fig. 3.9: block diagram of the adaptive transversal filter with input $u[n]$, delay elements $T$, weights $w_0^*[n], w_1^*[n], w_2^*[n], \ldots, w_{M-1}^*[n]$ and output $y[n]$.]
In Section 3.2 we were dealing with the beamforming problem, i.e. spatial filtering based on knowledge of the directions of the desired as well as some undesired wavefronts. Now we will address the problem of how to get this knowledge with so-called subspace-based high resolution techniques. Such techniques are not limited by the aperture of the antenna array.
First we introduce two assumptions, which are important for the derivation of rather simple models and algorithms. We start with the far-field data model, which leads to planar wavefronts impinging on the array, see Fig. 4.1.
Fig. 4.1. Uniform Linear Array (ULA) with M elements with spacing ∆ and d impinging planar wavefronts $s_1(t), \ldots, s_d(t)$ with angles of arrival θ1 up to θd. Between adjacent sensors $x_1(t), x_2(t), \ldots$ a wavefront from direction $\theta_i$ is delayed by $\tau(\theta_i) = \frac{\Delta \cdot \sin\theta_i}{c}$.
The second important assumption is the narrowband data model. Here we assume that the time $\tau$ it takes for a wavefront to propagate along the array from the first to the last sensor is very small compared to the symbol period of the data modulated onto the wavefront.
Here $f_c$ is the carrier frequency of the modulated radio signal and $s_i(t)$ is the complex envelope.
56 4. High Resolution Direction-of-Arrival (DoA) Estimation
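Let us choose the first sensor as reference sensor. Denoting the speed of propagation by $c$:
$$\begin{aligned}
\tau_1(\theta_i) &= 0 \\
\tau_2(\theta_i) &= \frac{\Delta \cdot \sin\theta_i}{c} \\
&\;\;\vdots \\
\tau_M(\theta_i) &= (M-1)\,\frac{\Delta \cdot \sin\theta_i}{c}.
\end{aligned}$$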
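$$\mu_i = -\frac{2\pi f_c}{c}\,\Delta \sin\theta_i \;\overset{\lambda = c/f_c}{=}\; -\frac{2\pi}{\lambda}\,\Delta \sin\theta_i \;\overset{\Delta = \lambda/2}{=}\; -\pi \cdot \sin\theta_i$$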
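Here $a_0(\mu_i)$ is the antenna pattern of the identical sensors, which we will assume to be omnidirectional, i.e. $a_0(\mu_i) = 1$. This leads us to the array steering matrix of a ULA
$$\mathbf{A}_{\mathrm{ULA}} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{j\mu_1} & e^{j\mu_2} & \cdots & e^{j\mu_d} \\ e^{j2\mu_1} & e^{j2\mu_2} & \cdots & e^{j2\mu_d} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j(M-1)\mu_1} & e^{j(M-1)\mu_2} & \cdots & e^{j(M-1)\mu_d} \end{bmatrix} \in \mathbb{C}^{M \times d}.$$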
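A minimal sketch of how the spatial frequencies and the Vandermonde steering matrix can be computed is given below. The function names, the half-wavelength spacing and the two example angles are assumptions made for illustration.

```python
import cmath
import math

def spatial_frequency(theta, spacing, wavelength):
    """mu = -(2 pi / lambda) * Delta * sin(theta)."""
    return -2.0 * math.pi * spacing / wavelength * math.sin(theta)

def ula_steering_matrix(thetas, M, spacing, wavelength):
    """M x d Vandermonde matrix with entries exp(j * m * mu_i), m = 0, ..., M-1."""
    mus = [spatial_frequency(t, spacing, wavelength) for t in thetas]
    return [[cmath.exp(1j * m * mu) for mu in mus] for m in range(M)]

# Assumed example: half-wavelength spacing, wavefronts from 30 and -10 degrees.
wavelength = 1.0
A = ula_steering_matrix([math.radians(30), math.radians(-10)], M=4,
                        spacing=wavelength / 2, wavelength=wavelength)
```

For $\theta = 30^\circ$ and $\Delta = \lambda/2$ the spatial frequency is $\mu = -\pi/2$, so the second entry of the first column is $e^{-j\pi/2} = -j$.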
With this Vandermonde matrix AULA we can write our data model as
with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d > \lambda_{d+1} = \lambda_{d+2} = \cdots = \lambda_M = 0$ and $\mathbf{u}_k$ being the eigenvectors of $\mathbf{R}_{xx}$ corresponding to the $d$ nonzero eigenvalues.
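The column space of the array steering matrix can be expressed as
$$\operatorname{image}\{\mathbf{A}\} = \operatorname{image}\{\mathbf{U}_S\} = \mathcal{S},$$
because
$$\mathbf{A}\mathbf{R}_{ss}\mathbf{A}^H = \mathbf{U}_S \mathbf{\Lambda}_d \mathbf{U}_S^H.$$
The nullspace of $\mathbf{A}^H$, which is the same as the noise subspace or left nullspace of $\mathbf{A}$, reads
$$\operatorname{kernel}\{\mathbf{A}^H\} = \operatorname{kernel}\{\mathbf{U}_S^H\} = \mathcal{N} = \operatorname{image}\{\mathbf{U}_O\}$$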
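Now taking the noise into account we have
$$\begin{aligned}
\mathbf{R}_{xx} &= \begin{bmatrix} \mathbf{U}_S & \mathbf{U}_O \end{bmatrix} \cdot \left( \begin{bmatrix} \mathbf{\Lambda}_d & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} + \sigma_n^2 \cdot \mathbf{1}_M \right) \cdot \begin{bmatrix} \mathbf{U}_S^H \\ \mathbf{U}_O^H \end{bmatrix} \\
&= \sum_{k=1}^{M} p_k \mathbf{u}_k \mathbf{u}_k^H,
\end{aligned}$$
with $p_k = \lambda_k + \sigma_n^2$.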
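But because we do not know the true correlation matrix, we have to estimate the signal subspace from received snapshots of data
$$\begin{aligned}
\mathbf{X} &= [\mathbf{x}(t_1), \mathbf{x}(t_2), \cdots, \mathbf{x}(t_N)] \in \mathbb{C}^{M \times N}, \quad N \text{ snapshots} \\
&= \mathbf{A} \cdot [\mathbf{s}(t_1), \mathbf{s}(t_2), \cdots, \mathbf{s}(t_N)] + [\mathbf{n}(t_1), \mathbf{n}(t_2), \cdots, \mathbf{n}(t_N)] \\
&= \mathbf{A}\mathbf{S} + \mathbf{N},
\end{aligned}$$
where we have assumed that our scenario does not change during the time from $t_1$ to $t_N$ and $\mathbf{A}$ therefore stays constant.
$$\hat{\mathbf{R}}_{xx} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}(t_n)\mathbf{x}^H(t_n) = \frac{1}{N}\mathbf{X}\mathbf{X}^H$$
is the estimate of the correlation matrix obtained from the received samples. We can either compute an EVD of $\hat{\mathbf{R}}_{xx}$ or an SVD of $\mathbf{X}$
$$\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H = \begin{bmatrix} \mathbf{U}_s & \mathbf{U}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{V}_s^H \\ \mathbf{V}_o^H \end{bmatrix}$$
$$\hat{\mathbf{R}}_{xx} = \mathbf{U}\mathbf{\Lambda}\mathbf{U}^H = \begin{bmatrix} \mathbf{U}_s & \mathbf{U}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{\Lambda}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Lambda}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{U}_s^H \\ \mathbf{U}_o^H \end{bmatrix}$$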
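$$\begin{aligned}
S_{\mathrm{MUSIC}}(\theta) &= \frac{\|\mathbf{a}(\theta)\|_2^2}{\|\mathbf{a}^H(\theta)\mathbf{U}_o\|_2^2} \\
&= \frac{\mathbf{a}^H(\theta)\mathbf{a}(\theta)}{\mathbf{a}^H(\theta)\mathbf{U}_o \mathbf{U}_o^H \mathbf{a}(\theta)} \\
&= \frac{\mathbf{a}^H(\theta)\mathbf{a}(\theta)}{\mathbf{a}^H(\theta)\mathbf{P}_o \mathbf{a}(\theta)},
\end{aligned}$$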
Fig. 4.2. MUSIC Spectrum $S_{\mathrm{MUSIC}}(\theta)$ over the angle θ in radians, with M = 8 antennas, d = 3 impinging wavefronts and N = 100 snapshots with SNR = 0 dB.
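To make the spectral search concrete, here is a deliberately reduced sketch for a single wavefront ($d = 1$) on a ULA with $\Delta = \lambda/2$. Two simplifications are assumptions of this example, not part of the text: the signal subspace is one vector obtained by power iteration instead of a full EVD, and the noise projector is formed as $\mathbf{P}_o = \mathbf{1} - \mathbf{u}_s\mathbf{u}_s^H$. The scenario values ($M = 4$, $N = 50$, 20 degrees, noise level) are made up.

```python
import cmath
import math
import random

def steering(theta, M):
    """ULA steering vector for Delta = lambda/2, i.e. mu = -pi * sin(theta)."""
    mu = -math.pi * math.sin(theta)
    return [cmath.exp(1j * m * mu) for m in range(M)]

def dominant_eigvec(R, iters=200):
    """Power iteration for the dominant eigenvector of a Hermitian matrix."""
    v = [1.0 + 0j] * len(R)
    for _ in range(iters):
        v = [sum(R[i][k] * v[k] for k in range(len(v))) for i in range(len(R))]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        v = [x / norm for x in v]
    return v

def music_spectrum(u_s, theta, M):
    """a^H a / (a^H P_o a) with P_o = 1 - u_s u_s^H (valid for d = 1 only)."""
    a = steering(theta, M)
    proj = abs(sum(u.conjugate() * x for u, x in zip(u_s, a))) ** 2
    return M / max(M - proj, 1e-12)      # guard against division by zero

# Assumed scenario: one wavefront from 20 degrees, M = 4 sensors, N = 50 snapshots.
random.seed(1)
M, N, theta0 = 4, 50, math.radians(20.0)
a0 = steering(theta0, M)
snapshots = []
for _ in range(N):
    s = complex(random.gauss(0, 1), random.gauss(0, 1))
    snapshots.append([a * s + 0.1 * complex(random.gauss(0, 1), random.gauss(0, 1))
                      for a in a0])

# Sample correlation matrix R_hat = X X^H / N.
R_hat = [[sum(x[i] * x[k].conjugate() for x in snapshots) / N for k in range(M)]
         for i in range(M)]
u_s = dominant_eigvec(R_hat)
grid = [math.radians(-90 + 0.5 * i) for i in range(361)]        # 0.5 degree search grid
theta_hat = max(grid, key=lambda t: music_spectrum(u_s, t, M))  # location of the peak
```

For $d > 1$ the noise subspace $\mathbf{U}_o$ would have to come from a full EVD or SVD, and the $d$ largest peaks of the spectrum would be located instead of one.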
Fig. 4.3. Antenna Array with M = 6 antennas, 3 non-overlapping doublets and a translational shift invariance structure (displacement ∆ between subarray 1 and subarray 2). Different shapes for each antenna (triangles, squares and circles) represent different antenna patterns.
4.3 The Standard ESPRIT Algorithm 61
Fig. 4.4. Antenna Array with M = 5 antennas and 3 overlapping doublets and a translational shift invariance structure (displacement ∆ between subarray 1 and subarray 2). Different shapes for each antenna (squares and circles) represent different antenna patterns.
The overall number of antenna elements is $2m = M = 6$ in the example of Fig. 4.3. Obviously, we can be more efficient by working with overlapping subarrays, as depicted in Fig. 4.4. This is especially the case with a ULA, as shown in Fig. 4.5.
Fig. 4.5. Revealing the translational shift invariant subarray structure of a ULA with M = m + 1 (sensors $x_1, x_2, x_3, \ldots, x_m, x_M$, grouped into subarray 1 and subarray 2).
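The receive signal vector of such an antenna array can be split into two vectors with the aid of two selection matrices
$$\mathbf{J}_1 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} = [\, \mathbf{1}_m \ \ \mathbf{0} \,],$$
$$\mathbf{J}_2 = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} = [\, \mathbf{0} \ \ \mathbf{1}_m \,].$$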
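Starting from
$$\mathbf{x}(t) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \phi_1 & \phi_2 & \cdots & \phi_d \\ \phi_1^2 & \phi_2^2 & \cdots & \phi_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1^{m-1} & \phi_2^{m-1} & \cdots & \phi_d^{m-1} \\ \phi_1^{m} & \phi_2^{m} & \cdots & \phi_d^{m} \end{bmatrix} \cdot \mathbf{s}(t) + \mathbf{n}(t), \qquad \phi_i = e^{j\mu_i}$$
$$\mathbf{x}_1(t) = \mathbf{J}_1 \mathbf{x}(t) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \phi_1 & \phi_2 & \cdots & \phi_d \\ \phi_1^2 & \phi_2^2 & \cdots & \phi_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1^{m-1} & \phi_2^{m-1} & \cdots & \phi_d^{m-1} \end{bmatrix} \cdot \mathbf{s}(t) + \mathbf{J}_1 \mathbf{n}(t) = \mathbf{A}' \mathbf{s}(t) + \mathbf{n}'(t)$$
$$\mathbf{x}_2(t) = \mathbf{J}_2 \mathbf{x}(t) = \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_d \\ \phi_1^2 & \phi_2^2 & \cdots & \phi_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1^{m-1} & \phi_2^{m-1} & \cdots & \phi_d^{m-1} \\ \phi_1^{m} & \phi_2^{m} & \cdots & \phi_d^{m} \end{bmatrix} \cdot \mathbf{s}(t) + \mathbf{J}_2 \mathbf{n}(t) = \mathbf{A}'' \mathbf{s}(t) + \mathbf{n}''(t).$$
We can see that $\mathbf{A}'' = \mathbf{A}' \mathbf{\Phi}$ with $\mathbf{\Phi} = \operatorname{diag}\{\phi_i\}_{i=1}^{d}$. Furthermore,
$$\mathbf{J}_1 \mathbf{A} = \mathbf{A}', \quad \mathbf{J}_2 \mathbf{A} = \mathbf{A}'' \;\Rightarrow\; \mathbf{J}_1 \mathbf{A}\mathbf{\Phi} = \mathbf{J}_2 \mathbf{A}.$$
Since $\operatorname{image}\{\mathbf{A}\} = \operatorname{image}\{\mathbf{U}_S\} = \mathcal{S}$, it follows that there exists a nonsingular $\mathbf{T}_A$ such that $\mathbf{A} = \mathbf{U}_S \mathbf{T}_A$:
$$\mathbf{J}_1 \mathbf{U}_S \mathbf{T}_A \mathbf{\Phi} = \mathbf{J}_2 \mathbf{U}_S \mathbf{T}_A$$
and
$$\mathbf{J}_1 \mathbf{U}_S \mathbf{T}_A \mathbf{\Phi} \mathbf{T}_A^{-1} = \mathbf{J}_2 \mathbf{U}_S,$$
and with $\mathbf{T}_A \mathbf{\Phi} \mathbf{T}_A^{-1} = \mathbf{\Psi}$
$$\mathbf{J}_1 \mathbf{U}_S \mathbf{\Psi} = \mathbf{J}_2 \mathbf{U}_S.$$
$$\mathbf{\Psi} = \mathbf{T}_A \mathbf{\Phi} \mathbf{T}_A^{-1}$$
and with $\mu_i = \arg \phi_i$ and $\theta_i = \arcsin\left(-\frac{\lambda}{2\pi\Delta}\mu_i\right)$ we have estimates for the DoAs.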
4.4 Unitary ESPRIT: Real Valued Subspace Estimation 63
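Let us now summarize the three steps of the Standard ESPRIT algorithm:
1) Signal Subspace Estimation: Compute $\mathbf{U}_s \in \mathbb{C}^{M \times d}$ either via the
• Square root approach or direct data approach: From the SVD of
$$\mathbf{X} = \begin{bmatrix} \mathbf{U}_s & \mathbf{U}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{V}_s^H \\ \mathbf{V}_o^H \end{bmatrix} \in \mathbb{C}^{M \times N}, \quad N > M,$$
$$\underbrace{\mathbf{J}_1 \mathbf{U}_s \mathbf{\Psi}}_{\mathbb{C}^{m \times d}} \approx \underbrace{\mathbf{J}_2 \mathbf{U}_s}_{\mathbb{C}^{m \times d}},$$
by means of Least Squares (LS), Total Least Squares (TLS) or Structured Least Squares (SLS).
3) Spatial Frequency (DoA) Estimation: Compute the eigenvalues of $\mathbf{\Psi}$
$$\mathbf{\Psi} = \mathbf{T}_A \mathbf{\Phi} \mathbf{T}_A^{-1}, \quad \mathbf{\Phi} = \operatorname{diag}\{\phi_i\}_{i=1}^{d}$$
$$\mu_i = \arg \phi_i, \quad \theta_i = \arcsin\left(-\frac{\lambda}{2\pi\Delta}\mu_i\right).$$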
Therefore, we have a closed form solution for the problem at hand without any numerical search
like in spectral MUSIC.
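The three steps can be traced in a compact sketch for the simplest case of one wavefront ($d = 1$) and a ULA with maximum overlap ($m = M - 1$). This is an illustration under stated assumptions, not the general algorithm: the signal subspace is a single vector found by power iteration on $\hat{\mathbf{R}}_{xx}$, so $\mathbf{\Psi}$ is a scalar and its LS solution is also its only eigenvalue. The scenario values are made up.

```python
import cmath
import math
import random

def dominant_eigvec(R, iters=200):
    """Power iteration: dominant eigenvector of a Hermitian matrix (d = 1 case)."""
    v = [1.0 + 0j] * len(R)
    for _ in range(iters):
        v = [sum(R[i][k] * v[k] for k in range(len(v))) for i in range(len(R))]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        v = [x / norm for x in v]
    return v

def esprit_single_source(snapshots, spacing, wavelength):
    """Standard ESPRIT for one wavefront on a ULA; returns the DoA in radians."""
    M, N = len(snapshots[0]), len(snapshots)
    R_hat = [[sum(x[i] * x[k].conjugate() for x in snapshots) / N for k in range(M)]
             for i in range(M)]
    u_s = dominant_eigvec(R_hat)                     # step 1: signal subspace
    u1, u2 = u_s[:-1], u_s[1:]                       # J1 u_s and J2 u_s
    psi = (sum(a.conjugate() * b for a, b in zip(u1, u2))
           / sum(abs(a) ** 2 for a in u1))           # step 2: LS solution of J1 u_s psi = J2 u_s
    mu = cmath.phase(psi)                            # step 3: psi is its own eigenvalue
    return math.asin(-wavelength * mu / (2 * math.pi * spacing))

# Assumed scenario: M = 6 sensors, N = 50 snapshots, one wavefront from -25 degrees.
random.seed(2)
M, N, wavelength = 6, 50, 1.0
spacing = wavelength / 2
mu0 = -2 * math.pi * spacing / wavelength * math.sin(math.radians(-25.0))
snapshots = []
for _ in range(N):
    s = complex(random.gauss(0, 1), random.gauss(0, 1))
    snapshots.append([cmath.exp(1j * m * mu0) * s
                      + 0.1 * complex(random.gauss(0, 1), random.gauss(0, 1))
                      for m in range(M)])

theta_hat = esprit_single_source(snapshots, spacing, wavelength)
```

No search over a grid of angles is needed: the estimate follows in closed form from the phase of $\psi$.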
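$$\begin{aligned}
\mathbf{M} \in \mathbb{C}^{p \times q}: \quad \mathbf{\Pi}_p \cdot \mathbf{M}^* \cdot \mathbf{\Pi}_q &= \mathbf{M} \\
\mathbf{\Pi}_p \cdot \mathbf{M} \cdot \mathbf{\Pi}_q &= \mathbf{M}^*,
\end{aligned}$$
since when multiplying a given matrix $\mathbf{A}$ with $\mathbf{\Pi}$ from the left we are exchanging the rows of $\mathbf{A}$. Additionally, if we multiply $\mathbf{A}$ with $\mathbf{\Pi}$ from the right, we are exchanging the columns of the matrix. Furthermore, note that $\mathbf{\Pi}_p^{-1} = \mathbf{\Pi}_p$.
$$\begin{aligned}
\mathbf{Q} \in \mathbb{C}^{p \times q}: \quad \mathbf{\Pi}_p \cdot \mathbf{Q}^* &= \mathbf{Q} \\
\mathbf{\Pi}_p \cdot \mathbf{Q} &= \mathbf{Q}^*,
\end{aligned}$$
holds.
Theorem 4.4.1. Let $\mathbf{Q}_p$ and $\mathbf{Q}_q$ be left Π-real, square and nonsingular:
$$\phi: \mathbf{M} \to \mathbf{Q}_p^{-1} \mathbf{M} \mathbf{Q}_q = \mathbf{R} = \phi(\mathbf{M}).$$
$$= \mathbf{Q}_p^{-1} \mathbf{M} \mathbf{Q}_q = \phi(\mathbf{M}) = \mathbf{R} \in \mathbb{R}^{p \times q}$$
$$= \mathbf{Q}_p \mathbf{R} \mathbf{Q}_q^{-1} = \phi^{-1}(\mathbf{R}) = \mathbf{M}. \quad \text{q.e.d.}$$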
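Assume now that $\mathbf{Q}_p$ and $\mathbf{Q}_q$ are not only left Π-real, square and nonsingular, but also unitary; then a real-valued SVD results in
$$\phi(\mathbf{M}) = \mathbf{R} = \mathbf{Q}_p^H \mathbf{M} \mathbf{Q}_q \overset{\mathrm{SVD}}{=} \mathbf{E}\mathbf{\Sigma}_\phi \mathbf{F}^H,$$
$$\mathbf{U} = \mathbf{Q}_p \mathbf{E}, \qquad \mathbf{\Sigma} = \mathbf{\Sigma}_\phi, \qquad \mathbf{V} = \mathbf{Q}_q \mathbf{F}.$$
• for p, q even:
$$\mathbf{Q}_{2n} = \frac{1}{\sqrt{2}} \cdot \begin{bmatrix} \mathbf{1}_n & j \cdot \mathbf{1}_n \\ \mathbf{\Pi}_n & -j \cdot \mathbf{\Pi}_n \end{bmatrix},$$
• for p, q odd:
$$\mathbf{Q}_{2n+1} = \frac{1}{\sqrt{2}} \cdot \begin{bmatrix} \mathbf{1}_n & \mathbf{0} & j \cdot \mathbf{1}_n \\ \mathbf{0}^T & \sqrt{2} & \mathbf{0}^T \\ \mathbf{\Pi}_n & \mathbf{0} & -j \cdot \mathbf{\Pi}_n \end{bmatrix},$$
which both are left Π-real, square, nonsingular and unitary!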
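Next, let us transform our array measurement matrix $\mathbf{X} \in \mathbb{C}^{M \times N}$ into a centro-Hermitian matrix $\mathbf{Z}$.
$$\mathbf{Z} = \begin{bmatrix} \mathbf{X}, & \mathbf{\Pi}_M \mathbf{X}^* \mathbf{\Pi}_N \end{bmatrix} \in \mathbb{C}^{M \times 2N}.$$
Checking now whether it is centro-Hermitian:
$$\begin{aligned}
\mathbf{\Pi}_M \mathbf{Z}^* \mathbf{\Pi}_{2N} &= \mathbf{\Pi}_M \begin{bmatrix} \mathbf{X}^*, & \mathbf{\Pi}_M \mathbf{X} \mathbf{\Pi}_N \end{bmatrix} \begin{bmatrix} \mathbf{0} & \mathbf{\Pi}_N \\ \mathbf{\Pi}_N & \mathbf{0} \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{\Pi}_M \mathbf{X}^*, & \mathbf{X} \mathbf{\Pi}_N \end{bmatrix} \cdot \begin{bmatrix} \mathbf{0} & \mathbf{\Pi}_N \\ \mathbf{\Pi}_N & \mathbf{0} \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{X}, & \mathbf{\Pi}_M \mathbf{X}^* \mathbf{\Pi}_N \end{bmatrix} = \mathbf{Z}.
\end{aligned}$$
Now, we apply our mapping $\phi$ onto the centro-Hermitian array measurement $\mathbf{Z}$.
$$\begin{aligned}
\phi(\mathbf{Z}) &= \phi\left(\begin{bmatrix} \mathbf{X}, & \mathbf{\Pi}_M \mathbf{X}^* \mathbf{\Pi}_N \end{bmatrix}\right) \\
&= \mathbf{Q}_M^H \begin{bmatrix} \mathbf{X}, & \mathbf{\Pi}_M \mathbf{X}^* \mathbf{\Pi}_N \end{bmatrix} \mathbf{Q}_{2N} \\
&= \mathcal{T}(\mathbf{X}), \qquad \mathcal{T}(\mathbf{X}) \in \mathbb{R}^{M \times 2N}.
\end{aligned}$$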
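Assume now that $M$ is an even number, i.e. $M = 2n$, and let us partition $\mathbf{X}$ into two submatrices
$$\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{bmatrix} \in \mathbb{C}^{2n \times N},$$
with $\mathbf{X}_1 \in \mathbb{C}^{n \times N}$ and $\mathbf{X}_2 \in \mathbb{C}^{n \times N}$. Now use our specific choice of $\mathbf{Q}_p$ and $\mathbf{Q}_q$ (note that the blocks of $\mathbf{Q}_{2N}$ have size $N \times N$):
$$\begin{aligned}
\mathcal{T}(\mathbf{X}) &= \mathbf{Q}_M^H \begin{bmatrix} \mathbf{X}, & \mathbf{\Pi}_M \mathbf{X}^* \mathbf{\Pi}_N \end{bmatrix} \mathbf{Q}_{2N} \\
&= \frac{1}{\sqrt{2}} \begin{bmatrix} \mathbf{1}_n & \mathbf{\Pi}_n \\ -j\mathbf{1}_n & j\mathbf{\Pi}_n \end{bmatrix} \cdot \begin{bmatrix} \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{bmatrix}, & \begin{bmatrix} \mathbf{0} & \mathbf{\Pi}_n \\ \mathbf{\Pi}_n & \mathbf{0} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{X}_1^* \\ \mathbf{X}_2^* \end{bmatrix} \cdot \mathbf{\Pi}_N \end{bmatrix} \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} \mathbf{1}_N & j\mathbf{1}_N \\ \mathbf{\Pi}_N & -j\mathbf{\Pi}_N \end{bmatrix} \\
&= \frac{1}{2} \begin{bmatrix} \mathbf{1}_n & \mathbf{\Pi}_n \\ -j\mathbf{1}_n & j\mathbf{\Pi}_n \end{bmatrix} \cdot \begin{bmatrix} \mathbf{X}_1 & \mathbf{\Pi}_n \mathbf{X}_2^* \mathbf{\Pi}_N \\ \mathbf{X}_2 & \mathbf{\Pi}_n \mathbf{X}_1^* \mathbf{\Pi}_N \end{bmatrix} \cdot \begin{bmatrix} \mathbf{1}_N & j\mathbf{1}_N \\ \mathbf{\Pi}_N & -j\mathbf{\Pi}_N \end{bmatrix}.
\end{aligned}$$
Finally, we arrive at
$$\begin{aligned}
\mathcal{T}(\mathbf{X}) &= \frac{1}{2} \begin{bmatrix} \mathbf{X}_1 + \mathbf{\Pi}_n \mathbf{X}_2, & \mathbf{\Pi}_n \mathbf{X}_2^* \mathbf{\Pi}_N + \mathbf{X}_1^* \mathbf{\Pi}_N \\ -j\mathbf{X}_1 + j\mathbf{\Pi}_n \mathbf{X}_2, & -j\mathbf{\Pi}_n \mathbf{X}_2^* \mathbf{\Pi}_N + j\mathbf{X}_1^* \mathbf{\Pi}_N \end{bmatrix} \cdot \begin{bmatrix} \mathbf{1}_N & j\mathbf{1}_N \\ \mathbf{\Pi}_N & -j\mathbf{\Pi}_N \end{bmatrix} \\
&= \frac{1}{2} \begin{bmatrix} \mathbf{X}_1 + \mathbf{\Pi}_n \mathbf{X}_2 + \mathbf{\Pi}_n \mathbf{X}_2^* + \mathbf{X}_1^*, & j\mathbf{X}_1 + j\mathbf{\Pi}_n \mathbf{X}_2 - j\mathbf{\Pi}_n \mathbf{X}_2^* - j\mathbf{X}_1^* \\ -j\mathbf{X}_1 + j\mathbf{\Pi}_n \mathbf{X}_2 - j\mathbf{\Pi}_n \mathbf{X}_2^* + j\mathbf{X}_1^*, & \mathbf{X}_1 - \mathbf{\Pi}_n \mathbf{X}_2 - \mathbf{\Pi}_n \mathbf{X}_2^* + \mathbf{X}_1^* \end{bmatrix} \\
&= \begin{bmatrix} \operatorname{Re}\{\mathbf{X}_1 + \mathbf{\Pi}_n \mathbf{X}_2^*\}, & -\operatorname{Im}\{\mathbf{X}_1 - \mathbf{\Pi}_n \mathbf{X}_2^*\} \\ \operatorname{Im}\{\mathbf{X}_1 + \mathbf{\Pi}_n \mathbf{X}_2^*\}, & \operatorname{Re}\{\mathbf{X}_1 - \mathbf{\Pi}_n \mathbf{X}_2^*\} \end{bmatrix} \in \mathbb{R}^{M \times 2N}.
\end{aligned}$$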
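The final block formula can be checked against the definition $\mathcal{T}(\mathbf{X}) = \mathbf{Q}_M^H [\mathbf{X},\ \mathbf{\Pi}_M \mathbf{X}^* \mathbf{\Pi}_N] \mathbf{Q}_{2N}$ with a short sketch. The helper names and the random $4 \times 3$ test matrix are assumptions made for this example.

```python
import math
import random

def q_even(n):
    """Q_{2n} = 1/sqrt(2) * [[1_n, j 1_n], [Pi_n, -j Pi_n]]."""
    s = 1 / math.sqrt(2)
    top = [[s * (i == k) for k in range(n)] + [1j * s * (i == k) for k in range(n)]
           for i in range(n)]
    bot = [[s * (i + k == n - 1) for k in range(n)]
           + [-1j * s * (i + k == n - 1) for k in range(n)] for i in range(n)]
    return top + bot

def herm(A):
    """Conjugate transpose A^H."""
    return [[A[i][k].conjugate() for i in range(len(A))] for k in range(len(A[0]))]

def matmul(A, B):
    """Matrix product A B for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def t_definition(X):
    """T(X) = Q_M^H [X, Pi_M X^* Pi_N] Q_{2N} for even M, by explicit multiplication."""
    M, N = len(X), len(X[0])
    Z = [X[i] + [X[M - 1 - i][N - 1 - j].conjugate() for j in range(N)] for i in range(M)]
    return matmul(matmul(herm(q_even(M // 2)), Z), q_even(N))

def t_formula(X):
    """[[Re(X1 + Pi X2*), -Im(X1 - Pi X2*)], [Im(X1 + Pi X2*), Re(X1 - Pi X2*)]]."""
    n = len(X) // 2
    X1 = X[:n]
    PX2c = [[v.conjugate() for v in row] for row in reversed(X[n:])]   # Pi_n X2^*
    plus = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(X1, PX2c)]
    minus = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(X1, PX2c)]
    top = [[v.real for v in p] + [-v.imag for v in m] for p, m in zip(plus, minus)]
    bot = [[v.imag for v in p] + [v.real for v in m] for p, m in zip(plus, minus)]
    return top + bot

random.seed(3)
X = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)] for _ in range(4)]
T_def, T_form = t_definition(X), t_formula(X)
max_imag = max(abs(v.imag) for row in T_def for v in row)
max_diff = max(abs(a - b) for ra, rb in zip(T_def, T_form) for a, b in zip(ra, rb))
```

`t_formula` only adds, subtracts and reorders real and imaginary parts, which is the "no actual multiplication" property; `t_definition` is there purely as a cross-check.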
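Therefore, after a lengthy derivation we have arrived at a simple and beautiful result: we have a real-valued matrix, which we obtain from the complex-valued measurement by really simple calculations (no actual multiplication needed!). Now we can perform the computationally simpler real-valued SVD of $\mathcal{T}(\mathbf{X})$ and relate it to the complex-valued SVD of $\mathbf{Z}$:
$$\mathcal{T}(\mathbf{X}) = \phi(\mathbf{Z}) = \mathbf{E} \cdot \mathbf{\Sigma} \cdot \mathbf{F}^H$$
$$\begin{aligned}
\mathbf{Z} = \mathbf{U} \cdot \mathbf{\Sigma} \cdot \mathbf{V}^H &= \mathbf{Q}_M \cdot \mathbf{E} \cdot \mathbf{\Sigma} \cdot \mathbf{F}^H \cdot \mathbf{Q}_{2N}^H \\
&= \mathbf{Q}_M \cdot \begin{bmatrix} \mathbf{E}_s, & \mathbf{E}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{F}_s^H \\ \mathbf{F}_o^H \end{bmatrix} \cdot \mathbf{Q}_{2N}^H \\
&= \begin{bmatrix} \mathbf{U}_s, & \mathbf{U}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{V}_s^H \\ \mathbf{V}_o^H \end{bmatrix}
\end{aligned}$$
$$\Rightarrow \mathbf{U}_s = \mathbf{Q}_M \mathbf{E}_s \quad \text{or} \quad \mathbf{E}_s = \mathbf{Q}_M^H \mathbf{U}_s.$$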
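$$\mathbf{X} = \mathbf{A}\mathbf{S} + \mathbf{N} = \mathbf{U}_s \mathbf{\Sigma}_s \mathbf{V}_s^H + \mathbf{U}_o \mathbf{\Sigma}_o \mathbf{V}_o^H$$
$$\mathbf{J}_1 \mathbf{A} \mathbf{\Phi} \approx \mathbf{J}_2 \mathbf{A},$$
$$\operatorname{image}\{\mathbf{X}\} = \operatorname{image}\{\mathbf{U}_s\} = \operatorname{image}\{\mathbf{A}\},$$
therefore,
$$\mathbf{A} = \mathbf{U}_s \cdot \mathbf{T}_A,$$
where $\mathbf{T}_A$ is square and nonsingular. Hence,
$$\mathbf{J}_1 \mathbf{U}_s \mathbf{T}_A \mathbf{\Phi} \approx \mathbf{J}_2 \mathbf{U}_s \mathbf{T}_A$$
$$\mathbf{J}_1 \mathbf{U}_s \mathbf{\Psi} \approx \mathbf{J}_2 \mathbf{U}_s,$$
where
$$\mathbf{\Psi} = \mathbf{T}_A \mathbf{\Phi} \mathbf{T}_A^{-1}.$$
$$\mathbf{J}_1 = [\, \mathbf{1}_m, \ \mathbf{0} \,], \qquad \mathbf{J}_2 = [\, \mathbf{0}, \ \mathbf{1}_m \,] \quad\Rightarrow\quad \mathbf{J}_1 = \mathbf{\Pi}_m \mathbf{J}_2 \mathbf{\Pi}_M.$$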
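$$\mathbf{Q}_m^H \mathbf{J}_2 \mathbf{Q}_M = \frac{1}{2}(\mathbf{K}_1 + j\mathbf{K}_2),$$
then it follows that
$$\mathbf{Q}_m^H \mathbf{J}_1 \mathbf{Q}_M = \frac{1}{2}(\mathbf{K}_1 - j\mathbf{K}_2),$$
with
$$\mathbf{K}_1 = \mathbf{Q}_m^H (\mathbf{J}_1 + \mathbf{J}_2) \mathbf{Q}_M = 2\operatorname{Re}\{\mathbf{Q}_m^H \mathbf{J}_2 \mathbf{Q}_M\} \in \mathbb{R}^{m \times M}$$
$$\mathbf{K}_2 = \frac{1}{j}\mathbf{Q}_m^H (\mathbf{J}_2 - \mathbf{J}_1) \mathbf{Q}_M = 2\operatorname{Im}\{\mathbf{Q}_m^H \mathbf{J}_2 \mathbf{Q}_M\} \in \mathbb{R}^{m \times M}.$$
Therefore,
$$\mathbf{D} = \mathbf{Q}_M^H \mathbf{U}_s \mathbf{T}_A = \mathbf{E}_s \mathbf{T}_A$$
$$\mathbf{K}_1 \mathbf{E}_s \mathbf{T}_A \mathbf{\Omega} \approx \mathbf{K}_2 \mathbf{E}_s \mathbf{T}_A,$$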
and therefore
$$\mathbf{K}_1 \mathbf{E}_s \underbrace{(\mathbf{T}_A \mathbf{\Omega} \mathbf{T}_A^{-1})}_{\mathbf{\Upsilon}} \approx \mathbf{K}_2 \mathbf{E}_s$$
$$\mathbf{K}_1 \mathbf{E}_s \mathbf{\Upsilon} \approx \mathbf{K}_2 \mathbf{E}_s,$$
which is the real-valued Invariance Equation!
Let us look at a simple example of what the new selection matrices, which can be computed off-line, look like:
Example 4.4.2. Uniform Linear Array with M = 6 antennas, overlapping subarrays with m = 5 antennas each.
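$$\mathbf{J}_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad \mathbf{J}_2 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$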
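$$\begin{aligned}
\mathbf{K}_1 &= \mathbf{Q}_m^H \cdot (\mathbf{J}_1 + \mathbf{J}_2) \cdot \mathbf{Q}_M \\
&= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & \sqrt{2} & 0 & 0 \\ -j & 0 & 0 & 0 & j \\ 0 & -j & 0 & j & 0 \end{bmatrix} \cdot \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix} \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 0 & j & 0 & 0 \\ 0 & 1 & 0 & 0 & j & 0 \\ 0 & 0 & 1 & 0 & 0 & j \\ 0 & 0 & 1 & 0 & 0 & -j \\ 0 & 1 & 0 & 0 & -j & 0 \\ 1 & 0 & 0 & -j & 0 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & \sqrt{2} & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix},
\end{aligned}$$
and
$$\begin{aligned}
\mathbf{K}_2 &= \frac{1}{j}\,\mathbf{Q}_m^H \cdot (\mathbf{J}_2 - \mathbf{J}_1) \cdot \mathbf{Q}_M \\
&= \begin{bmatrix} 0 & 0 & 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 0 & 0 & -\sqrt{2} \\ 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 \end{bmatrix}.
\end{aligned}$$
(Evaluating the definition of $\mathbf{K}_2$ with the $\mathbf{Q}_5$ and $\mathbf{Q}_6$ given above yields $-\sqrt{2}$ for the last entry of the third row.)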
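Since these matrices are computed once and off-line, a small script is a convenient way to generate them and to verify the example. The sketch below builds $\mathbf{Q}_p$ for even and odd $p$ as defined earlier and evaluates $\mathbf{K}_1 = 2\operatorname{Re}\{\mathbf{Q}_m^H \mathbf{J}_2 \mathbf{Q}_M\}$ and $\mathbf{K}_2 = 2\operatorname{Im}\{\mathbf{Q}_m^H \mathbf{J}_2 \mathbf{Q}_M\}$ for maximum overlap ($m = M - 1$); the function names are hypothetical.

```python
import math

def q_unitary(p):
    """Left Pi-real unitary Q_p for even p = 2n and odd p = 2n + 1."""
    n, s = p // 2, 1 / math.sqrt(2)
    Q = [[0j] * p for _ in range(p)]
    off = p - n                           # first column of the right block
    for i in range(n):
        Q[i][i] = s                       # block 1_n
        Q[i][off + i] = 1j * s            # block j 1_n
        Q[p - 1 - i][i] = s               # block Pi_n
        Q[p - 1 - i][off + i] = -1j * s   # block -j Pi_n
    if p % 2:
        Q[n][n] = 1.0                     # centre entry sqrt(2) / sqrt(2)
    return Q

def selection_K(M):
    """K1 = 2 Re{Q_m^H J2 Q_M} and K2 = 2 Im{Q_m^H J2 Q_M} for m = M - 1."""
    m = M - 1
    Qm, QM = q_unitary(m), q_unitary(M)
    J2QM = QM[1:]                         # J2 = [0, 1_m] drops the first row of Q_M
    G = [[sum(Qm[k][i].conjugate() * J2QM[k][j] for k in range(m)) for j in range(M)]
         for i in range(m)]               # G = Q_m^H J2 Q_M
    K1 = [[2 * g.real for g in row] for row in G]
    K2 = [[2 * g.imag for g in row] for row in G]
    return K1, K2

K1, K2 = selection_K(6)
```

For `M = 6` this reproduces the matrices of Example 4.4.2, including the entries $\sqrt{2}$ in `K1[2][2]` and $-\sqrt{2}$ in `K2[2][5]`.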
There is still one question open: is the column space spanned by $\mathbf{X}$ and the one spanned by $\mathbf{Z}$ the same? Because only then will our transformation of the array measurement $\mathbf{X}$ to the centro-Hermitian matrix $\mathbf{Z}$ not change the estimated directions of arrival (DoAs).
To assure that, we have to use centro-symmetric arrays, which are characterized by the following equation:
$$\mathbf{\Pi}_M \mathbf{A}^* = \mathbf{A}\mathbf{\Delta},$$
where $\mathbf{A}$ is the steering matrix and $\mathbf{\Delta}$ is a diagonal and unitary matrix.
For a ULA this holds, as we will see:
with
$$\mathbf{\Delta}_{\mathrm{ULA}} = \mathbf{\Phi}^{-(M-1)} = \operatorname{diag}\left\{e^{-j(M-1)\mu_i}\right\}_{i=1}^{d}.$$
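Just as for the Standard ESPRIT, let us now summarize the three steps for Unitary ESPRIT:
1) Signal Subspace Estimation: Compute $\mathbf{E}_s \in \mathbb{R}^{M \times d}$ either via the
• Square root approach or direct data approach: From the SVD of
$$\mathcal{T}(\mathbf{X}) = \begin{bmatrix} \mathbf{E}_s & \mathbf{E}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{F}_s^H \\ \mathbf{F}_o^H \end{bmatrix} \in \mathbb{R}^{M \times 2N}, \quad N > M,$$
by taking the $d$ dominant left singular vectors of $\mathcal{T}(\mathbf{X})$, or via the
• Covariance Approach: From the EVD of
$$\mathcal{T}(\mathbf{X})\,(\mathcal{T}(\mathbf{X}))^H = \begin{bmatrix} \mathbf{E}_s & \mathbf{E}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{\Lambda}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{\Lambda}_o \end{bmatrix} \cdot \begin{bmatrix} \mathbf{E}_s^H \\ \mathbf{E}_o^H \end{bmatrix} \in \mathbb{R}^{M \times M},$$
$$\underbrace{\mathbf{K}_1 \mathbf{E}_s \mathbf{\Upsilon}}_{\mathbb{R}^{m \times d}} \approx \underbrace{\mathbf{K}_2 \mathbf{E}_s}_{\mathbb{R}^{m \times d}},$$
by means of Least Squares (LS), Total Least Squares (TLS) or Structured Least Squares (SLS).
3) Spatial Frequency (DoA) Estimation: Compute the eigenvalues of the resulting real-valued solution $\mathbf{\Upsilon}$
$$\mathbf{\Upsilon} = \mathbf{T}_A \mathbf{\Omega} \mathbf{T}_A^{-1} \in \mathbb{R}^{d \times d}, \quad \text{with } \mathbf{\Omega} = \operatorname{diag}\{\omega_i\}_{i=1}^{d}.$$
Afterward
• Reliability Test: If all eigenvalues $\omega_i$ are real, the estimates will be reliable. Otherwise, start again with more measurements.
• If all eigenvalues $\omega_i$ are real then
$$\mu_i = 2 \arctan \omega_i, \qquad \theta_i = \arcsin\left(-\frac{\lambda}{2\pi\Delta}\mu_i\right).$$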
5. Signal Reconstruction
In many cases not the DoAs but the wavefronts, i.e. their complex envelopes (signals), are of interest. Consider the case when we obtain an estimate of the signals or wavefronts by applying a linear filter $\mathbf{W}^H \in \mathbb{C}^{d \times M}$ on the received data $\mathbf{X}$, i.e. the estimate of the signals is given by
$$\hat{\mathbf{S}} = [\hat{\mathbf{s}}(t_1), \hat{\mathbf{s}}(t_2), \cdots, \hat{\mathbf{s}}(t_N)] = \mathbf{W}^H \cdot \mathbf{X},$$
and let us recall that the data model is
$$\mathbf{X} = \mathbf{A}\mathbf{S} + \mathbf{N},$$
where $\mathbf{A}$ now is assumed to be known from the DoA estimation! To this end, let us now consider different alternatives for computing the receive filter $\mathbf{W}^H$:
1) LS Solution (Least Squares)
2) MVDR Solution (Minimum Variance, Distortionless Response),
3) MMSE Solution (Minimum Mean Square Error),
4) MF Solution (Matched Filter).
5.1 LS Solution
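We assume no statistical knowledge about $\mathbf{S}$ and $\mathbf{N}$ and that
$$\mathbf{X} \approx \mathbf{A}\mathbf{S},$$
hence
$$\hat{\mathbf{S}} = \operatorname*{argmin}_{\mathbf{S}} \|\mathbf{X} - \mathbf{A}\mathbf{S}\|_F^2.$$
The solution to this optimization problem is obtained with a pseudo-inverse
$$\begin{aligned}
\mathbf{A}^H \mathbf{X} &= \mathbf{A}^H \mathbf{A}\hat{\mathbf{S}} \\
(\mathbf{A}^H \mathbf{A})^{-1}\mathbf{A}^H \mathbf{X} &= (\mathbf{A}^H \mathbf{A})^{-1}\mathbf{A}^H \mathbf{A}\hat{\mathbf{S}} \\
\hat{\mathbf{S}} &= (\mathbf{A}^H \mathbf{A})^{-1}\mathbf{A}^H \mathbf{X} = \mathbf{A}^+ \mathbf{X} \\
&= \mathbf{W}^H \mathbf{X}.
\end{aligned}$$
Therefore, the linear reconstruction filter optimal in the aforementioned sense is
$$\mathbf{W}_{\mathrm{LS}}^H = \mathbf{A}^+,$$
with
$$\mathbf{W}_{\mathrm{LS}}^H \mathbf{A} = \mathbf{1}_d.$$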
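$$\begin{aligned}
\hat{\mathbf{S}} &= \mathbf{W}^H(\mathbf{A}\mathbf{S} + \mathbf{N}) \\
&= \mathbf{S} + \mathbf{W}^H \mathbf{N},
\end{aligned}$$
i.e. we desire a distortionless response leading to an unbiased estimate, but additionally we would also like to minimize the total output power since that also minimizes the noise power (variance), hence the name minimum variance distortionless response. We consider the $d$ individual outputs of our reconstruction filter by partitioning the filter matrix into beamforming vectors
$$\mathbf{W}_{\mathrm{MVDR}}^H = \begin{bmatrix} \mathbf{w}_1^H \\ \mathbf{w}_2^H \\ \vdots \\ \mathbf{w}_d^H \end{bmatrix} \in \mathbb{C}^{d \times M}.$$
where $\mathbf{e}_i \in \{0, 1\}^d$ is an all-zero vector with a 1 only at the $i$-th entry.
Thus, we have
Therefore, we can find the MVDR solution from either one of two optimization problems
$$(*) \quad \mathbf{w}_{i,\mathrm{MVDR}} = \operatorname*{argmin}_{\mathbf{w}_i} E\left[|\mathbf{w}_i^H \mathbf{n}[n]|^2\right] = \operatorname*{argmin}_{\mathbf{w}_i} \mathbf{w}_i^H \mathbf{R}_{nn} \mathbf{w}_i$$
where
$$\mathbf{R}_{xx} = \mathbf{A}\mathbf{R}_{ss}\mathbf{A}^H + \mathbf{R}_{nn}.$$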
5.3 MMSE Solution 73
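MVDR is a zero forcing (ZF) solution, completely suppressing interference between impinging wavefronts and minimizing the noise.
Example 5.2.1. Special case (i.i.d. noise):
$$\mathbf{R}_{nn} = \sigma_n^2 \cdot \mathbf{1}_M$$
Then the MVDR solution is
$$\begin{aligned}
\mathbf{W}_{\mathrm{MVDR}}^H &= \left(\mathbf{A}^H \frac{1}{\sigma_n^2}\mathbf{1}_M \mathbf{A}\right)^{-1} \mathbf{A}^H \frac{1}{\sigma_n^2}\mathbf{1}_M \\
&= \left(\mathbf{A}^H \mathbf{A}\right)^{-1} \mathbf{A}^H \\
&= \mathbf{A}^+ \\
&= \mathbf{W}_{\mathrm{LS}}^H.
\end{aligned}$$
Hence, if the noise is i.i.d., the MVDR solution and the LS solution are identical!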
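for each $i = 1, \cdots, d$. For all $i = 1, \cdots, d$, we can collect each of the previous expressions to obtain
$$\mathbf{R}_{xx}\mathbf{W} = \mathbf{R}_{xs},$$
where
$$\mathbf{R}_{xs} = [\mathbf{p}_1, \mathbf{p}_2, \cdots, \mathbf{p}_d] \in \mathbb{C}^{M \times d}, \qquad \mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_d] \in \mathbb{C}^{M \times d}.$$
Hence,
$$\mathbf{W}_{\mathrm{MMSE}} = \mathbf{R}_{xx}^{-1}\mathbf{R}_{xs}.$$
Note that
$$\begin{aligned}
\mathbf{R}_{xs} = E\left[\mathbf{x}[n]\mathbf{s}^H[n]\right] &= E\left[(\mathbf{A}\mathbf{s}[n] + \mathbf{n}[n]) \cdot \mathbf{s}^H[n]\right] \\
&= \mathbf{A}\,E\left[\mathbf{s}[n]\mathbf{s}^H[n]\right] = \mathbf{A}\mathbf{R}_{ss}.
\end{aligned}$$
In the high SNR regime, the MMSE solution converges to the MVDR solution.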
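5.4 MF Solution
$$\hat{s}_i = \mathbf{w}_i^H \mathbf{x}_i, \qquad \mathbf{x}_i = \mathbf{a}(\mu_i)s_i + \mathbf{n}$$
$$\hat{s}_i = \mathbf{w}_i^H \mathbf{a}(\mu_i)s_i + \mathbf{w}_i^H \mathbf{n},$$
$$\begin{aligned}
E\left[|\hat{s}_i|^2\right] &= E\left[\left(\mathbf{w}_i^H \mathbf{a}(\mu_i)s_i + \mathbf{w}_i^H \mathbf{n}\right)\left(\mathbf{w}_i^H \mathbf{a}(\mu_i)s_i + \mathbf{w}_i^H \mathbf{n}\right)^*\right] \\
&= \mathbf{w}_i^H \mathbf{a}(\mu_i)\mathbf{a}^H(\mu_i)\mathbf{w}_i\, E\left[|s_i|^2\right] + \mathbf{w}_i^H \mathbf{R}_{nn}\mathbf{w}_i
\end{aligned}$$
$$\mathrm{SNR}_i = \frac{\mathbf{w}_i^H \mathbf{a}(\mu_i)\mathbf{a}^H(\mu_i)\mathbf{w}_i\, E\left[|s_i|^2\right]}{\mathbf{w}_i^H \mathbf{R}_{nn}\mathbf{w}_i}.$$
If
$$\mathbf{R}_{nn} = \sigma_n^2 \cdot \mathbf{I}, \qquad E\left[|s_i|^2\right] = \sigma_s^2,$$
Hence,
$$\mathbf{w}_{i,\mathrm{MF}} = \mathbf{a}(\mu_i),$$
so
$$\mathbf{W}^H = \mathbf{A}^H.$$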
Again, we see that in the low SNR regime (i.e. $\sigma_s^2 \ll \sigma_n^2$) the Wiener solution (MMSE) converges to the matched filter solution, while in the high SNR regime it converges to the MVDR (ZF) solution.
6. Downlink Beamforming
Consider the downlink of a single cell with $M$ transmit antennas at the base station and with $K$ single-antenna users as depicted in Fig. 6.1. We denote by $s_k(t)$ the signal of user $k$ at time instant $t$. Additionally, we assume that there are $Q_k$ paths from the base station to user $k$. The spatial frequency, the gain and the delay of path $q$ of user $k$ are denoted by $\mu_{k,q}$, $b_{k,q}$ and $\tau_{k,q}$.
Let us assume that the base station knows the angles of arrival of the users after performing angle of arrival estimation in the uplink, through MUSIC or ESPRIT for instance. Based on reciprocity we can assume that the angles of departure $\theta$ from the base station to the users in the downlink are the same as the angles of arrival. Through downlink beamforming, the beamforming vector $\mathbf{w}_k \in \mathbb{C}^M$ is employed to transmit to user $k$. In the following we employ $\mathbf{a}_{k,q} = \mathbf{a}(\mu_{k,q}) = \mathbf{a}(\theta_{k,q})$ for the array steering vector of the angle of departure $\theta_{k,q}$.
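From Fig. 6.1, we have that the received signal $x_k[n]$ of user $k$ at time slot $n$ is given by
$$\begin{aligned}
x_k[n] &= \sum_{q=1}^{Q_k} \left( \sum_{l=1}^{K} \mathbf{w}_l^H s_l(nT - \tau_{k,q}) \right) \cdot b_{k,q} \cdot \mathbf{a}_{k,q} + n_k[n] + i_k[n] \\
&= \sum_{l=1}^{K} \sum_{q=1}^{Q_k} \mathbf{w}_l^H s_l(nT - \tau_{k,q}) \cdot b_{k,q} \cdot \mathbf{a}_{k,q} + n_k[n] + i_k[n],
\end{aligned}$$
where $n_k[n]$ is the noise at user $k$ at time slot $n$ and $i_k[n]$ is the intercell interference at user $k$ at time slot $n$. We assume that $s_k[n]$, $n_k[n]$, and $i_k[n]$ are mutually uncorrelated. Additionally, we denote the power of the noise and the interference at user $k$ as $N_k$ and $I_k$, respectively, i.e.
$$E\left[|n_k[n]|^2\right] = N_k, \qquad E\left[|i_k[n]|^2\right] = I_k[n].$$
Therefore,
$$\begin{aligned}
E\left[|x_k[n]|^2\right] &= E\left[ \left( \sum_{l=1}^{K} \sum_{q_1=1}^{Q_k} \mathbf{w}_l^H s_l(nT - \tau_{k,q_1}) b_{k,q_1} \mathbf{a}_{k,q_1} \right) \left( \sum_{h=1}^{K} \sum_{q_2=1}^{Q_k} \mathbf{w}_h^H s_h(nT - \tau_{k,q_2}) b_{k,q_2} \mathbf{a}_{k,q_2} \right)^H \right] \\
&\quad + N_k + I_k[n].
\end{aligned}$$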
[Fig. 6.1: block diagram of the downlink. The signals $s_1, s_2, \ldots, s_K$ are weighted with the beamforming vectors $\mathbf{w}_1^*, \mathbf{w}_2^*, \ldots, \mathbf{w}_K^*$, summed per antenna ($\Sigma_1, \Sigma_2, \ldots, \Sigma_M$) and transmitted over the $Q_k$ paths with parameters $(\mu_{k,q}, b_{k,q}, \tau_{k,q})$ and angles of departure $\theta_{k,q}$ to user $k$, who receives $x_k[n]$.]
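In the double sums $\sum_{l=1}^{K}\sum_{q_1=1}^{Q_k}$ and $\sum_{h=1}^{K}\sum_{q_2=1}^{Q_k}$, only the terms with $l = h$ contribute to the expectation, because $s_l$ and $s_h$, $l \neq h$ (signals bound for different users), are uncorrelated:
$$\begin{aligned}
E\left[|x_k[n]|^2\right] &= \sum_{l=1}^{K} \mathbf{w}_l^H\, E\left[ \sum_{q_1=1}^{Q_k} \sum_{q_2=1}^{Q_k} s_l(nT - \tau_{k,q_1}) s_l^*(nT - \tau_{k,q_2})\, b_{k,q_1} b_{k,q_2}^*\, \mathbf{a}_{k,q_1} \mathbf{a}_{k,q_2}^H \right] \mathbf{w}_l + N_k + I_k[n] \\
&= \sum_{l=1}^{K} \mathbf{w}_l^H \mathbf{R}_{kl} \mathbf{w}_l + N_k + I_k[n],
\end{aligned}$$
with
$$\mathbf{R}_{kl} = \sum_{q_1=1}^{Q_k} \sum_{q_2=1}^{Q_k} \mathbf{a}_{k,q_1} \mathbf{a}_{k,q_2}^H \cdot E\left[ s_l(nT - \tau_{k,q_1}) s_l^*(nT - \tau_{k,q_2})\, b_{k,q_1} b_{k,q_2}^* \right].$$
Note that
$$E\left[ s_l(nT - \tau_{k,q_1}) s_l^*(nT - \tau_{k,q_2})\, b_{k,q_1} b_{k,q_2}^* \right] = \begin{cases} 0 & \text{for } q_1 \neq q_2 \\ \sigma_l^2 \cdot E\left[|b_{k,q}|^2\right] & \text{for } q_1 = q_2 = q \end{cases},$$
since distinct path gains for a given user are uncorrelated. We assume that $\sigma_l^2 = 1 \ \forall\, l = 1, \ldots, K$. Then,
$$\mathbf{R}_{kl} = \mathbf{R}_k = \sum_{q=1}^{Q_k} \mathbf{a}_{k,q} \mathbf{a}_{k,q}^H \cdot E\left[|b_{k,q}|^2\right],$$
i.e., it is independent of $l$.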
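The SINR for user $k$ reads as
$$\mathrm{SINR}_k = \frac{\mathbf{w}_k^H \mathbf{R}_k \mathbf{w}_k}{\underbrace{\sum_{l=1,\, l \neq k}^{K} \mathbf{w}_l^H \mathbf{R}_k \mathbf{w}_l}_{\text{Intracell-Interference}} + \underbrace{I_k}_{\text{Intercell-Interference}} + \underbrace{N_k}_{\text{Noise}}}.$$
Note that the transmit power assigned to user $k$ is given by $P_k = \|\mathbf{w}_k\|_2^2$, since we have assumed $\sigma_k^2 = 1$.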
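The problem that we will consider in order to compute the beamforming vectors $\mathbf{w}_l$ for $l = 1, \ldots, K$ is to minimize the total transmit power $P_T = \sum_{l=1}^{K} P_l$ subject to guaranteeing a specified $\mathrm{SINR}_k$ for every user $k$.
Therefore, we consider
$$\min_{\mathbf{w}_k} \sum_{k=1}^{K} P_k = \min_{\mathbf{w}_k} \sum_{k=1}^{K} \|\mathbf{w}_k\|_2^2 \quad \text{s.t.} \quad \mathrm{SINR}_k = \frac{\mathbf{w}_k^H \mathbf{R}_k \mathbf{w}_k}{\sum_{l=1,\, l \neq k}^{K} \mathbf{w}_l^H \mathbf{R}_k \mathbf{w}_l + I_k + N_k}, \quad k = 1, \ldots, K,$$
i.e. we have $K$ quadratic constraints
$$\mathrm{SINR}_k \cdot \left( \sum_{l=1,\, l \neq k}^{K} \mathbf{w}_l^H \mathbf{R}_k \mathbf{w}_l + I_k + N_k \right) = \mathbf{w}_k^H \mathbf{R}_k \mathbf{w}_k, \quad k = 1, \ldots, K.$$
$$\max_{\mathbf{w}_k} \mathbf{w}_k^H \mathbf{R}_k \mathbf{w}_k \quad \forall\, k = 1, \ldots, K.$$
The solution of this problem is obvious: the eigenvector of $\mathbf{R}_k$ which corresponds to the largest eigenvalue of $\mathbf{R}_k$ will maximize the given quadratic form.
Let us designate this eigenvector by $\mathbf{u}_k$, with $\|\mathbf{u}_k\|_2^2 = 1$.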
Hence we now have a problem with $K$ linear equations and $K$ unknowns, which we can solve as follows:
$$\mathbf{\Psi} \cdot \mathbf{P} = \mathbf{1}, \qquad \mathbf{P} = \mathbf{\Psi}^{-1} \cdot \mathbf{1} \in \mathbb{R}_+^K.$$
However, note that this is a valid solution only if all components of $\mathbf{P}$ are positive! In addition, if $P_{T\max}$ is the maximum available transmit power, this solution can be implemented only if
$$\sum_{l=1}^{K} P_l = \|\mathbf{P}\|_1 \le P_{T\max}.$$
If one of the two conditions is not fulfilled, then at least one of the $K$ users has to be taken out. If the reduced problem then has a valid solution and $\sum_{l=1}^{K-1} P_l < P_{T\max}$, then the scheduling algorithm can try another user, if there are some waiting for service.
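The power computation and the two validity checks can be sketched for $K = 2$ users. The definition of $\mathbf{\Psi}$ is not visible in the surviving text, so the code assumes $\mathbf{w}_k = \sqrt{P_k}\,\mathbf{u}_k$ and rearranges the $K$ SINR constraints into $\mathbf{\Psi}\mathbf{P} = \mathbf{1}$ with $\Psi_{kk} = \frac{\mathbf{u}_k^H \mathbf{R}_k \mathbf{u}_k}{\mathrm{SINR}_k (I_k + N_k)}$ and $\Psi_{kl} = -\frac{\mathbf{u}_l^H \mathbf{R}_k \mathbf{u}_l}{I_k + N_k}$ for $l \neq k$. The gain values `G[k][l]` $= \mathbf{u}_l^H \mathbf{R}_k \mathbf{u}_l$, the targets and the power budget are made-up numbers.

```python
def downlink_powers(G, sinr, noise_plus_interference, p_max):
    """Solve Psi P = 1 for K = 2 users and check positivity and the power budget.

    G[k][l] = u_l^H R_k u_l, sinr[k] = target SINR_k,
    noise_plus_interference[k] = I_k + N_k (all assumed inputs of this sketch).
    """
    K = len(G)
    Psi = [[(G[k][l] / sinr[k] if l == k else -G[k][l]) / noise_plus_interference[k]
            for l in range(K)] for k in range(K)]
    det = Psi[0][0] * Psi[1][1] - Psi[0][1] * Psi[1][0]
    # Cramer's rule for the right-hand side [1, 1]^T
    P = [(Psi[1][1] - Psi[0][1]) / det, (Psi[0][0] - Psi[1][0]) / det]
    valid = all(p > 0 for p in P) and sum(P) <= p_max
    return P, valid

# Made-up example values.
G = [[4.0, 0.5], [0.25, 2.0]]
targets = [2.0, 2.0]
P, valid = downlink_powers(G, targets, noise_plus_interference=[1.0, 1.0], p_max=5.0)

# Achieved SINRs with the computed powers (should reproduce the targets).
achieved = [P[0] * G[0][0] / (P[1] * G[0][1] + 1.0),
            P[1] * G[1][1] / (P[0] * G[1][0] + 1.0)]
```

If `valid` is `False`, either some power is not positive or the budget is exceeded, and a user would have to be removed before solving the reduced problem again.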
Appendix
A1 Subspaces of a Matrix
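[Figure: the action of the matrix $\mathbf{A}$. A vector $\mathbf{x} \in \mathbb{R}^n$ with row-space component $\mathbf{x}_r \in R(\mathbf{A}^H)$ is mapped to $\mathbf{A}\mathbf{x}_r = \mathbf{A}\mathbf{x}$ in the column space $R(\mathbf{A}) \subset \mathbb{R}^m$.]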
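[Figure: a vector $\mathbf{b} \in \mathbb{R}^m$ is projected onto the column space $R(\mathbf{A})$, $\mathbf{p} = \mathbf{P}\mathbf{b}$, and mapped to $\mathbf{x}^+ = \mathbf{A}^+ \mathbf{p} = \mathbf{A}^+ \mathbf{b}$ in the row space $R(\mathbf{A}^H) \subset \mathbb{R}^n$.]
Fig. A2. The action of the pseudoinverse matrix $\mathbf{A}^+$. Figure taken from [2].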
Bibliography