
Adaptive and Array Signal Processing

Univ.-Prof. Dr. techn. Dr. h. c. Josef A. Nossek

October 12, 2015

Technische Universität München


Institute for Circuit Theory and Signal Processing
Univ.-Prof. Dr. techn. Josef A. Nossek
2nd Edition 2015

Adaptive and Array Signal Processing by Technische Universität München is licensed under the
Creative Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/.
Contact: Josef.A.Nossek@tum.de
Editor: Univ.-Prof. Dr. techn. Josef A. Nossek,
Institute for Circuit Theory and Signal Processing, Technische Universität München
Internal reference number: TUM-LNS-TR-15-05
Print: Fachschaft Elektrotechnik und Informationstechnik e.V., München
Contents

1. Introduction and Motivation
   1.1 Application Areas
       1.1.1 System Identification
       1.1.2 Inverse Modeling
       1.1.3 Prediction
       1.1.4 Interference Cancellation
   1.2 Adaptive Equalization
   1.3 Single Channel Adaptive Equalization
   1.4 Multichannel Adaptive Beamforming

2. Mathematical Background
   2.1 Gradients
   2.2 Differentiation with respect to a Complex Vector
   2.3 Quadratic Optimization with Linear Constraints
   2.4 Stochastic Processes
       2.4.1 Characterization: Mean, Autocorrelation, Autocovariance, Variance
           2.4.1.1 Time averages (averages along the process)
       2.4.2 Correlation Matrix
   2.5 Linear Equations
   2.6 Eigenfilter (Generalized Matched Filter)

3. Adaptive Filters
   3.1 Linear Optimum Filtering (Wiener Filters)
   3.2 Spatial Filtering
       3.2.1 Minimum Variance Distortionless Response (MVDR) Beamforming
       3.2.2 Generalized Sidelobe Canceller (GSC)
   3.3 Iterative Solution of the Normal Equation
   3.4 Least Mean Square Algorithm (LMS)

4. High Resolution Direction-of-Arrival (DoA) Estimation
   4.1 Subspace Estimation
   4.2 MUltiple SIgnal Classification (MUSIC)
   4.3 The Standard ESPRIT Algorithm
   4.4 Unitary ESPRIT: Real Valued Subspace Estimation
       4.4.1 Centro-Hermitian Matrices
       4.4.2 Left Π-real Matrices
       4.4.3 Real Valued Invariance Equation

5. Signal Reconstruction
   5.1 LS Solution
   5.2 MVDR Solution
   5.3 MMSE Solution
   5.4 MF Solution

6. Downlink Beamforming
   6.1 First Step
   6.2 Second Step

Appendix
   A1 Subspaces of a Matrix

Bibliography
1. Introduction and Motivation

A filter is an electronic circuit used to process signals, with the goal of removing undesired com-
ponents in the signal and/or enhancing desired components in the signal. Technically speaking,
filtering refers to the extraction of information about a desired quantity at time t by using data
measured up to and including time t [1]. A filter can be classified according to its characteristics
as either:
• passive or active,
• analog or digital,
• discrete-time (sampled) or continuous-time,
• linear or non-linear and
• infinite impulse response (IIR) or finite impulse response (FIR).
In this course we will focus on digital filters which are discrete-time by nature. A digital filter
works by performing discrete mathematical operations on a sampled version of the signal.
We can further classify a digital filter as either a fixed filter or an adaptive filter. As the name
implies, the parameters (coefficients) of a fixed filter cannot be adapted depending on the input
signal. This is a viable approach if the statistics of the signal to be processed are known a priori.
However, when this information is not known completely, an optimum fixed filter cannot be de-
signed beforehand. In this scenario of unknown statistics, an adaptive filter can be employed since
the filter adapts to the given environment, being able to filter a desired signal from the input signal
with unknown statistics. The use of an adaptive filter offers an attractive solution to the filtering
problem as it usually provides a significant improvement in performance over the use of a fixed
filter designed by conventional methods [1]. At each iteration, the parameters of the filter are up-
dated according to the filter parameters in the previous iteration, the input signal and some further
information depending on the specific filter. If the scenario is stationary, successive iterations of
such an algorithm converge to an optimum solution. In a non-stationary scenario, the algorithm is
to some extent able to track the time variations of the statistics of the input signal, under the
assumption that the variations are sufficiently slow. As the name of this course implies, we will be
specifically dealing with adaptive digital filters.
In general, an adaptive filter can be represented as depicted in Fig. 1.1, where u[n], y[n], d[n]
and e[n] represent the discrete-time input signal, the discrete-time output signal, the desired signal
and the error, respectively. The error e[n] denotes the difference between the actual output and the
desired output. In addition, we have that w[n] is the weight vector which holds the set of the filter
parameters that can be adapted to drive the error to zero. Furthermore, we have that the current
state of the adaptive filter is denoted by x[n] and is stored in the memory of the filter. These signals
can be real or complex valued scalars or vectors. However, the vectors y[n], d[n] and e[n] must


have the same dimensions in order that the summation shown in Fig. 1.1 is consistent. The signals
can be summarized as follows:
• u[n] = input signal of the adaptive filter
• y[n] = output signal of the adaptive filter
• d[n] = desired or reference signal for the output
• e[n] = error between actual output and desired signal,
with w[n] as the set of filter coefficients at time instant n.
Notation: The notation that will be employed in this script will be the following. The vectors
will be represented as small boldface letters; meanwhile, the matrices will be given by capital
boldface letters. Since the signals are discrete-time we have that n is an integer.

Fig. 1.1. Adaptive Filter (a programmable filter with weights w[n] processes the input u[n]; the output y[n] is compared with the desired signal d[n], and the error e[n] drives the adaptation algorithm)

1.1 Application Areas


Due to the satisfactory performance of adaptive filters in a scenario with unknown statistics, adaptive
filters have been successfully applied in diverse fields with different applications [1]. For each
application we have an input vector u[n] and an output vector y[n] with which an estimation error
e[n] with respect to a desired signal d[n] is computed. The computed error is in turn used to
calculate the values of the adjustable parameters or coefficients w[n] of the adaptive filter. The
main difference between the different applications where an adaptive filter can be applied is the
way the reference or desired signal is extracted. In the following we take a look at some examples.

1.1.1 System Identification


In this application, the unknown plant is a system to be identified by the adaptive filter. Prior
knowledge about the inner structure of the plant is of course desirable. Such knowledge would
enable us to choose a proper model structure in our adaptive filter, the parameters of which are
adapted to match the output of the adaptive filter to the output of the plant when both are excited
with the same input.
An example of system identification in communications is the problem of channel estimation.
In such a case, the plant is the channel and with system identification we are then attempting to
estimate the channel. The training sequence is given by u in Fig. 1.2, which is known at the
receiver.

Fig. 1.2. System Identification (the system input u drives both the unknown plant and the adaptive filter; the error e is formed from the plant output d and the filter output y)

1.1.2 Inverse Modeling


In inverse modeling, the adaptive filter tries to undo the effect of the plant apart from some delay.
The cascade of the plant and the adaptive filter is a distortionless system, i.e. the adaptive filter is
inverting the plant.
Let us take again an example in communications. Once again we have that the plant is the
channel but now by cascading the filter with the plant we are performing inverse modeling for this
plant. This is commonly referred to as adaptive equalization in communications. As depicted in
Fig. 1.3, the training sequence is the system input in this case.

Fig. 1.3. Inverse Modeling (the plant output drives the adaptive filter; the desired signal d is a delayed version of the system input, and the error e = d − y is the system output)

1.1.3 Prediction
Let us now turn to another application of adaptive filters. Let us assume that we are
interested in providing the best prediction of a given sample based on previous samples. If there
is correlation between the time samples one can use prediction. The predicted signal would be
the system output 2 and the prediction error would be the system output 1 shown in Fig. 1.4. An
example where prediction is used is speech encoding in GSM.

Fig. 1.4. Prediction (the random signal is delayed and fed to the adaptive filter; the filter output y is compared with the undelayed signal d; system output 2 is the predicted signal and system output 1 is the prediction error e)

1.1.4 Interference Cancellation

Another application of adaptive filters is the mitigation of unknown interference which usually
corrupts our desired signal. The parameters of the adaptive filter are chosen such that the cancellation
of the interference is optimized in some sense. In Fig. 1.5, we depict a general example
for interference cancellation. The plant is now the path, through which the interference leaks into
the primary signal of interest. The adaptive filter, if appropriately adjusted, compensates for this
interference. The interference can also be an echo in a long distance telephone connection. Such
a situation is depicted in Fig. 1.6, where the boxes marked N are balancing impedances. Fig. 1.7 gives
more details with an adaptive filter for echo compensation on the side of speaker A. Fig. 1.8 shows
another form of interference cancellation, where the interference may be acoustic noise, which
contaminates a voice signal picked up by a microphone (primary sensor). If we are able to pick up
the noise only with a reference sensor, we could cancel the noise component in the voice signal.
Fig. 1.5. Interference Cancellation (the interference signal leaks through the plant into the primary signal d; the adaptive filter, fed with the interference signal u, produces an estimate y that is subtracted to give the system output e)

Fig. 1.6. Long-distance telephone circuit (speakers A and B are connected via hybrids with balancing impedances N over lines 1 and 2; an echo of B's speech appears on line 1 and an echo of A's speech on line 2)

Fig. 1.7. Signal definitions for echo cancellation (the adaptive filter produces an estimate r̂[n] of speaker A's echo r[n]; subtracting it from the received signal, which also contains speaker B's signal x[n], yields the error e[n])

Fig. 1.8. Adaptive noise cancellation (the primary sensor picks up the signal source plus noise, the reference sensor picks up the noise source only; the adaptive filter forms an estimate of the noise which is subtracted from the primary sensor signal to give the output)

1.2 Adaptive Equalization


As seen before, adaptive filters can be employed in order to perform adaptive equalization, which
is an important topic in communications via time-variant channels, as encountered e.g. in
mobile communications. Fig. 1.9 gives an abstract block diagram of such an application.
Fig. 1.9. Block Diagram of a Baseband Data Transmission System with Equalization (transmitter: pulse generator and transmit filter; channel/plant: medium plus noise; receiver: receive filter, adaptive equalization, decision device and synchronization)

1.3 Single Channel Adaptive Equalization


Let us now consider the case of single channel temporal adaptive equalization. In Fig. 1.10 we have
a closer look at the adaptive equalizer which is a linear, discrete-time filter with finite duration of its
impulse response (FIR). Other names for these kinds of filters are transversal filter, non-recursive
filter and moving average (MA) filter. As stated before, the M filter coefficients are given by
w_0, w_1, ..., w_{M−1}. The z^{−1} blocks represent the delays of the shift register, and u[n], y[n], e[n] and
d[n] are the input signal, output signal, error estimate and desired signal at time n, respectively.

Fig. 1.10. Temporal Adaptive Equalization (a tapped delay line with delays z^{−1} provides u[n], u[n−1], ..., u[n−M+1]; these are weighted by w_0^*, w_1^*, ..., w_{M−1}^*, summed to form y[n], and compared with d[n] to give the error e[n])

First, we collect the signals present at the inputs of the various stages of the shift register into
an M-dimensional vector:

u[n] = [u[n], u[n−1], ..., u[n−M+1]]^T ∈ C^M,   (1.1)

whereas the filter parameters are stacked in

w = [w_0, w_1, ..., w_{M−1}]^T ∈ C^M.   (1.2)

For the filter output we can write

y[n] = \sum_{k=0}^{M−1} w_k^* u[n−k] = w^H · u[n],   (1.3)

and its complex conjugate is then given by


y ∗ [n] = uH [n] · w. (1.4)
Let us recall that w^H = (w^T)^* = (w^*)^T = [w_0^*, w_1^*, ..., w_{M−1}^*], where (·)^H denotes the Hermitian
operation, i.e. the complex conjugate transpose of a matrix or vector.
Next, we collect N + 1 output samples, given by (1.4), from time instant n to instant n + N:

y^*[n] = [y^*[n], y^*[n+1], ..., y^*[n+N]]^T = [u^H[n]; u^H[n+1]; ...; u^H[n+N]] · w = U^H · w,   (1.5)

where U^H ∈ C^{(N+1)×M} with U = [u[n], u[n+1], ..., u[n+N]]. In addition, we have that
e[n] = d[n] − y[n] and e^*[n] = d^*[n] − y^*[n], and let us denote by e[n] a collection of N + 1
error samples

e[n] = [e[n], e[n+1], ..., e[n+N]]^T ∈ C^{N+1}.   (1.6)

Furthermore, let us collect N + 1 samples of the desired signal

d[n] = [d[n], d[n+1], ..., d[n+N]]^T ∈ C^{N+1}.   (1.7)
The problem to be solved can be posed in the following way:

Find a w such that ||e[n]||_2^2 is minimal!

Depending on the value of N + 1 (the number of equations, the number of input vectors) and M
(the number of degrees of freedom in our weight vector), we have the following three possibilities:
• N + 1 = M: If N + 1 is equal to M, we can make the error zero, i.e. ||e[n]||_2^2 = 0 and
d[n] = y[n], and hence, d^*[n] = U^H · w.
• N + 1 < M: If N + 1 is less than M, we can still make the error zero as in the previous case,
i.e. ||e[n]||_2^2 = 0. However, if (N + 1) < M, the solution w is not uniquely determined since
we have fewer equations than unknowns and we can impose additional restrictions on w (e.g.
minimum norm ||w||) to arrive at a specific solution. For M = N + 1 and rank{U^H} = M we
already have a unique solution.
• N + 1 > M: For the usual case of N + 1 > M, we have an overdetermined system of linear
equations and in general, we will not have zero error! Nevertheless, we aim at minimizing
||e[n]||_2^2!

With e^* = d^* − U^H w we have that the squared norm of this error vector is

||e||_2^2 = e^H · e = (e^*)^H · e^*
         = (d^* − U^H w)^H · (d^* − U^H w)
         = (d^*)^H · d^* − w^H U d^* − d^T U^H w + w^H U U^H w
         = ||d||_2^2 − w^H p − p^H w + w^H R w,   (1.8)

where R = U U^H = R^H ∈ C^{M×M} and p = U d^* ∈ C^M. This is one cost function which we will
minimize by choosing an appropriate w. We denote the cost function as J(w):

J(w) = ||e||_2^2 = ||d||_2^2 − w^H p − p^H w + w^H R w ∈ R,   (1.9)

which is a real valued cost function of a complex vector w. In order to find the minimum we then
need to solve

min_w J(w).   (1.10)

This amounts to computing the derivative of J(w) given in (1.9) and setting the derivative to zero to find
the w_opt which minimizes J(w).
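As a quick numerical illustration (not from the original script; all data and dimensions are synthetic assumptions), the following sketch sets up a small instance of the cost (1.9) and checks that the closed-form minimizer w_opt = R^{-1} p agrees with a generic complex least squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
M, Np1 = 4, 20                      # filter length M and number of snapshots N+1 (illustrative)

# synthetic complex input and desired sequences
u = rng.standard_normal(Np1 + M - 1) + 1j * rng.standard_normal(Np1 + M - 1)
d = rng.standard_normal(Np1) + 1j * rng.standard_normal(Np1)

# k-th snapshot u[n+k] = [u[k+M-1], u[k+M-2], ..., u[k]]^T (newest sample first)
U = np.zeros((M, Np1), dtype=complex)
for k in range(Np1):
    U[:, k] = u[k : k + M][::-1]

R = U @ U.conj().T                  # R = U U^H, cf. (1.8)
p = U @ d.conj()                    # p = U d*
w_opt = np.linalg.solve(R, p)       # closed-form minimizer of J(w)

# cross-check: the same w minimizes ||U^H w - d*||_2, i.e. ||e||_2
w_ls, *_ = np.linalg.lstsq(U.conj().T, d.conj(), rcond=None)
print(np.allclose(w_opt, w_ls))     # True

J = (np.linalg.norm(d) ** 2 - w_opt.conj() @ p - p.conj() @ w_opt
     + w_opt.conj() @ R @ w_opt).real
print(np.isclose(J, np.linalg.norm(U.conj().T @ w_opt - d.conj()) ** 2))  # True
```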

1.4 Multichannel Adaptive Beamforming


In the previous section, we considered the case of temporal adaptive equalization. In this section we
analyze an example of spatial adaptive equalization. Another interesting application for adaptive
equalization is multichannel adaptive beamforming, which instead of temporal filtering is basically
spatial filtering. In Fig. 1.11 a uniform linear array (ULA) is depicted on which a planar wavefront
impinges with an angle θ. This wavefront is spatially sampled by the sensors of the antenna array
and a weighted sum of these samples constitutes the output y[n]. The weighting vector is called
beamforming vector. As in the previous section, the wk for k = 0, . . . , M − 1, represent the filter
coefficients and the y[n] is the output at time instant n. However, the input is now sampled in space
and hence, uk [n] denotes the received signal at the k-th receive antenna at the time instant n.
In Fig. 1.11, the angle of arrival of the impinging wavefront is given by θ. Additionally, the
distance between adjacent antennas in the ULA is ∆ and therefore, there is a delay between two
adjacent antennas receiving the impinging wavefront. We denote this delay as τ . The azimuthal
angle of arrival (AoA, direction of arrival (DoA)) can be converted to an electric phase angle φ
via the delay τ between two sensors. To this end, we have that the distance that the signal travels
during the delay τ is
c · τ = ∆ · sin θ, (1.11)
where c is the speed of light,

c = λ · f_c = λ · ω_c / (2π),   (1.12)

with λ, f_c and ω_c as the wavelength of the impinging wavefront or signal, the carrier frequency of
the impinging wavefront and the corresponding angular frequency, respectively. Hence, taking (1.11) and (1.12)
the delay is

τ = (2π / (λ ω_c)) · ∆ · sin θ = (π / ω_c) · sin θ  for ∆ = λ/2,   (1.13)

Fig. 1.11. Adaptive beamformer for an array of five sensors (the wavefront impinging at angle θ reaches the sensors with relative delays 0, τ, 2τ, 3τ, 4τ; the sensor signals u_0[n], ..., u_4[n] are weighted by the adjustable weights w_0^*, ..., w_4^* and summed to form the output y[n]; the weights are set by an adaptive control algorithm, and the sensor phase relations define the steering vector a(θ))

where in the last equality we have assumed that the distance between adjacent antennas is half a
wavelength of the impinging wavefront, i.e. ∆ = λ/2.¹ Using (1.13), the electric phase angle is

φ = ω_c τ = π sin θ.   (1.14)

We have the input vector

u[n] = [u_0[n], u_1[n], ..., u_{M−1}[n]]^T ∈ C^M,   (1.15)

and the filter vector

w = [w_0, w_1, ..., w_{M−1}]^T ∈ C^M.   (1.16)

¹ In this course, unless otherwise stated, we will usually make this assumption when we have a ULA.

The output is given by

y[n] = \sum_{k=0}^{M−1} u_k[n] w_k^* = w^H · u[n],   (1.17)

y^*[n] = u^H[n] · w.   (1.18)

Let us collect again N + 1 snapshots in one output vector of the beamformer,

y^*[n] = [y^*[n], y^*[n+1], ..., y^*[n+N]]^T = U^H · w   (1.19)

with

U^H = [u^H[n]; u^H[n+1]; ...; u^H[n+N]] ∈ C^{(N+1)×M}.   (1.20)
At the moment we are lacking a desired signal d[n], which we can use to drive the weight vector
to an optimal solution wopt !
Assume, we have several planar wavefronts (far-field approximation) incident on the sensor
array arriving at angles θ1 , θ2 , . . . , θd . Let the wavefront with angle of arrival (AoA) θd be the
desired one and consider all the other wavefronts as interferers which should be suppressed. θd
is the only AoA, which we know a priori. Let us first calculate the contribution of this desired
wavefront to the beamformer output. To this end, we assume that the received signal impinging on
the first antenna of the array is d[n]. Therefore, we have that the desired signal impinging on the
k-th antenna of the ULA at time n is denoted by dk [n] and for k = 0, . . . , M − 1, given by

d_0[n] = d[n],
d_1[n] = d[n − τ] ≈ d[n] e^{−jφ_d},
d_2[n] = d[n − 2τ] ≈ d[n] e^{−j2φ_d},
⋮
d_{M−1}[n] = d[n − (M−1)τ] ≈ d[n] e^{−j(M−1)φ_d},

where the approximation ≈ comes from the narrowband assumption, i.e. the envelope of the signal
of our wavefront is approximately constant for several multiples of τ.
We put this array measurement of our desired signal into vector form

d[n] = [d_0[n], d_1[n], ..., d_{M−1}[n]]^T ≈ [1, e^{−jφ_d}, e^{−j2φ_d}, ..., e^{−j(M−1)φ_d}]^T · d[n] = a(θ_d) · d[n],   (1.21)

where we have substituted

a(θ_d) = [1, e^{−jφ_d}, e^{−j2φ_d}, ..., e^{−j(M−1)φ_d}]^T,   (1.22)
where a(θd ) is the so-called array steering vector evaluated at the angle of arrival θd .
Let us denote the output signal due to the desired wavefront as

d[n] = wH · d[n] = wH · a(θd ) · d[n] = d[n],

i.e. we require the beamformer output due to the desired wavefront to be d[n]. This means that
wH · a(θd ) = 1. Besides this constraint we would like to minimize the output power of the
beamformer, since this will minimize the interference power at the output.
The total output signal is y[n] = w^H[n] · u[n] and, assuming D impinging wavefronts on the array,
we have u[n] = \sum_{i=1}^{D} u_i[n] as the sum of the D impinging wavefronts, where our desired signal
is u_d[n] = d[n]. For each impinging wavefront i = 1, ..., D the received signal of the ULA is

u_i[n] = [1, e^{−jφ_i}, e^{−j2φ_i}, ..., e^{−j(M−1)φ_i}]^T · u_i[n] = a(θ_i) · u_i[n],   (1.23)

and the collection of N + 1 samples of the output is

y^*[n] = [u^H[n] · w, u^H[n+1] · w, ..., u^H[n+N] · w]^T = U^H · w,   (1.24)

where recall that U is given by (1.20). The power at the output is

||y[n]||_2^2 = y^H[n] · y[n] = y^T[n] · y^*[n] = w^H U · U^H w = w^H R w,   (1.25)

with R = U U^H. Therefore, the problem can be stated as follows:

min_w ||y[n]||_2^2 = min_w w^H R w,  subject to  w^H · a(θ_d) = 1.

The problem at hand is a quadratic minimization subject to a linear constraint, i.e. a linearly
constrained least squares (LCLS) problem! In the next chapter, we will see how we can solve such a
problem.
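As a small illustration (not part of the original script; the array size, angles and noise level are arbitrary assumptions), the sketch below builds ULA steering vectors a(θ) as in (1.22) for ∆ = λ/2 and evaluates the output power w^H R w for a simple weight vector that satisfies the constraint w^H a(θ_d) = 1.

```python
import numpy as np

def steering_vector(theta_deg, M):
    """ULA steering vector a(theta) for half-wavelength spacing, cf. (1.22)."""
    phi = np.pi * np.sin(np.deg2rad(theta_deg))      # electric phase angle (1.14)
    return np.exp(-1j * phi * np.arange(M))

rng = np.random.default_rng(1)
M, N1 = 5, 200                                        # sensors, snapshots (illustrative)
angles = [10.0, -35.0, 60.0]                          # desired DoA first, two interferers

# synthetic snapshots: sum of wavefronts plus white noise
A = np.column_stack([steering_vector(th, M) for th in angles])
S = rng.standard_normal((len(angles), N1)) + 1j * rng.standard_normal((len(angles), N1))
U = A @ S + 0.1 * (rng.standard_normal((M, N1)) + 1j * rng.standard_normal((M, N1)))

R = U @ U.conj().T                                    # R = U U^H as in (1.25)
w = steering_vector(angles[0], M) / M                 # simple choice with w^H a(theta_d) = 1
print(np.isclose(w.conj() @ steering_vector(angles[0], M), 1.0))  # distortionless constraint
print((w.conj() @ R @ w).real)                        # output power w^H R w
```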
2. Mathematical Background

As we have seen in Section 1.3 and 1.4, we need to find extreme points of a real valued scalar
cost function, which in general is a function of a complex vector. This optimization can be an
unconstrained or a constrained one. In order to find extreme points, we have to compute derivatives
with respect to complex vectors.

2.1 Gradients
Let us start with computing derivatives with respect to vectors in the field of reals:

x ∈ R^n,  f(x) ∈ R,  f: R^n → R,

df(x)/dx = [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]^T = grad f(x) = ∇_x f(x),

df = (df(x)/dx)^T · dx.

If f(x) is linear,

f(x) = a^T x = x^T a,

where a ∈ R^n, then we have

df/dx = a,

since

f(x) = \sum_{k=1}^{n} a_k · x_k,

where x = [x_1, x_2, ..., x_n]^T and a = [a_1, a_2, ..., a_n]^T, and we have that

∂f(x)/∂x_k = a_k.


If f(x) is quadratic,

f(x) = x^T A x,

where A ∈ R^{n×n}, we have

df/dx = d/dx (x^T v)|_{v = Ax = const} + d/dx (u^T x)|_{u^T = x^T A = const}
      = v + u
      = (A + A^T) x
      = 2 A x  for A = A^T.

2.2 Differentiation with respect to a Complex Vector


Let us start with the different ways we can write a scalar cost function, which is a function of a
complex vector z:

z = x + jy,  z ∈ C^n,  x, y ∈ R^n,  z^* = x − jy.

We can write the function either as a function of z, or alternatively as a function of x and y, or
eventually as a function of z and z^*:

h(z) : Cn → C
f (x, y) : Rn × Rn → C
g(z, z∗ ) : Cn × Cn → C.

Let us pick a simple example for illustration:

h(z) = kzk22
f (x, y) = xT x + y T y
g(z, z∗ ) = (z∗ )T z
h(z) = f (x, y) = g(z, z∗ ).

The definition of a gradient enables us to compute the increment of the function due to an
increment of the vector argument:

dh = (dh/dz)^T · dz,

df = (∂f/∂x)^T · dx + (∂f/∂y)^T · dy,

dg = (∂g/∂z)^T · dz + (∂g/∂z^*)^T · dz^*.

Let us assume that f, g ∈ C are differentiable; therefore ∂f/∂x, ∂f/∂y and ∂g/∂z, ∂g/∂z^* exist and thus df
and dg can be computed and df = dg!
dh/dz exists only if h(z) is analytic. That means that, assuming h = h_R + j h_I (Re{h} = h_R and
Im{h} = h_I), the Cauchy-Riemann equations must hold:

∂h_R/∂x = ∂h_I/∂y  and  ∂h_R/∂y = −∂h_I/∂x.
Many functions encountered in signal processing are not analytic, e.g. wH Rw. In fact, cost
functions are real valued and, thus, not analytic. Therefore we have to drop the possibility to work
with h. It is easy to show that

∂f/∂x = ∂g/∂z + ∂g/∂z^*,   ∂f/∂y = j (∂g/∂z − ∂g/∂z^*)
⇔
∂g/∂z = (1/2)(∂f/∂x − j ∂f/∂y),   ∂g/∂z^* = (1/2)(∂f/∂x + j ∂f/∂y).

Since f is real valued, both ∂f/∂x and ∂f/∂y are also real valued and, therefore,

(∂g/∂z)^* = ∂g/∂z^*,

dg = (∂g/∂z)^T dz + (∂g/∂z^*)^T dz^*
   = (∂g/∂z)^T dz + (∂g/∂z)^H dz^*
   = (∂g/∂z)^T dz + ((∂g/∂z)^T dz)^*
   = 2 Re{ (∂g/∂z)^T dz }
   = 2 Re{ (∂g/∂z)^H dz^* }.

Additionally, note that dg = df, since

dg = (∂g/∂z)^T dz + (∂g/∂z^*)^T dz^*
   = ((∂g/∂z)^T + (∂g/∂z^*)^T) dx + j ((∂g/∂z)^T − (∂g/∂z^*)^T) dy
   = (∂f/∂x)^T dx + (∂f/∂y)^T dy.

Therefore, only one derivative, either ∂g/∂z or ∂g/∂z^*, must be computed to have the full gradient
information! To compute stationary points of f or g, it is sufficient to set ∂g/∂z^* = 0. The direction
of the steepest descent (gradient descent) is given by −dg. To this end, we need to find for which
direction dz, with a given length ||dz||, we will obtain a maximal dg:

dg = 2 Re{ (∂g/∂z)^H dz^* } → max.

From this and based on the Cauchy-Schwarz inequality, we conclude that the steepest descent will
be achieved for ∆z = −µ (∂g/∂z)^* = −µ ∂g/∂z^*, with µ ∈ R_+.

Look at the problem in Section 1.3:

J(w, w^*) = σ_d^2 − w^H p − p^H w + w^H R w,
∂J/∂w^* = −p + R w,

with R = R^H and with the stationary point at

w_opt = R^{-1} p.

The steepest descent is

dw = µ (p − R w),

and

dJ = 2 Re{ (R w − p)^H dw^* }.

These calculations are much simpler than working with f(x, y). This method is called Wirtinger
calculus.
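A short numerical check of the Wirtinger-calculus result above (illustrative only; R, p and σ_d^2 are synthetic assumptions): the first-order increment of J predicted by the complex gradient ∂J/∂w^* = R w − p matches a direct evaluation, and the gradient vanishes at w_opt = R^{-1} p.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 3
# synthetic Hermitian positive definite R and vector p (illustrative values only)
X = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = X @ X.conj().T + M * np.eye(M)
p = rng.standard_normal(M) + 1j * rng.standard_normal(M)
sigma_d2 = 2.0

def J(w):
    """Cost J(w, w*) = sigma_d^2 - w^H p - p^H w + w^H R w (real valued)."""
    return (sigma_d2 - w.conj() @ p - p.conj() @ w + w.conj() @ R @ w).real

w = rng.standard_normal(M) + 1j * rng.standard_normal(M)
grad = R @ w - p                       # Wirtinger gradient dJ/dw* = -p + R w

# first-order check: J(w + dw) - J(w) ≈ 2 Re{grad^H dw} for a small step dw
dw = 1e-6 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
print(np.isclose(J(w + dw) - J(w), 2 * np.real(grad.conj() @ dw), rtol=1e-3))

# the stationary point w_opt = R^{-1} p has (numerically) zero gradient
w_opt = np.linalg.solve(R, p)
print(np.allclose(R @ w_opt - p, 0))
```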

2.3 Quadratic Optimization with Linear Constraints


In Section 1.4 we have an optimization problem, which is constrained by a linear equation. Let us
generalize this to a set of linear constraints.
min_w f(w)  subject to  c(w) = S^H w − g = 0,

where w ∈ C^M, c, g ∈ C^K, S^H ∈ C^{K×M}, K < M. In order to accommodate the constraints, we
augment the cost function and obtain the Lagrangian function L(w, λ), which we minimize over
w and maximize over λ, the vector of Lagrangian multipliers:

max_λ min_w  L(w, λ) = f(w) + λ^H (S^H w − g) + λ^T (S^T w^* − g^*)
                     = f(w) + 2 Re{ λ^H (S^H w − g) } ∈ R.

First we take the derivative with respect to w^*:

∂L/∂w^* = ∂f/∂w^* + ∂/∂w^* (λ^H c(w) + c^H(w) λ) = ∂f/∂w^* + S λ = 0.

Next we differentiate with respect to λ^*:

∂L/∂λ^* = ∂/∂λ^* (λ^H c(w) + c^H(w) λ) = c(w) = 0.

From these two conditions we have

∂f/∂w^* = −S λ  and  S^H w − g = 0,

respectively.
Choosing as a simple example the cost function

f(w) = w^H w,

we have

∂f/∂w^* = w,

and from ∂L/∂w^* = 0 it follows that

w = −S λ.

Inserting this into the constraint gives

−S^H S λ − g = 0.

Computing λ,

λ = −(S^H S)^{-1} g,

and plugging this into the previous equation we then have

w_opt = S (S^H S)^{-1} g = (S^H)^+ g,

where (S^H)^+ is the so-called pseudo-inverse of S^H, which will be discussed later on.
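The following sketch (with made-up S and g) verifies numerically that w_opt = S (S^H S)^{-1} g satisfies the constraints and is the minimum-norm solution among all feasible w.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 6, 2                                   # unknowns and constraints (illustrative)
S = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
g = rng.standard_normal(K) + 1j * rng.standard_normal(K)

# minimum-norm solution of S^H w = g:  w_opt = S (S^H S)^{-1} g = (S^H)^+ g
w_opt = S @ np.linalg.solve(S.conj().T @ S, g)
print(np.allclose(S.conj().T @ w_opt, g))      # constraint satisfied

# same result via the pseudo-inverse of S^H
print(np.allclose(w_opt, np.linalg.pinv(S.conj().T) @ g))

# any other feasible w (w_opt plus a nullspace component) has a larger norm
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d -= S @ np.linalg.solve(S.conj().T @ S, S.conj().T @ d)  # project out range(S), so S^H d = 0
w_other = w_opt + d
print(np.allclose(S.conj().T @ w_other, g), np.linalg.norm(w_other) > np.linalg.norm(w_opt))
```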

2.4 Stochastic Processes


2.4.1 Characterization: Mean, Autocorrelation, Autocovariance, Variance
A stochastic process is not just a single function of time, but is represented by - in theory - an infi-
nite number of different realizations of the said process. The set of realizations is called ensemble.
The expectation operator E[•] takes the average over the different realizations (ensemble average),
i.e. across the process.
If the process is stationary, then the mean, autocorrelation and autocovariance reduce to

µ[n] = µ,   r[n, n − k] = r[k],   c[n, n − k] = c[k],

Table 2.1. General Definitions of Mean, Autocorrelation and Autocovariance

Characterization         | Representation | Expression
Mean-value function      | µ[n]           | E[u[n]]
Autocorrelation function | r[n, n − k]    | E[u[n] u^*[n − k]]
Autocovariance function  | c[n, n − k]    | E[(u[n] − µ[n])(u[n − k] − µ[n − k])^*] = r[n, n − k] − µ[n] µ^*[n − k]

respectively. The autocovariance function can be expressed by the autocorrelation function and the
mean value function:

c[n, n − k] = E [(u[n] − µ[n]) (u[n − k] − µ[n − k])∗ ]


= E [u[n]u∗ [n − k]] − E [µ[n]u∗ [n − k]] − E [u[n]µ∗ [n − k]] + E [µ[n]µ∗ [n − k]]
= r[n, n − k] − µ[n]E [u∗ [n − k]] − µ∗ [n − k]E [u[n]] + µ[n]µ∗ [n − k]
= r[n, n − k] − µ[n]µ∗ [n − k] − µ∗ [n − k]µ[n] + µ[n]µ∗ [n − k]
= r[n, n − k] − µ[n]µ∗ [n − k].

For k = 0 we have that

r[0] = E[|u[n]|^2],
c[0] = σ^2,

where r[0] and c[0] are the mean square value and the variance, respectively. A summary of these
general definitions is shown in Table 2.1.

2.4.1.1 Time averages (averages along the process)


If the process is ergodic, then ensemble averages can be replaced by time averages along a single
realization. In particular, µ̂[N] is an unbiased estimate of µ,

E[µ̂[N]] = µ.

In addition,

lim_{N→∞} E[|µ − µ̂[N]|^2] = 0,   lim_{N→∞} E[|r[k] − r̂(k, N)|^2] = 0,
mean ergodic and correlation ergodic, in the mean square error sense, respectively. A summary
with the definitions for the time averages is shown in Table 2.2.
The stochastic processes which we will work with are stationary, ergodic and zero-mean.

2.4.2 Correlation Matrix


With the signal vectors we have to work with correlation matrices. The signal vector is

u[n] = [u[n], u[n−1], ..., u[n−M+1]]^T.

Table 2.2. Time averages: Mean, Autocorrelation and Autocovariance

Characterization         | Representation | Expression
Mean-value function      | µ̂[N]          | (1/N) Σ_{n=0}^{N−1} u[n]
Autocorrelation function | r̂(k, N)       | (1/N) Σ_{n=0}^{N−1} u[n] u^*[n − k],  0 ≤ k ≤ N − 1
Autocovariance function  | ĉ(k, N)       | r̂(k, N) − µ̂[N] µ̂^*[N],  0 ≤ k ≤ N − 1

The correlation matrix of this signal vector is

R = E[u[n] · u^H[n]] = E[u[n] · [u^*[n], u^*[n − 1], ..., u^*[n − M + 1]]]

  = [ r[0]      r[1]      ...  r[M−1]
      r[−1]     r[0]      ...  r[M−2]
      ...       ...       ...  ...
      r[−M+1]   r[−M+2]   ...  r[0]   ]  ∈ C^{M×M},
which is a matrix that is both Toeplitz and Hermitian. A matrix is Toeplitz if the entries along
each diagonal are the same. A matrix is Hermitian if R^H = (R^T)^* = R, which leads to
r[−k] = r^*[k]. In addition, R is nonnegative definite, which means that x^H R x ≥ 0 ∀x. This is
shown in the following. Let us denote

y = x^H · u[n],   y^* = u^H[n] · x,

where x is an arbitrary vector of appropriate dimension. Then we have

E[y y^*] = E[|y|^2] ≥ 0,
E[|y|^2] = E[x^H · u[n] · u^H[n] · x] = x^H · E[u[n] · u^H[n]] · x = x^H · R · x ≥ 0,

which concludes the proof.


Example: Assume we have a complex exponential signal plus noise. The noise is assumed to
be zero-mean white noise, i.e. E [ν[n]] = 0 and in addition

E[ν[n] ν^*[n − k]] = σ_ν^2 for k = 0, and 0 for k ≠ 0.

We have the signal expressed as follows

u[n] = α · ejωT n + ν[n].



Then the autocorrelation function is given by

r[k] = E[u[n] u^*[n − k]]
     = E[(α e^{jωTn} + ν[n])(α^* e^{−jωT(n−k)} + ν^*[n − k])]
     = |α|^2 e^{jωTk} + α^* e^{−jωT(n−k)} E[ν[n]] + α e^{jωTn} E[ν^*[n − k]] + E[ν[n] ν^*[n − k]],

where the two middle terms are zero, resulting in

r[k] = |α|^2 + σ_ν^2 for k = 0, and r[k] = |α|^2 e^{jωTk} for k ≠ 0.

Thus, the correlation matrix is

R = |α|^2 · [ 1 + 1/SNR        e^{jωT}          ...  e^{jωT(M−1)}
              e^{−jωT}          1 + 1/SNR        ...  e^{jωT(M−2)}
              ...               ...              ...  ...
              e^{−jωT(M−1)}     e^{−jωT(M−2)}    ...  1 + 1/SNR   ],

where SNR = |α|^2 / σ_ν^2.
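To connect the formula with data, the following sketch (illustrative parameters only, not from the original script) estimates the correlation matrix of a complex exponential in white noise from time averages and compares it with the theoretical Toeplitz, Hermitian matrix above.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 4, 50_000                       # filter length and number of samples (illustrative)
alpha, omega_T, sigma_nu = 1.0, 0.7, 0.5

n = np.arange(N + M)
u = alpha * np.exp(1j * omega_T * n) + sigma_nu * (
    rng.standard_normal(N + M) + 1j * rng.standard_normal(N + M)) / np.sqrt(2)

# sample correlation matrix R_hat = (1/N) sum_n u[n] u[n]^H over snapshots u[n]
R_hat = np.zeros((M, M), dtype=complex)
for k in range(N):
    snap = u[k : k + M][::-1]          # [u[n], u[n-1], ..., u[n-M+1]]
    R_hat += np.outer(snap, snap.conj())
R_hat /= N

# theoretical R: entry (i, j) equals |alpha|^2 e^{j omega T (j - i)} + sigma_nu^2 delta_ij
i, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
R_theory = alpha**2 * np.exp(1j * omega_T * (j - i)) + sigma_nu**2 * np.eye(M)
print(np.max(np.abs(R_hat - R_theory)))   # small for large N
```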

2.5 Linear Equations


Many problems in science in general and in signal processing in particular come down to solving
sets of linear equations. Therefore let us review the most important aspects of this topic from a
geometric point of view.
Given a matrix A ∈ C^{m×n} and a vector b ∈ C^m, we are looking for a vector x ∈ C^n which
fulfills

A x ≈ b

as well as possible.
The rank of a matrix is the number of linearly independent column(row)-vectors [2]

rank(A) = r ≤ min(m, n).

The columnspace or image or span(A) is the vector space spanned by the column vectors of A

S = span(A) = image(A) = {y|y = Ax, x ∈ Cn }.

If b ∈ span(A), then there exists an x such that Ax = b, otherwise we have only Ax ≈ b.


The dimension of the columnspace of A is equal to the rank of A

dim (span(A)) = rank(A) = r.

Now we will consider three different cases concerning the size of A. Let us start with the most
familiar, but not necessarily most important case, i.e. with a square full rank matrix : r = m = n,
i.e. the number of equations is equal to the number of unknowns and all equations are linearly
independent.

n x b

m A • =

There exists a unique inverse A−1 such that

x = A−1 b.

This does not mean, that we should compute the solution by calculating the inverse A−1 . There are
many more algorithms available to solve such a set of linear equations like Gaussian Elimination,
LU-, LR-, QR-, QL-decomposition or conjugate gradient (CG) descent, which depending on the
specific setting should be employed.

Next let us assume that m > n, i.e. more equations than unknowns (overdetermined system of
equations), but still r = n, i.e. a full rank tall matrix. In this case we can premultiply from the
left with A^H, getting

A^H A x = A^H b,

where A^H A ∈ C^{n×n} now is a full rank square matrix, which has a unique standard inverse,
leading to the following solution:

x = (A^H A)^{-1} A^H b = A^+ b = x_LS,

which is the so-called least squares solution, as we shall see later. This solution is unique and can
be computed with QR-, QL-decompositions or CG-descent. Anyway, we have to be aware that

A x_LS ≠ b.

Looking finally at the case m < n (underdetermined system of equations), with A still being a
full rank flat matrix (m = r), we have dim(span(A)) = m, and if b ∈ span(A) then there exists an
x such that A x = b, which is not unique!
Since there is no unique solution but a manifold of solutions, we can choose one of these from
the (n − r)-dimensional solution space, e.g. by minimizing the norm of the solution vector:

A x = b  subject to  ||x||_2 → min.

Before we proceed let us look at a specific example discussing the three aforementioned different
cases. Assume noise-free measurements taken from a ULA with m sensors exposed to n
impinging planar wavefronts. If the signal from the i-th impinging wavefront is given by x_i then
the measurement is

b = \sum_{i=1}^{n} x_i · a_i = A x,

where x = [x_1, x_2, ..., x_n]^T and A is the array steering matrix

A = [a_1(θ_1), a_2(θ_2), ..., a_n(θ_n)]  with

a_i(θ_i) = [1, e^{jµ_i}, e^{j2µ_i}, ..., e^{j(m−1)µ_i}]^T,   µ_i = 2π · (d/λ) · sin θ_i,

where µi are the spatial frequencies.


For the moment assume that we know A (through DoA estimation with ESPRIT or MUSIC,
which we will deal with later on) and it is our goal to reconstruct the vector x of impinging wave-
fronts from our array measurements b. Since we have assumed that there is no noise, b ∈ span(A)
holds. We of course assume that all n wavefronts impinge from different directions, therefore A is
full rank.
m = n : there is a unique solution reconstructing the n planar wavefronts.
m > n : (m − n) antenna elements are superfluous, we can drop them.
m < n : the wavefield, consisting of the superposition of n planar wavefronts cannot be
reconstructed uniquely!
But since there is noise n we have

b = A x + n,

where in general, in none of these cases, b lies in the column space of A, and therefore only A x ≈ b holds!
Let us first have a closer look at m > n = r, i.e. all wavefronts have distinct DoA's and we
have more antenna elements than wavefronts. Therefore we have

A x − b = −n  and  x_LS = argmin_x ||A x − b||_2^2,

||A x − b||_2^2 = (A x − b)^H (A x − b)
              = x^H A^H A x − x^H A^H b − b^H A x + b^H b = ||n||_2^2,

∂||n||_2^2 / ∂x^* = A^H A x − A^H b = 0  ⇒  x_LS = (A^H A)^{-1} A^H b = A^+ b,

where A^+ ∈ C^{n×m} and A^+ A = 1_n.
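A brief numerical illustration of this least squares reconstruction (not from the original text; the DoA's, array size and noise level are assumptions): synthetic wavefront amplitudes are recovered from noisy ULA measurements via x_LS = (A^H A)^{-1} A^H b.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 3                                    # sensors and wavefronts (illustrative)
d_over_lambda = 0.5
thetas = np.deg2rad([-20.0, 5.0, 40.0])

# array steering matrix A with a_i = [1, e^{j mu_i}, ..., e^{j(m-1) mu_i}]^T
mu = 2 * np.pi * d_over_lambda * np.sin(thetas)
A = np.exp(1j * np.outer(np.arange(m), mu))

x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
noise = 0.05 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
b = A @ x_true + noise

# least squares reconstruction x_LS = (A^H A)^{-1} A^H b = A^+ b
x_ls = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)
print(np.abs(x_ls - x_true))                   # small residual errors due to the noise
```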
If r < n (i.e. several wavefronts are impinging from the same direction and, therefore, are
indistinguishable), A^H A is rank deficient and its standard inverse does not exist.
To be able to handle such a situation, it is necessary to introduce the Singular Value Decompo-
sition (SVD) of a matrix. Any matrix A can be decomposed into the product of three matrices

A = U Σ V^H,   A ∈ C^{m×n},   rank(A) = r ≤ min(m, n),

where U and V are unitary matrices, i.e.

U U^H = U^H U = 1_m,   V V^H = V^H V = 1_n,  and

Σ = diag{σ_i}_{i=1}^{min(m,n)} ∈ (R_+ ∪ {0})^{m×n}

with

σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_{min(m,n)} = 0.

Partitioning U = [U_s, U_o] into its first r and last m − r columns, V = [V_s, V_o] into its first r and
last n − r columns, and Σ into the leading r × r block Σ_s = diag{σ_1, ..., σ_r} and the remaining
(zero) block Σ_o, we also see that

A = U_s Σ_s V_s^H + U_o Σ_o V_o^H = U Σ V^H = \sum_{i=1}^{r} σ_i u_i v_i^H

and rank(u_i · v_i^H) = 1.


Now we are able to express the generalized inverse A+ with the aid of the SVD. Let us first
start with the case m > n = r, which we already have handled satisfactorily. For n = r, there is
no Σo and

Σ = [Σ_s; 0] ∈ R^{m×n}, i.e. the square block Σ_s ∈ R^{n×n} on top of an (m − n) × n zero block,
with σ_1 ≥ σ_2 ≥ ... ≥ σ_n > 0. The generalized inverse A^+ is obtained by

A^+ = V Σ^+ U^H  with  Σ^+ = [Σ_s^{-1}, 0] ∈ R^{n×m}.

We can easily verify this with the following calculation:

(A^H A)^{-1} A^H = (V Σ^T U^H U Σ V^H)^{-1} V Σ^T U^H
                = (V Σ^T Σ V^H)^{-1} V Σ^T U^H
                = (V Σ_s^2 V^H)^{-1} V Σ^T U^H
                = V Σ_s^{-2} V^H V Σ^T U^H
                = V Σ_s^{-2} Σ^T U^H
                = V Σ^+ U^H
                = A^+.   q.e.d.

For r < n the situation is a little bit more involved. For m > n > r, Σ reads as follows:

Σ = [Σ_s, 0; 0, 0] ∈ R^{m×n}  with  Σ_s ∈ R^{r×r}.

For the generalized inverse A^+ we proceed as follows:

A^+ = V Σ^+ U^H  with  Σ^+ = [Σ_s^{-1}, 0; 0, 0] ∈ R^{n×m}.

The obtained x = A^+ b is the solution vector with the smallest norm. We will show this
by asking the question how to choose Σ^+ in order to minimize ||n||_2^2, n = A x − b:

||n||_2^2 = (A x − b)^H (A x − b)
         = (U Σ V^H x − b)^H (U Σ V^H x − b)
         = x^H V Σ^T U^H U Σ V^H x − x^H V Σ^T U^H b − b^H U Σ V^H x + b^H b,

∂||n||_2^2 / ∂x^* = V Σ^T Σ V^H x − V Σ^T U^H b = 0.

For x = A^+ b = V Σ^+ U^H b the above derivative should be zero. Let us first look at Σ^T Σ:

Σ^T Σ = [Σ_s^2, 0; 0, 0] =: Σ_{S0}^2.

Therefore we have

∂||n||_2^2 / ∂x^* |_{x = V Σ^+ U^H b} = V Σ_{S0}^2 V^H V Σ^+ U^H b − V Σ^T U^H b
                                     = V (Σ_{S0}^2 Σ^+ − Σ^T) U^H b = 0.

Thus, requiring this derivative to vanish implies that

Σ_{S0}^2 · Σ^+ = Σ^T,   i.e.   [Σ_s^2, 0; 0, 0] · [Σ_s^{-1}, 0; 0, *] = [Σ_s, 0; 0, 0].

For the don't care entry "*" we can choose whatever we want, the derivative will be zero.
Therefore, let us use this degree of freedom to minimize the norm of the solution x!

x = V Σ^+ U^H b  ⇒  V^H x = Σ^+ (U^H b),

||V^H x||_2^2 = ||x||_2^2,   ||U^H b||_2^2 = ||b||_2^2,

since both V and U are unitary matrices.

To minimize ||x||_2^2, as many entries in Σ^+ as possible should be zero! Therefore, we choose

Σ^+ = [Σ_s^{-1}, 0; 0, 0].

Now we have a solution for computing generalized inverses, no matter how large m, n and r
are. We simply invert only the nonzero singular values to obtain Σ+ and leave the zeroes as they
are.
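This recipe can be checked numerically. The sketch below (synthetic, rank-deficient A; all sizes are illustrative assumptions) builds Σ^+ by inverting only the nonzero singular values and compares the resulting A^+ = V Σ^+ U^H with numpy's pinv; the obtained x = A^+ b satisfies the normal equations.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 6, 4, 2                               # sizes and rank (illustrative)
# build a rank-deficient A as a product of random factors
A = (rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))) @ \
    (rng.standard_normal((r, n)) + 1j * rng.standard_normal((r, n)))

U, s, Vh = np.linalg.svd(A)                     # A = U diag(s) V^H
tol = max(m, n) * np.finfo(float).eps * s[0]

# Sigma^+ : invert only the nonzero singular values, leave the zeros as they are
s_plus = np.where(s > tol, 1.0 / np.where(s > tol, s, 1.0), 0.0)
Sigma_plus = np.zeros((n, m))
Sigma_plus[: len(s), : len(s)] = np.diag(s_plus)

A_plus = Vh.conj().T @ Sigma_plus @ U.conj().T  # A^+ = V Sigma^+ U^H
print(np.allclose(A_plus, np.linalg.pinv(A)))   # matches numpy's pseudo-inverse

b = rng.standard_normal(m) + 1j * rng.standard_normal(m)
x = A_plus @ b                                  # minimum-norm least squares solution
print(np.allclose(A.conj().T @ (A @ x - b), 0)) # normal equations A^H (Ax - b) = 0 hold
```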
Given A and its SVD we now discuss the four fundamental subspaces of a matrix. We already
had the column space of A:

S = span(A) = {y|y = Ax, x ∈ Cn }, dim(S) = r.



Table 2.3. Four Fundamental Subspaces of a Matrix A

Subspace       | Representation | Dimension             | Definition
Column space   | im(A)          | dim(im(A)) = r        | im(A) = {b | b = A x, x ∈ C^n}
Nullspace      | ker(A)         | dim(ker(A)) = n − r   | ker(A) = {x | A x = 0}
Row space      | im(A^H)        | dim(im(A^H)) = r      | im(A^H) = {b | b = A^H x, x ∈ C^m}
Left nullspace | ker(A^H)       | dim(ker(A^H)) = m − r | ker(A^H) = {x | A^H x = 0}

Us , the first r columns of U are a unitary basis of S, which is also called signal subspace. Next
we have the nullspace or kernel of A:

N = ker(A) = {x|Ax = 0} dim(N ) = n − r.

Vo, the last n − r columns of V, form a unitary basis of N.


The columnspace of AH (also called left columnspace of A) is:

Sl = span(AH ) = {y|y = AH x, x ∈ Cm }, dim(Sl ) = r.

Vs , the first r columns of V are a unitary basis of Sl . Finally, we have the nullspace of AH (left
nullspace of A):
Nl = ker(AH ) = {x|AH x = 0} dim(Nl ) = m − r.
Uo , the last m − r columns of U are a unitary basis of Nl , which is also called the noise subspace!
The four fundamental subspace are summarized in Table 2.3.
In signal processing we will mainly deal with the signal and the noise subspace. It is interesting
to note, that the SVD is not unique, but the four fundamental subspaces, for which the SVD gives
unitary basis vectors, are unique. This will be shown in the following calculation:

A = U Σ V^H = U Φ Φ^* Σ Ψ Ψ^* V^H

with Φ = diag{e^{jφ_i}}_{i=1}^{m} and Ψ = diag{e^{jφ_i}}_{i=1}^{n}, Φ Φ^* = 1_m and Ψ Ψ^* = 1_n. In addition, we have
Φ^* Σ Ψ = Σ, U Φ = U′, V Ψ = V′ and A = U Σ V^H = U′ Σ V′^H.
Subspaces are uniquely characterized by projectors P. Let U_s ∈ C^{m×r} be a unitary basis of an
r-dimensional subspace of C^m, m > r. Then U_s Q, with Q ∈ C^{r×r} and unitary, is also a unitary basis
of the same subspace. The projector onto this subspace is

P_S = A A^+ = U Σ V^H V Σ^+ U^H = U Σ Σ^+ U^H = [U_s, U_o] · [1_r, 0; 0, 0] · [U_s, U_o]^H = U_s U_s^H.

Projectors are:
• Hermitian matrices: P = PH

• Idempotent matrices: P = P2
• Rank deficient matrices: rank(P) < m.

A change of the basis vectors for a given subspace, U_s′ = U_s Q with Q unitary, does not change the
projector:

P_S′ = U_s′ U_s′^H = U_s Q Q^H U_s^H = U_s U_s^H = P_S.

For tracking subspaces it is advantageous to work with projectors and not with basis vectors of
the (slowly) changing subspace.
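A small numerical illustration of these projector properties (synthetic data and sizes, not from the original text): the projector U_s U_s^H onto the signal subspace is Hermitian, idempotent, rank deficient, and invariant under a unitary change of basis.

```python
import numpy as np

rng = np.random.default_rng(7)
m, r = 5, 2
A = (rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))) @ \
    (rng.standard_normal((r, 4)) + 1j * rng.standard_normal((r, 4)))   # rank-r matrix

U, s, Vh = np.linalg.svd(A)
Us = U[:, :r]                               # unitary basis of the signal subspace
P = Us @ Us.conj().T                        # projector onto span(A)

print(np.allclose(P, P.conj().T))           # Hermitian: P = P^H
print(np.allclose(P, P @ P))                # idempotent: P = P^2
print(np.linalg.matrix_rank(P), "<", m)     # rank deficient

# a rotated basis Us' = Us Q (Q unitary) yields the same projector
Q, _ = np.linalg.qr(rng.standard_normal((r, r)) + 1j * rng.standard_normal((r, r)))
print(np.allclose((Us @ Q) @ (Us @ Q).conj().T, P))
```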
Now we can look at the connection between SVD and the well known eigenvalue decomposi-
tion (EVD), which exists only for square matrices

A ∈ C^{m×m}:   A = Q Λ Q^{-1},   Λ = diag{λ_i}_{i=1}^{m}.

(Attention: not every square matrix is diagonalizable. This leads to the so called Jordan forms,
which we will not discuss here.)
The above transform from A to Λ (and vice versa) is called similarity transform. Similarity
transforms leave the eigenvalues and the trace of a matrix invariant. If A is a normal matrix, then
Q is a unitary matrix.
A is a normal matrix ⇔ AH A = AAH .
It is easy to see that hermitian matrices are normal. If A is hermitian and positive semidefinite,
then we have
A = QΛQH , Λ = diag{λi }m i=1 , λi ∈ R+ ∪ {0}.
Additionally, if we arrange the eigenvalues such that λ1 ≥ λ2 ≥ . . . λm , then the EVD and SVD
of A are identical. Covariance matrices are always hermitian and positive semidefinite.
The column vectors q of Q are the so called eigenvectors of A:

Aq = λq.

This equation tells us that there are vectors q which, when multiplied by A, do not change direction
in space but are only scaled by λ. To determine those vectors q, we have to solve

(A − λ1) q = 0,

which has nontrivial solutions only if det(A − λ1) = 0, which determines the eigenvalues λ.
Assume that A is an estimated correlation matrix R = UUH , which is positive semidefinite
and hermitian. We will now show, that the eigenvalues of such a matrix are nonnegative.

R q_i = λ_i q_i,
q_i^H R q_i = λ_i q_i^H q_i = λ_i ||q_i||_2^2,

from which we obtain

λ_i = (q_i^H R q_i) / ||q_i||_2^2 ≥ 0,

since q_i^H R q_i ≥ 0 because R is positive semidefinite.

If all eigenvalues of such a matrix are distinct, then all eigenvectors will be perpendicular to
each other, i.e. q_i^H q_j = 0 ∀ i ≠ j:

R q_i = λ_i q_i,
q_j^H R q_i = λ_i q_j^H q_i.

Similarly we have

R q_j = λ_j q_j,
q_j^H R = λ_j q_j^H,
q_j^H R q_i = λ_j q_j^H q_i.

Then subtracting the last two results,

0 = (λ_i − λ_j) · q_j^H q_i  with λ_i − λ_j ≠ 0,

from where we have that q_j^H q_i = 0, which means that q_j and q_i are perpendicular!
By simply scaling all eigenvectors to unit norm, the matrix Q = [q1 , q2 , . . . , qm ] is unitary!
Moreover, we have

tr(R) = tr(Q Λ Q^H) = tr(Q^H Q Λ) = tr(Λ) = \sum_{i=1}^{m} λ_i,

where tr(R) is the trace of R, i.e. the sum of the diagonal elements of R.
The Rayleigh quotient of R is (x^H R x)/(x^H x), and

λ_max = max_{x ∈ C^m, x ≠ 0} (x^H R x)/(x^H x)

and

λ_min = min_{x ∈ C^m, x ≠ 0} (x^H R x)/(x^H x).

2.6 Eigenfilter (Generalized Matched Filter)

Fig. 2.1. Linear Filtering (the signal u[n] and the noise ν[n] add to form the input x[n] of an FIR filter with coefficients w^*, producing the output y[n])

Let us denote the following vectors as

w = [w_0, w_1, ..., w_{M−1}]^T,
u[n] = [u[n], u[n−1], ..., u[n−M+1]]^T,
ν[n] = [ν[n], ν[n−1], ..., ν[n−M+1]]^T,

where we have that the signal u[n] is uncorrelated with the noise ν[n]:

E [u[n]ν ∗ [m]] = 0 ∀ n, m.

From Figure 2.1, we have that the output y[n] can be expressed as

y[n] = wH (u[n] + ν[n]) .

The power of the output can be written as

P = E[y[n] y^*[n]]
  = E[w^H (u[n] + ν[n])(u^H[n] + ν^H[n]) w]
  = E[w^H (u[n] u^H[n] + u[n] ν^H[n] + ν[n] u^H[n] + ν[n] ν^H[n]) w]
  = w^H (E[u[n] u^H[n]] + E[ν[n] ν^H[n]]) w
  = w^H R w + σ^2 w^H w
  = P_S + P_N,

where P_S and P_N are the power of the signal and the power of the noise, respectively. Thus, we
can express the signal to noise ratio (SNR) at the output as

SNR = P_S / P_N = (w^H R w) / (σ^2 ||w||_2^2).

Let us now find the w_opt that maximizes the SNR. To this end, let us differentiate the SNR
expression with respect to w^* and set it to zero:

∂/∂w^* ((w^H R w)/(w^H w)) = (R w · w^H w − w^H R w · w) / (w^H w)^2 = 0.

In order that the numerator is equal to zero it must hold that

R w = ((w^H R w)/(w^H w)) · w = λ · q,

where w (= q) is an eigenvector and (w^H R w)/(w^H w) (= λ) is the corresponding eigenvalue of the
matrix R. Thus, the SNR is maximized for

w_opt = q_max,

where q_max is the eigenvector corresponding to the largest eigenvalue λ_max of R, i.e.

R q_max = λ_max q_max.

The maximum SNR is then given by

SNR_max = λ_max(R) / σ^2.
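The eigenfilter result can be reproduced numerically; in the sketch below (synthetic R and noise power, purely illustrative) the principal eigenvector of R attains SNR_max = λ_max/σ², and any other weight vector does not exceed it.

```python
import numpy as np

rng = np.random.default_rng(8)
M = 4
# synthetic signal correlation matrix R (Hermitian, PSD) and white noise power sigma2
X = rng.standard_normal((M, 2)) + 1j * rng.standard_normal((M, 2))
R = X @ X.conj().T                              # rank-2 "signal" covariance, illustrative
sigma2 = 0.3

def snr(w):
    return float((w.conj() @ R @ w).real / (sigma2 * np.linalg.norm(w) ** 2))

# eigenfilter: w_opt is the eigenvector of R belonging to the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(R)            # ascending eigenvalues for Hermitian R
w_opt = eigvecs[:, -1]
print(np.isclose(snr(w_opt), eigvals[-1] / sigma2))   # SNR_max = lambda_max / sigma^2

# any other weight vector gives a lower (or equal) output SNR
w_rand = rng.standard_normal(M) + 1j * rng.standard_normal(M)
print(snr(w_rand) <= snr(w_opt) + 1e-12)
```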
3. Adaptive Filters

3.1 Linear Optimum Filtering (Wiener Filters)


The optimum receive filters, which we investigate in this chapter, are linear discrete-time filters
with finite duration impulse response (FIR). The reasons for these restrictions are as follows:
• linearity: allows for the principle of superposition and makes the mathematical analysis easy
to handle.
• discrete-time: allows for implementation with digital VLSI hardware and firm/software.
• FIR: provides inherent stability.
When we talk about optimum filters, we have to give a criterion for optimization. The statistical
criterion for optimization (adaptation), which we use here, is the mean square value of the esti-
mation error, i.e. the expectation of the square difference between the desired and the actual filter
output, i.e. the mean square error (MSE). This MSE should be minimized, i.e. we are aiming
at a minimum mean square error (MMSE) solution. The reason for this choice is tractability of
mathematics and practical performance.
The discrete-time system model is given in Fig. 3.1.

Fig. 3.1. Discrete Time System Model (the transmit sequence s[n] passes through the channel h[n] (matrix H) to give x[n]; noise ν[n] is added to form u[n], which is filtered by w^* to produce y[n] = d̂[n]; the desired signal is the delayed transmit symbol s[n − l] = d[n], and E[|y[n] − d[n]|^2] is minimized)


The discrete-time channel impulse response is also modeled by an FIR filter:

h[n] = \sum_{k=0}^{K} h_k · δ[n − k],

where δ[n] = 1 for n = 0 and δ[n] = 0 else. In addition we have

h = [h_0, h_1, ..., h_K]^T ∈ C^{K+1},   s[n] = [s[n], s[n−1], ..., s[n−N+1]]^T ∈ C^N,

and

x[n] = s[n] ⋆ h[n] = \sum_{k=0}^{K} h_k · s[n − k],

where "⋆" denotes convolution and

x[n] = [x[n], x[n−1], ..., x[n−M+1]]^T ∈ C^M.

The matrix H can be written as

H = [ h_0  h_1  h_2  ...  h_K  0    0    ...  0
      0    h_0  h_1  h_2  ...  h_K  0    ...  0
      ...                 ...                 ...
      0    ...  0    h_0  h_1  h_2  ...       h_K ]  ∈ C^{M×N}

and is called convolutional matrix, which has Toeplitz structure. With H we can write

x[n] = H · s[n],

and
u[n] = x[n] + ν[n] = Hs[n] + ν[n],
with u[n], ν[n] ∈ CM . The output y[n] is obtained by convolving u[n] with w∗ [n]

y[n] = u[n] ⋆ w∗ [n] = wH · u[n] = wH (Hs[n] + ν[n]).

The error is defined as

e[n] = d[n] − y[n] = d[n] − wH (Hs[n] + ν[n]).



Its mean square value is

E[|e[n]|^2] = E[|y[n] − d[n]|^2],

from where we can find an optimum vector w_opt such that

w_opt = argmin_w E[|e[n]|^2].

Therefore, the cost function can be written as

J(w, w^*) = E[e[n] · e^*[n]] = E[(d[n] − w^H u[n])(d^*[n] − u^H[n] w)].

Differentiating J with respect to w^* leads to

∂J(w, w^*)/∂w^* = E[e^*[n] · ∂/∂w^* (d[n] − w^H u[n])] = −E[u[n] e^*[n]],

∂J/∂w^* = 0  →  E[u[n] e^*[n]] = 0,
which means we can state that

E [y[n]e∗ [n]] = wH E [u[n]e∗ [n]] = 0!

The last two equations are known as the principle of orthogonality.


We can also write

J = E[|d[n]|^2 − w^H u[n] d^*[n] − u^H[n] d[n] w + w^H u[n] u^H[n] w]
  = σ_d^2 − w^H E[u[n] d^*[n]] − E[u^H[n] d[n]] w + w^H E[u[n] u^H[n]] w
  = σ_d^2 − w^H p − p^H w + w^H R w

and arrive at

∂J/∂w^* = −p + R w = 0  →  w_opt = R^{-1} p.

The minimum error computes to

J_min(w_opt) = σ_d^2 − w_opt^H p − p^H w_opt + w_opt^H R w_opt
             = σ_d^2 − p^H R^{-1} p − p^H R^{-1} p + p^H R^{-1} R R^{-1} p
             = σ_d^2 − p^H R^{-1} p,

where it has been assumed that R is positive definite, which means that R^{-1} exists. It is Hermitian
anyway. The canonical form of the error surface J(w) can be described as

J(w) = σ_d^2 − w^H R R^{-1} p − p^H R^{-1} R w + p^H R^{-1} R R^{-1} p − p^H R^{-1} p + w^H R w
     = σ_d^2 − w^H R w_opt − w_opt^H R w + w_opt^H R w_opt + w^H R w − p^H R^{-1} p
     = σ_d^2 + (w^H − w_opt^H) R (w − w_opt) − p^H R^{-1} p
     = J_min + (w^H − w_opt^H) R (w − w_opt)
     = J_min + ∆w^H R ∆w,

and with the eigenvalue decomposition of R = Q Λ Q^H we have

J(w) = J_min + ∆w^H Q Λ Q^H ∆w.

Substituting v = Q^H ∆w leads to

J(v) = J_min + v^H Λ v = J_min + \sum_{k=1}^{M} λ_k v_k v_k^* = J_min + \sum_{k=1}^{M} λ_k |v_k|^2,

where λ_k > 0.
Now assume h[n], and therefore h and H, are known to the receiver (e.g. through the process of
channel estimation). Let us rewrite

E[u[n] u^H[n]] = R_uu ∈ C^{M×M},
E[s[n] s^H[n]] = R_ss ∈ C^{N×N},
E[ν[n] ν^H[n]] = R_νν ∈ C^{M×M}.

All three covariance matrices are Hermitian and positive definite. Therefore, all three have a
standard inverse.
Since u[n] = H s[n] + ν[n], then

R_uu = E[u[n] u^H[n]] = E[(H s[n] + ν[n])(H s[n] + ν[n])^H]
     = H R_ss H^H + H · E[s[n] ν^H[n]] + E[ν[n] s^H[n]] · H^H + R_νν
     = H R_ss H^H + R_νν,

since the cross terms vanish, and

p_ud = E[u[n] d^*[n]]
     = E[(H s[n] + ν[n]) d^*[n]]
     = H E[s[n] s^*[n − l]] + E[ν[n] s^*[n − l]]
     = H E[s[n] s^H[n]] · e_{l+1}
     = H R_ss e_{l+1},

where e_{l+1} is a vector with all entries equal to zero except the (l + 1)th entry, which is one:

e_{l+1} = [0, ..., 0, 1, 0, ..., 0]^T ∈ C^N  (1 in position l + 1).

Therefore,

w_opt = R^{-1} p = R_uu^{-1} p_ud = (H R_ss H^H + R_νν)^{-1} H R_ss e_{l+1}.
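As a concrete, purely illustrative example of this Wiener solution (the channel taps, lengths and powers below are assumptions, not values from the script), the sketch builds the convolutional matrix H for a short channel, forms R_uu = H R_ss H^H + R_νν for white symbols and noise, and computes w_opt; the combined response w_opt^H H then concentrates its energy at the chosen delay l.

```python
import numpy as np

rng = np.random.default_rng(9)
h = np.array([1.0, 0.5 + 0.3j, 0.2])            # illustrative channel taps, K + 1 = 3
K, M = len(h) - 1, 6                             # channel order and equalizer length
N = M + K                                        # number of symbols spanned by s[n]
l = 3                                            # equalizer delay (illustrative)

# convolutional (Toeplitz) matrix H in C^{M x N}: row i holds h shifted by i
H = np.zeros((M, N), dtype=complex)
for i in range(M):
    H[i, i : i + K + 1] = h

sigma_d2, sigma_nu2 = 1.0, 0.1                   # symbol and noise powers
Rss = sigma_d2 * np.eye(N)
Rnn = sigma_nu2 * np.eye(M)
e = np.zeros(N); e[l] = 1.0                      # e_{l+1} (index l in 0-based numbering)

# MMSE (Wiener) equalizer: w_opt = (H Rss H^H + Rnn)^{-1} H Rss e_{l+1}
Ruu = H @ Rss @ H.conj().T + Rnn
w_opt = np.linalg.solve(Ruu, H @ Rss @ e)

# combined channel + equalizer response; it should concentrate at delay l
combined = w_opt.conj() @ H
print(np.round(np.abs(combined), 3))
```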

With application of the matrix inversion lemma

(A + B C D)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1},

where A and C must be square and full rank, and B and D must be of appropriate sizes, we can
rewrite the optimum filter vector:

w_opt^H = e_{l+1}^T R_ss H^H (H R_ss H^H + R_νν)^{-1}
        = e_{l+1}^T R_ss H^H (R_νν^{-1} − R_νν^{-1} H (R_ss^{-1} + H^H R_νν^{-1} H)^{-1} H^H R_νν^{-1})
        = e_{l+1}^T R_ss (H^H R_νν^{-1} − H^H R_νν^{-1} H (R_ss^{-1} + H^H R_νν^{-1} H)^{-1} H^H R_νν^{-1})
        = e_{l+1}^T R_ss (1_N − H^H R_νν^{-1} H (R_ss^{-1} + H^H R_νν^{-1} H)^{-1}) H^H R_νν^{-1}
        = e_{l+1}^T R_ss (R_ss^{-1} + H^H R_νν^{-1} H − H^H R_νν^{-1} H)(R_ss^{-1} + H^H R_νν^{-1} H)^{-1} H^H R_νν^{-1}
        = e_{l+1}^T (R_ss^{-1} + H^H R_νν^{-1} H)^{-1} H^H R_νν^{-1},

and for white transmit symbols and white noise we have

R_ss = σ_d^2 1_N,   R_νν = σ_ν^2 1_M,

and the optimum filter vector is

w_opt^H = e_{l+1}^T ((1/σ_d^2) 1_N + (1/σ_ν^2) H^H H)^{-1} H^H (1/σ_ν^2)
        = e_{l+1}^T ((σ_ν^2/σ_d^2) 1_N + H^H H)^{-1} H^H,

and from using w_opt = R^{-1} p we had also that the optimum filter vector can be expressed as

w_opt^H = e_{l+1}^T H^H ((σ_ν^2/σ_d^2) 1_M + H H^H)^{-1}.

For very large SNR (σ_ν^2/σ_d^2 → 0) the parenthesized matrices in the last two expressions will
converge to either (H^H H)^{-1} or (H H^H)^{-1}, respectively. Since H is a flat matrix, H^H H is not full
rank and therefore does not have a standard inverse. But H H^H is full rank and has a standard
inverse and we get

lim_{SNR→∞} w_opt^H = e_{l+1}^T H^H (H H^H)^{-1} = e_{l+1}^T H^+.

We could equally well compute a generalized inverse of H^H H making use of the SVD of
H = U Σ V^H:

(H^H H)^+ H^H = (V Σ^T U^H U Σ V^H)^+ V Σ^T U^H
             = (V Σ^T Σ V^H)^+ V Σ^T U^H
             = V (Σ^T Σ)^+ Σ^T U^H
             = V Σ^+ U^H
             = H^+.

Therefore we also have

H^+ = ((H^H)^+)^H.

Let us try to understand what this w_opt^H = e_{l+1}^T H^+ actually achieves.
Assume in this noise-free case a single isolated impulse transmission:

s[n] = [1, 0, 0, ..., 0]^T,  s[n+1] = [0, 1, 0, ..., 0]^T,  ...,  s[n+N−1] = [0, ..., 0, 1]^T ∈ C^N.

The corresponding channel output vectors x[n] = u[n] are therefore

u[n] = Hs[n]
u[n + 1] = Hs[n + 1]
···
u[n + N − 1] = Hs[n + N − 1].

Using the specific transmit vectors above, we get the following receive vectors,

u[n] = [h_0, 0, ..., 0]^T,  u[n+1] = [h_1, h_0, 0, ..., 0]^T,  ...,  u[n+N−1] = [0, ..., 0, h_K]^T ∈ C^M,

which we stack into the receive matrix

U = [u^T[n]; u^T[n+1]; u^T[n+2]; ...; u^T[n+N−1]]
  = [ h_0    0        0        ...  0
      h_1    h_0      0        ...  0
      h_2    h_1      h_0      ...  0
      ...    ...      ...      ...  ...
      h_K    h_{K−1}  h_{K−2}  ...  h_0
      0      h_K      h_{K−1}  ...  h_1
      ...    ...      ...      ...  ...
      0      0        0        ...  h_{K−1}
      0      0        0        ...  h_K    ]  ∈ C^{N×M}.

This receive matrix U^T is equal to H and we impose

w^H U^T = w^H H = e_{l+1}^T,

i.e. we require w^H to equalize the channel such that the impulse response of channel plus equalizer
is a delayed impulse. With (H^H)^+ as the pseudoinverse of H^H we have

H^H w = e_{l+1},
H H^H w = H e_{l+1},
w_LS = (H H^H)^{-1} H e_{l+1} = (H^H)^+ e_{l+1} = w′_opt,

w_LS^H = e_{l+1}^T H^H (H H^H)^{-1} = e_{l+1}^T H^+ = lim_{σ_ν^2/σ_d^2 → 0} w_opt^H.

That is, we try to enforce zero intersymbol interference (ISI), but do not have enough degrees of
freedom to do so. Therefore we resort to a least squares solution.
We could also drop some of the equations enforcing the (N −1) zeros to have only M equations
enforcing (M − 1) zeros and a one at delay l. This is equivalent to neglecting the ISI contribution
from (N − M ) leading and/or lagging transmit symbols and can be expressed by a reduced receive
matrix
UTr = Hr ∈ CM ×M
by dropping the Npost first and Npre last rows, such that
Nr = N − (Npre + Npost ) = M
holds. Then Hr is a square full rank matrix and
w^H H_r = e_{l+1}^T

can be exactly fulfilled with the unique solution

w_ZF^H = e_{l+1}^T H_r^{-1},

which zeroes out the ISI from adjacent symbols within the Nr -window and, therefore, is called
zero forcing solution.
On the other hand, if σ_ν^2/σ_d^2 ≫ max{λ_i}_{i=1}^{N} (low SNR region) is fulfilled, we get

w_opt^H = e_{l+1}^T · H^H · σ_d^2/σ_ν^2.

We will show in the following that this w_opt is the so-called matched filter. First replace H^H · σ_d^2/σ_ν^2
with G, where

G = [g_1^H; g_2^H; ...; g_N^H] ∈ C^{N×M},   e_{l+1}^T G = g_{l+1}^H,

and

y[n] = g_{l+1}^H H s[n] + g_{l+1}^H ν[n].

The last term is the noise contribution with a variance

E[|g_{l+1}^H ν[n]|^2] = g_{l+1}^H R_νν g_{l+1} = σ_ν^2 ||g_{l+1}||_2^2.

The first term contains the desired signal contribution, stemming from s[n − l] = d[n], and in-
tersymbol interference from previous and subsequent symbols. Let us focus on the desired part,
which is

g_{l+1}^H H e_{l+1} s[n − l] = g_{l+1}^H H e_{l+1} d[n],

the variance of which is

E[g_{l+1}^H H e_{l+1} d[n] d^*[n] e_{l+1}^T H^H g_{l+1}] = g_{l+1}^H H e_{l+1} e_{l+1}^T H^H g_{l+1} · E[d[n] d^*[n]] = g_{l+1}^H H e_{l+1} e_{l+1}^T H^H g_{l+1} · σ_d^2.

With N = 2M − 1 and l + 1 = M (so that K + 1 = M) we have

H e_{l+1} = [h_K, h_{K−1}, ..., h_1, h_0]^T = Π · h,

where Π is the exchange matrix with ones on its antidiagonal and zeros elsewhere, and Π · Π = 1.
With this, the variance of the desired signal becomes

(g_{l+1}^H Π h)(h^H Π g_{l+1}) σ_d^2 = |g_{l+1}^H Π h|^2 σ_d^2.

The ratio of the desired signal variance and the noise variance at the filter output is

SNR = (|g^H_{l+1} Πh|² / ‖g_{l+1}‖²₂) · (σ_d²/σ_ν²),

which is maximal if

g_{l+1} = αΠh,

and the maximum SNR is

SNR_max = (|α* h^H ΠΠh|² / ‖αΠh‖²₂) · (σ_d²/σ_ν²)
        = (|α|² |h^H h|² / (|α|² ‖h‖²₂)) · (σ_d²/σ_ν²)
        = (‖h‖⁴₂ / ‖h‖²₂) · (σ_d²/σ_ν²)
        = ‖h‖²₂ · σ_d²/σ_ν²,

which is independent of α! The optimum vector therefore is

w^H_opt = g^H_{l+1} = (σ_d²/σ_ν²) · h^H Π = (σ_d²/σ_ν²) · [h*_K, h*_{K−1}, …, h*_1, h*_0] = w^H_MF,

i.e. the matched filter is the complex conjugated and time-reversed channel impulse response, scaled by σ_d²/σ_ν².

Thus, we see that the MMSE solution converges in the high SNR regime to the ZF solution and in the low SNR regime to the MF solution.
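The following short NumPy sketch (not part of the notes) illustrates these two limits numerically. The channel taps, filter length, delay and SNR values are arbitrary assumptions, and the MMSE equalizer is taken in the regularized form w^H = e^T_{l+1} H^H (H H^H + (σ_ν²/σ_d²)·1)^{-1}, which is consistent with the two limiting cases derived above.

import numpy as np

# Sketch: the MMSE equalizer approaches the ZF solution for high SNR and the
# (scaled) matched filter for low SNR. All scenario parameters are assumptions.
h = np.array([0.8, 0.5 + 0.3j, 0.2])          # assumed channel impulse response, K = 2
K, M = len(h) - 1, 6                           # M equalizer taps
N = M + K                                      # length of the transmit window
l = (N - 1) // 2                               # equalizer delay (index "l+1" in the notes)

# band Toeplitz convolution matrix H in C^{M x N}: H[i, i:i+K+1] = h
H = np.zeros((M, N), dtype=complex)
for i in range(M):
    H[i, i:i + K + 1] = h

e = np.zeros(N); e[l] = 1.0                    # unit vector e_{l+1}

def w_mmse(snr_db, sigma_d2=1.0):
    """MMSE equalizer w = (H H^H + (sigma_nu^2/sigma_d^2) I)^{-1} H e_{l+1}."""
    sigma_nu2 = sigma_d2 / 10 ** (snr_db / 10)
    return np.linalg.solve(H @ H.conj().T + (sigma_nu2 / sigma_d2) * np.eye(M), H @ e)

w_zf = np.linalg.solve(H @ H.conj().T, H @ e)  # least-squares / zero-forcing limit
w_mf = H @ e                                   # matched filter direction (unscaled)

for snr_db in (50, -30):
    w = w_mmse(snr_db)
    # compare directions only (the MF limit differs by the factor sigma_d^2/sigma_nu^2)
    align_zf = abs(w.conj() @ w_zf) / (np.linalg.norm(w) * np.linalg.norm(w_zf))
    align_mf = abs(w.conj() @ w_mf) / (np.linalg.norm(w) * np.linalg.norm(w_mf))
    print(f"SNR {snr_db:+4d} dB: alignment with ZF {align_zf:.4f}, with MF {align_mf:.4f}")

At high SNR the weight vector aligns with the zero forcing solution, at low SNR with the matched filter direction.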

3.2 Spatial Filtering


Here we are dealing with beamforming, which we will treat as a minimization problem with linear constraints. We will tackle that problem in two steps: first we accommodate one linear constraint, an approach known as the minimum variance distortionless response (MVDR) beamformer, and second we accommodate many linear constraints, which is known as the generalized sidelobe canceller (GSC). Although MVDR is a special case of the GSC, both are treated separately in the literature.

3.2.1 Minimum Variance Distortionless Response (MVDR) Beamforming


Let us first look at a linearly constrained temporal minimum variance filter. Due to the FIR structure, the output of this filter with a complex harmonic excitation u[n] = e^{jω₀Tn} with frequency ω₀ can be written as

y[n] = w^H · u[n] = w^H · [u[n], u[n−1], u[n−2], …, u[n−M+1]]^T
     = w^H · e^{jω₀Tn} · [1, e^{−jω₀T}, e^{−jω₀2T}, …, e^{−jω₀(M−1)T}]^T
     = e^{jω₀Tn} · w^H · a(φ₀),

with φ₀ = ω₀T. Note that a(φ₀) is a Vandermonde vector which is identical to the array steering vector introduced in Section 1.4 when dealing with spatial filtering, i.e. with beamforming. Our aim is now to design the filter weight vector w such that the complex harmonic with frequency ω₀ passes unattenuated, while any other frequency component of the input signal is attenuated as much as possible. Assume now that u[n] contains, besides our desired component at ω₀, many other spectral components. We therefore minimize the variance of the filter output while making sure that the desired component is not suppressed:

w_opt = argmin_w E[w^H u[n] u^H[n] w] = argmin_w w^H R w

subject to the constraint

w^H a(φ₀) = g*,

g ∗ being the complex gain of our filter at ω0 . The problem, therefore, belongs to the class we have
dealt with in Section 2.3 (Quadratic Optimization with Linear Constraints).

Since we have only one constraint, the corresponding Lagrangian function reads

L(w, λ) = w^H R w + λ(w^H a(φ₀) − g*) + λ*(a^H(φ₀) w − g).

∂L(w, λ)/∂w* = Rw + λ a(φ₀) = 0   →   w_opt = −λ R^{-1} a(φ₀)
∂L(w, λ)/∂λ* = a^H(φ₀) w − g = 0   →   a^H(φ₀) w = g
g = −λ a^H(φ₀) R^{-1} a(φ₀)   →   λ = −g / (a^H(φ₀) R^{-1} a(φ₀)).

Finally we have

w = g R^{-1} a(φ₀) / (a^H(φ₀) R^{-1} a(φ₀)).

If we interpret φ₀ = 2π (∆/λ) sin θ₀, then this is the so called linearly constrained minimum variance (LCMV) beamformer. If we set g = 1, then we have the minimum variance distortionless response (MVDR) beamformer.
The minimum variance attained by the MVDR beamformer is

J_min = w^H_opt R w_opt = 1 / (a^H(φ₀) R^{-1} a(φ₀)),

and the so called spatial power spectrum reads

S_MVDR(φ) = 1 / (a^H(φ) R^{-1} a(φ)),   φ ∈ [−π, π].
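A minimal NumPy sketch (not from the notes; the spatial frequencies, number of sensors, snapshot count and noise level are assumptions) of the MVDR weight vector and the spatial power spectrum for a ULA:

import numpy as np

# Sketch: MVDR weights and S_MVDR(phi) = 1/(a^H(phi) R^{-1} a(phi)) for an assumed scenario.
M = 8
def a(phi):                                  # Vandermonde steering vector
    return np.exp(1j * phi * np.arange(M))

rng = np.random.default_rng(0)
phis_true = np.array([-1.0, 0.3, 0.8])       # assumed spatial frequencies of impinging signals
N = 2000
S = (rng.standard_normal((3, N)) + 1j * rng.standard_normal((3, N))) / np.sqrt(2)
A = np.column_stack([a(p) for p in phis_true])
noise = 0.3 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = A @ S + noise
R = X @ X.conj().T / N                       # sample correlation matrix

Rinv = np.linalg.inv(R)
phi0 = phis_true[1]                          # look direction (g = 1 -> MVDR)
w_mvdr = Rinv @ a(phi0) / (a(phi0).conj() @ Rinv @ a(phi0))
print("distortionless check w^H a(phi0) =", w_mvdr.conj() @ a(phi0))

phi_grid = np.linspace(-np.pi, np.pi, 721)
S_mvdr = np.array([1.0 / np.real(a(p).conj() @ Rinv @ a(p)) for p in phi_grid])
loc_max = np.where((S_mvdr[1:-1] > S_mvdr[:-2]) & (S_mvdr[1:-1] > S_mvdr[2:]))[0] + 1
top = loc_max[np.argsort(S_mvdr[loc_max])[-3:]]
print("spectrum peaks near phi =", np.sort(phi_grid[top]))

The peaks of the spectrum sit near the assumed spatial frequencies, and the constraint w^H a(φ₀) = 1 is satisfied by construction.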

3.2.2 Generalized Sidelobe Canceller (GSC)


Let us now extend the previous problem to more than one linear constraint, i.e. we look for a solution of the following problem

w_opt = argmin_w w^H R w   s.t.   C^H w = g,

where C^H is the K × M (K < M) constraint matrix

C^H = [a^H(θ₀); a^H(θ₁); …; a^H(θ_{K−1})]

and g is the vector of antenna gains in the directions of arrival (or departure) θ_k for k = 0, …, K−1. Using zeros and ones as entries of this vector means that signals impinging from the corresponding directions are either suppressed or preserved. Let us augment the columns of C with some orthogonal columns to obtain a square matrix

U = [C, C_a] ∈ C^{M×M},   C^H C_a = 0,

where C has K and C_a has M−K columns.

These additional column vectors, collected in C_a, are a basis for the orthogonal complement of the space spanned by the columns of C, i.e. of image(C), with dim(image(C)) = K. Since U is full rank, its column vectors are a basis for the M-dimensional vector space and we can write

w = U · q   and   q = U^{-1} · w,   q, w ∈ C^M.

Let us partition q in the following way:

q = [v; −w_a],   v ∈ C^K,   w_a ∈ C^{M−K},

w = [C, C_a] · [v; −w_a] = Cv − C_a w_a.

C^H · w = C^H C v − C^H C_a w_a = C^H C v = g,   since C^H C_a = 0.

Therefore, we have

v = (C^H C)^{-1} g

and

w_q = Cv = C (C^H C)^{-1} g

is the so called quiescent weight vector, and finally we have

w = w_q − C_a w_a,

which can be cast into the block diagram in Figure 3.2.

Fig. 3.2. Block diagram of the GSC: the input u[n] ∈ C^M feeds the quiescent branch w_q^H (output d[n]) and the blocking matrix C_a^H ∈ C^{(M−K)×M} (output x[n] ∈ C^{M−K}), which is weighted by w_a^H; the overall output is y[n].

From Figure 3.2 we have

y[n] = w^H u[n] = (w_q^H − w_a^H C_a^H) u[n] = d[n] − w_a^H x[n].



Let us now minimize the output variance:

w_{a,opt} = argmin_{w_a} E[y[n] y*[n]]
          = argmin_{w_a} (σ_d² − p_x^H w_a − w_a^H p_x + w_a^H R_xx w_a),

with σ_d² = E[|d[n]|²], p_x = E[x[n] d*[n]], and R_xx = E[x[n] x^H[n]].
Note that by decomposing w into a quiescent component w_q and an adaptive component w_a we have transformed the original constrained optimization into an unconstrained one. The quiescent vector w_q and the "blocking" matrix C_a take care that the constraints are fulfilled, no matter how w_a is adapted. The optimum solution for w_a can easily be obtained by recognizing that the problem we have formulated is identical to the original Wiener filter, which we have derived earlier:

w_{a,opt} = R_xx^{-1} p_x = (C_a^H R C_a)^{-1} C_a^H R w_q

and

w_opt = (1 − C_a (C_a^H R C_a)^{-1} C_a^H R) C (C^H C)^{-1} g.
Of course, this quadratic optimization problem with multiple linear constraints could be solved
with the Lagrangian approach, which we have used in the single constraint case (LCMV). The
solution reads
wopt = R−1 C(CH R−1 C)−1 g,
which is identical to the aforementioned one.
Remark: Although not obvious, the identity can be shown by solving an equivalent problem formulated in a transformed variable z with

w = Az   and   A = VΛ^{-1/2},

where V and Λ can be obtained from an EVD of R = VΛV^H. The approach here with w_q and w_a, although more involved, gives more insight into the structure of the solution.
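As a numerical illustration, the following NumPy sketch (not from the notes; the constraint directions, gains and the covariance matrix are assumptions) builds the GSC decomposition and checks that it reproduces the closed-form LCMV solution:

import numpy as np

# Sketch: GSC decomposition w = w_q - C_a w_a versus the closed-form LCMV solution.
rng = np.random.default_rng(1)
M, K = 8, 2
def a(phi): return np.exp(1j * phi * np.arange(M))

C = np.column_stack([a(0.5), a(-1.2)])         # K assumed constraint directions
g = np.array([1.0, 0.0])                       # preserve the first, null the second

B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = B @ B.conj().T + M * np.eye(M)             # an assumed positive definite covariance

# blocking matrix C_a: orthonormal basis of the orthogonal complement of image(C)
Q, _ = np.linalg.qr(np.column_stack([C, rng.standard_normal((M, M - K))]))
Ca = Q[:, K:]                                  # C^H Ca = 0 by construction

wq = C @ np.linalg.solve(C.conj().T @ C, g)    # quiescent weight vector
wa = np.linalg.solve(Ca.conj().T @ R @ Ca, Ca.conj().T @ R @ wq)
w_gsc = wq - Ca @ wa

w_lcmv = np.linalg.solve(R, C) @ np.linalg.solve(C.conj().T @ np.linalg.solve(R, C), g)

print("constraints C^H w:", np.round(C.conj().T @ w_gsc, 6))
print("max |w_gsc - w_lcmv| =", np.abs(w_gsc - w_lcmv).max())

The constraints are met exactly for any w_a (they are handled by w_q and C_a), and the adapted w coincides with the Lagrangian (LCMV) solution.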

3.3 Iterative Solution of the Normal Equation


We have seen that we can find the optimum linear filter by solving the so called normal equation
(Wiener-Hopf equation)
Rwopt = p.
This has been shown for temporal equalization (Fig. 3.1) and for spatial filtering by applying the
Generalized Sidelobe Canceller (GSC) concept (Fig. 3.2). In the latter case we have transformed
our linearly constrained quadratic optimization problem into an unconstrained minimization prob-
lem in reduced dimensions having a normal equation.
Now we will try to avoid the computationally involved direct solution and aim at an iterative
algorithm instead to converge to the optimum solution wopt . Fig. 3.3 illustrates the problem for a
two dimensional filter vector. We take the cost function from our previous elaborations

J(w, w*) = σ_d² − w^H p − p^H w + w^H R w,

where

σ_d² = E[d[n] d*[n]]   (variance of the desired signal),
p = E[x[n] d*[n]]   (cross correlation vector),
R = E[x[n] x^H[n]]   (correlation matrix).

The gradient of the cost function with respect to w*,

∂J(w, w*)/∂w* = −p + Rw,

points into the direction of steepest ascent of the cost function. The minimum of J we have already computed previously as

J_min = J(w_opt, w*_opt) = σ_d² − p^H R^{-1} p.   (3.1)
Starting from an arbitrary initial value for filter vector w[0] we try to approach this minimum
by incrementing w[0] with a step in the direction of steepest descent, i.e. in the direction of the
negative gradient.
With that, we arrive at

w[1] = w[0] + ∆w[0] = w[0] + µ(p − Rw[0]),

where µ > 0 is the stepsize, which we will choose appropriately.


Generalizing the above equation for arbitrary iteration steps we get

w[n + 1] = w[n] + ∆w[n] = w[n] + µ(p − Rw[n])


w[n + 1] = (1 − µR)w[n] + µp.

This last equation can be viewed as a linear discrete-time state-space system, where w[n] is the
state vector with constant excitation µp. Let us first transform this system to a homogeneous one
by shifting the fixed point to the origin

c[n] = w[n] − wopt and c[n + 1] = w[n + 1] − wopt ,


c[n + 1] = (1 − µR)c[n].

Next we will diagonalize the homogeneous discrete-time state-space system by first computing
the EVD of R = QΛQH and

c[n + 1] = (1 − µQΛQH )c[n] |→ QH


QH c[n + 1] = (QH Q − µQH QΛ)QH c[n].

With ν[n] = QH c[n] we have


ν[n + 1] = (1 − µΛ)ν[n].
Since (1 − µΛ) is diagonal, we get

νk [n + 1] = (1 − µλk )νk [n] ∀k = 1, . . . , M.



Fig. 3.3. Parabolic error surface J(w, w*) over (w₁, w₂) with minimum J_min at (w_{1,opt}, w_{2,opt}); the step from w[0] follows the negative gradient −∂J/∂w*|_{w=w[0]}.

Thereby λ_k ≥ 0 ∀k = 1, …, M are the eigenvalues of the positive semidefinite correlation matrix R. To ensure that the fixed point at ν = 0 is stable, the inequality

|1 − µλ_k| < 1   ⇒   0 < µ < 2/λ_k

must hold for all k, and therefore 0 < µ < 2/λ_max.
This is not a very useful criterion, because it would necessitate an EVD of R. Remembering that the motivation for the iterative solution was to reduce the complexity of solving the normal equation, it is not a good idea to perform an even more involved EVD instead. But we can make use of the following relation:

λ_max ≤ Σ_{k=1}^{M} λ_k = tr(R)

and come up with a more stringent but simple to compute upper limit for the stepsize guaranteeing convergence:

0 < µ < 2/tr(R) = 2/(M r(0)).

Let us look at a simple two-dimensional example:

tr(R) = 4,   λ₁ = 3,   λ₂ = 1,
w^T[0] = [2, 8],   w^T_opt = [2, 1].

We have chosen to stop the iterations if the norm of the gradient falls below 10^{-5}:

‖p − Rw[n]‖²₂ = ‖r[n]‖²₂ < 10^{-5}.

Fig. 3.4 shows the trajectory of w[n] for a small stepsize µ = 0.1. The upper bound for µ is 2/λ_max = 0.667, and the more stringent but easier to compute upper bound is 2/tr(R) = 0.5. We see that it takes 70 steps to converge from the initial value to the optimum one within the given residual error. Let us next choose the stepsize more aggressively as µ = 0.6, still below the true upper bound. Now we need only 38 steps to converge (Fig. 3.5). In Fig. 3.6, we have the trajectory for µ = 0.5 and we need only 13 steps.
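A small NumPy sketch reproducing the flavour of this example: the notes only state the eigenvalues, w[0] and w_opt, so the concrete R below (an assumed rotation of diag(3, 1)) and hence p = R w_opt are assumptions; the iteration counts will therefore be of the same order as above but not identical.

import numpy as np

# Sketch: steepest descent w[n+1] = w[n] + mu (p - R w[n]) with fixed stepsize.
theta = 0.4                                          # arbitrary rotation of the eigenbasis
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
R = Q @ np.diag([3.0, 1.0]) @ Q.T                    # lambda_max = 3, lambda_min = 1, tr(R) = 4
w_opt = np.array([2.0, 1.0])
p = R @ w_opt                                        # so that R w_opt = p holds

def steepest_descent(mu, w0=np.array([2.0, 8.0]), tol=1e-5, max_iter=10_000):
    w = w0.copy()
    for n in range(max_iter):
        r = p - R @ w                                # negative gradient
        if r @ r < tol:                              # stopping criterion ||r||^2 < 10^-5
            return w, n
        w = w + mu * r
    return w, max_iter

for mu in (0.1, 0.6, 0.5):
    w, n = steepest_descent(mu)
    print(f"mu = {mu}: {n} iterations, w = {np.round(w, 4)}")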

Fig. 3.4. Small stepsize (µ = 0.1): trajectory from w[0] toward the solution over the contour lines of the error surface. The number of iterations required for convergence is 70.

Fig. 3.5. Stepsize chosen close to the upper bound (µ = 0.6): trajectory from w[0] toward the solution. Notice the oscillations due to the choice of the stepsize. The number of iterations required for convergence is 38.

Fig. 3.6. Moderate stepsize (µ = 0.5): trajectory from w[0] toward the solution. The number of iterations required for convergence is 13, which is smaller than in the previous two examples.

Now we try to optimize the stepsize for every iteration step. For this we calculate the error function at step n,

J(w[n]) = J[n] = σ_d² − w^H[n] p − p^H w[n] + w^H[n] R w[n],

and at step n+1:

J[n+1] = J[n] − ∆w^H[n] p − p^H ∆w[n] + ∆w^H[n] R ∆w[n] + w^H[n] R ∆w[n] + ∆w^H[n] R w[n]
       = J[n] − µ*(p^H − w^H[n]R) p − p^H µ(p − Rw[n]) + µ*(p^H − w^H[n]R) R µ(p − Rw[n])
         + w^H[n] R µ(p − Rw[n]) + µ*(p^H − w^H[n]R) R w[n].

We now minimize J[n+1] by choosing µ appropriately:

∂J[n+1]/∂µ* = −(p^H − w^H[n]R)(p − Rw[n]) + µ(p^H − w^H[n]R) R (p − Rw[n]) = 0.

From this we find

µ_opt[n] = (p − Rw[n])^H (p − Rw[n]) / ((p − Rw[n])^H R (p − Rw[n])) = r^H[n] r[n] / (r^H[n] R r[n]).

r[n] and ‖r[n]‖²₂ = r^H[n] r[n] have to be computed anyway, because they are the negative gradient and the stopping criterion. The extra burden for optimizing the stepsize is therefore the quadratic form in the denominator. Fig. 3.7 shows the trajectory for this case with 8 iteration steps.
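A continuation of the earlier 2-D sketch (same assumed R and p), now with the per-iteration optimal stepsize:

import numpy as np

# Sketch: steepest descent with mu_opt[n] = (r^H[n] r[n]) / (r^H[n] R r[n]).
theta = 0.4
Q = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
R = Q @ np.diag([3.0, 1.0]) @ Q.T                # assumed R with eigenvalues 3 and 1
p = R @ np.array([2.0, 1.0])                     # p = R w_opt

w, n = np.array([2.0, 8.0]), 0
r = p - R @ w
while r @ r >= 1e-5:                             # same stopping criterion as before
    mu = (r @ r) / (r @ (R @ r))                 # optimal stepsize for this step
    w, n = w + mu * r, n + 1
    r = p - R @ w
print(f"optimal stepsize: converged in {n} iterations to w = {np.round(w, 4)}")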

Fig. 3.7. Optimum stepsize (µ_opt[n]): trajectory from w[0] toward the solution. The number of iterations required for convergence is 8.

Finally we see in Fig. 3.8 a situation with a larger spread of eigenvalues (λmax = 3, λmin = 0.1).
The constant error ellipses become slimmer and the number of iteration increases.

Fig. 3.8. Moderate stepsize (µ = 0.5): trajectory from w[0] toward the solution with a large spread of eigenvalues (λ_max = 3, λ_min = 0.1). The number of iterations required for convergence is 99.

3.4 Least Mean Square Algorithm (LMS)


The LMS algorithm (1960 Widrow and Hoff) is a ”stochastic gradient algorithm” approximating
the steepest descent procedure described in the previous section.
In a real-world application we do not know the true correlation matrix R and the true cross-
correlation vector p. Therefore, we have to estimate these expectations to have an estimate for the
gradient:

∂J(w)/∂w*|_{w=w[n]} ≈ −p̂[n] + R̂[n] · w[n].

We take a very simple approximation by using only one sample:

R̂[n] = u[n] u^H[n]   and   p̂[n] = u[n] d*[n].

The update equation for the weight vector now reads

w[n + 1] = w[n] + µ(u[n]d∗ [n] − u[n]uH [n]w[n])


= w[n] + µu[n](d∗ [n] − uH [n]w[n])
= w[n] + µu[n] · e∗ [n].

This shows that we simply have to multiply the actual input vector u[n] with the actual complex
conjugate error e∗ [n] = (d[n] − wH [n]u[n])∗ = d∗ [n] − y ∗ [n] and use this as the update for the
next step. Fig. 3.9 shows a block diagram implementing both the filtering or equalization process
computing the actual y[n] and the adaptation or updating of the filter vector w[n].
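Before the block diagram, here is a minimal NumPy sketch of the LMS update (not from the notes): an assumed system-identification setup in which the adaptive filter is driven by white input and the desired signal is generated by an unknown FIR system w_true; filter length, stepsize and signal statistics are assumptions.

import numpy as np

# Sketch: LMS update w[n+1] = w[n] + mu * u[n] * conj(e[n]), e[n] = d[n] - w^H[n] u[n].
rng = np.random.default_rng(2)
M = 5                                          # number of filter taps
w_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
mu = 0.01                                      # well below 2/tr(R) for white unit-power input
w = np.zeros(M, dtype=complex)

u_buf = np.zeros(M, dtype=complex)             # tapped delay line u[n]
for n in range(20_000):
    u = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    u_buf = np.r_[u, u_buf[:-1]]               # shift in the newest sample
    d = w_true.conj() @ u_buf                  # desired signal d[n] = w_true^H u[n] (noise-free)
    y = w.conj() @ u_buf                       # filter output y[n] = w^H u[n]
    e = d - y                                  # a-priori error
    w = w + mu * u_buf * np.conj(e)            # LMS update

print("max tap error:", np.abs(w - w_true).max())

The weight vector converges towards w_true, i.e. towards the Wiener solution of this assumed scenario.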
Fig. 3.9. Block diagram of the implementation of the LMS: the tapped delay line forms u[n], the adaptive taps w*_i[n] produce y[n], and the error e[n] = d[n] − y[n], scaled by µ, yields the tap updates ∆w*_i[n].
4. High Resolution Direction-of-Arrival (DoA) Estimation

In Section 3.2 we were dealing with the beamforming problem, i.e. spatial filtering based on the
knowledge of direction of desired as well as some undesired wavefronts. Now we will address the
problem of how to get this knowledge with so called subspace based high resolution techniques.
Such techniques are not limited by the aperture of the antenna array.
First we introduce two assumptions, which are important for the derivation of rather simple
models and algorithms. We start with the far-field data model, which leads to planar wavefronts
impinging on the array, see Fig. 4.1.

Fig. 4.1. Uniform Linear Array (ULA) with M elements with spacing ∆ and d impinging planar wavefronts s₁(t), …, s_d(t) with angles of arrival θ₁ up to θ_d; a wavefront from direction θ_i is delayed by τ(θ_i) = ∆·sin θ_i / c from element to element.

The second important assumption is the narrow band data model. Here we assume, the time τ
which it takes for a wavefront to propagate along the array from the first to the last sensor is very
small compared to the symbol period of the data modulated onto the wavefront.

si (t − τ ) ≈ si (t)e−j2πfc τ , ∀i = 1, . . . , d and τ = (M − 1)τ (θi ).

Here fc is the carrier frequency of the modulated radio signal and si (t) is the complex envelope.


Therefore, we can write our receive signal vector x(t) as

x(t) = [x₁(t), x₂(t), x₃(t), …, x_M(t)]^T
     = Σ_{i=1}^{d} [a₁(θ_i) e^{−j2πf_c τ₁(θ_i)}, a₂(θ_i) e^{−j2πf_c τ₂(θ_i)}, …, a_M(θ_i) e^{−j2πf_c τ_M(θ_i)}]^T · s_i(t) + n(t)
     = [a(θ₁), a(θ₂), a(θ₃), …, a(θ_d)] · [s₁(t), s₂(t), s₃(t), …, s_d(t)]^T + n(t)
     = A · s(t) + n(t).

Let us choose the first sensor as the reference sensor. Denoting c as the speed of propagation:

τ₁(θ_i) = 0,   τ₂(θ_i) = ∆·sin θ_i / c,   …,   τ_M(θ_i) = (M−1)·∆·sin θ_i / c.

With that we define spatial frequencies

µ_i = −(2πf_c/c)·∆ sin θ_i = −(2π/λ)·∆ sin θ_i   (with λ = c/f_c)   = −π·sin θ_i   (with ∆ = λ/2),

leading to the so called array steering vector

a^T(θ_i) = a^T(µ_i) = [1, e^{jµ_i}, e^{j2µ_i}, …, e^{j(M−1)µ_i}] · a₀(µ_i).

Here a₀(µ_i) is the antenna pattern of the identical sensors, which we will assume to be omnidirectional, i.e. a₀(µ_i) = 1. This leads us to the array steering matrix of a ULA:

A_ULA = [ 1             1             ⋯   1
          e^{jµ₁}       e^{jµ₂}       ⋯   e^{jµ_d}
          e^{j2µ₁}      e^{j2µ₂}      ⋯   e^{j2µ_d}
          ⋮             ⋮             ⋱   ⋮
          e^{j(M−1)µ₁}  e^{j(M−1)µ₂}  ⋯   e^{j(M−1)µ_d} ] ∈ C^{M×d}.

With this Vandermonde matrix AULA we can write our data model as

x(t) = AULA · s(t) + n(t).
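The following short NumPy sketch (not from the notes) generates synthetic snapshots according to this data model for an assumed scenario; the same scenario is reused in the MUSIC and ESPRIT sketches further below.

import numpy as np

# Sketch of x(t) = A_ULA s(t) + n(t) for an assumed scenario (Delta = lambda/2).
rng = np.random.default_rng(3)
M, d, N = 8, 3, 100                              # sensors, wavefronts, snapshots (assumed)
theta = np.deg2rad([-20.0, 5.0, 35.0])           # assumed angles of arrival
mu = -np.pi * np.sin(theta)                      # spatial frequencies mu_i = -pi sin(theta_i)

A = np.exp(1j * np.outer(np.arange(M), mu))      # M x d Vandermonde steering matrix
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
sigma_n = 0.5
Noise = sigma_n * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = A @ S + Noise                                # snapshot matrix in C^{M x N}
print("snapshot matrix:", X.shape)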



4.1 Subspace Estimation


The true signal subspace can be computed if we know the true signal correlation matrix

R_xx = E[x(t) x^H(t)] = E[(As(t) + n(t))(s^H(t)A^H + n^H(t))] = A R_ss A^H + R_nn,

where

R_ss = E[s(t) s^H(t)]   and   R_nn = E[n(t) n^H(t)],

and noise and signal are uncorrelated as usual. If the noise at the antenna elements is i.i.d. we have R_nn = σ_n²·1_M. In the noise-free case, R_xx will be rank deficient:

rank(R_xx)|_{σ_n²=0} = rank(A R_ss A^H) = d < M.

The EVD of R_xx reads

R_xx|_{σ_n²=0} = UΛU^H = [U_S, U_O] · [Λ_d, 0; 0, 0] · [U_S^H; U_O^H] = U_S Λ_d U_S^H = Σ_{k=1}^{d} λ_k u_k u_k^H,

with λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_d > λ_{d+1} = λ_{d+2} = ⋯ = λ_M = 0 and u_k being the eigenvectors of R_xx corresponding to the d nonzero eigenvalues.
The column space of the array steering matrix can be expressed as

image{A} = image{U_S} = S,

because

A R_ss A^H = U_S Λ_d U_S^H.

The nullspace of A^H, which is the same as the noise subspace or left nullspace of A, reads

kernel{A^H} = kernel{U_S^H} = N = image{U_O}.

Now taking the noise into account we have

R_xx = [U_S, U_O] · ([Λ_d, 0; 0, 0] + σ_n²·1_M) · [U_S^H; U_O^H] = Σ_{k=1}^{M} p_k u_k u_k^H,

with p_k = λ_k + σ_n².
But because we do not know the true correlation matrix, we have to estimate the signal subspace from received snapshots of data:

X = [x(t₁), x(t₂), ⋯, x(t_N)] ∈ C^{M×N}   (N snapshots)
  = A · [s(t₁), s(t₂), ⋯, s(t_N)] + [n(t₁), n(t₂), ⋯, n(t_N)]
  = AS + N,

where we have assumed that our scenario does not change during the time from t₁ to t_N and A therefore stays constant.

R̂_xx = (1/N) Σ_{n=1}^{N} x(t_n) x^H(t_n) = (1/N) X X^H

is the estimate of the correlation matrix obtained from the received samples. We can either compute an EVD of R̂_xx or an SVD of X:

X = UΣV^H = [U_s, U_o] · [Σ_s, 0; 0, Σ_o] · [V_s^H; V_o^H],
R̂_xx = UΛU^H = [U_s, U_o] · [Λ_s, 0; 0, Λ_o] · [U_s^H; U_o^H],

with Λ = (1/N) ΣΣ^T, where we have assumed N > M.

The link between the true and the estimated subspaces is

S = image{U_S} = lim_{N→∞} (image{U_s})   and   N = image{U_O} = lim_{N→∞} (image{U_o}).

4.2 MUltiple SIgnal Classification (MUSIC)


MUSIC is an algorithm for DoA estimation based on the estimated noise subspace image{Uo }.
We assume the array manifold to be known, i.e. we know the functional dependence of the array
steering vector on the angle of arrival a(θ).
Any vector a(θ_i) is an element of the signal subspace S, since S = image{A}, and is therefore orthogonal to every vector in the noise subspace (recall that we know the array geometry, i.e. the array manifold):

a^H(θ_i) U_O = [0, 0, ⋯, 0] ≈ a^H(θ_i) U_o   ∀i = 1, …, d.

The so called MUSIC spectrum is defined as

S_MUSIC(θ) = ‖a(θ)‖²₂ / ‖a^H(θ) U_o‖²₂ = a^H(θ) a(θ) / (a^H(θ) U_o U_o^H a(θ)) = a^H(θ) a(θ) / (a^H(θ) P_o a(θ)),

with P_o = U_o U_o^H the projector onto N.

Performing a spectral search over S_MUSIC(θ) we obtain estimates of the θ_i. As θ approaches one of the θ_i, the denominator becomes zero or at least very small. Therefore, the maxima of S_MUSIC(θ) indicate that the argument θ is an estimate of a θ_i. Fig. 4.2 depicts such a MUSIC spectrum.
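A compact NumPy sketch of the MUSIC spectrum (not from the notes; it reuses the assumed synthetic ULA scenario introduced with the data model above):

import numpy as np

# Sketch: MUSIC spectrum on synthetic ULA data (assumed scenario, Delta = lambda/2).
rng = np.random.default_rng(3)
M, d, N = 8, 3, 100
theta_true = np.deg2rad([-20.0, 5.0, 35.0])
A = np.exp(1j * np.outer(np.arange(M), -np.pi * np.sin(theta_true)))
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
X = A @ S + 0.5 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# estimated noise subspace: the M - d least dominant left singular vectors of X
U, _, _ = np.linalg.svd(X)
Uo = U[:, d:]

theta_grid = np.deg2rad(np.linspace(-90, 90, 1801))
A_grid = np.exp(1j * np.outer(np.arange(M), -np.pi * np.sin(theta_grid)))   # steering vectors
num = np.sum(np.abs(A_grid) ** 2, axis=0)                                   # a^H(theta) a(theta)
den = np.sum(np.abs(Uo.conj().T @ A_grid) ** 2, axis=0)                     # a^H(theta) P_o a(theta)
S_music = num / den

# the d largest local maxima of the spectrum are the DoA estimates
i = np.where((S_music[1:-1] > S_music[:-2]) & (S_music[1:-1] > S_music[2:]))[0] + 1
est = np.sort(theta_grid[i[np.argsort(S_music[i])[-d:]]])
print("estimated DoAs [deg]:", np.round(np.rad2deg(est), 2))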

Fig. 4.2. MUSIC power spectrum S_MUSIC(θ) over the angle θ in radians, with M = 8 antennas, d = 3 impinging wavefronts and N = 100 snapshots at SNR = 0 dB.

4.3 The Standard ESPRIT Algorithm


The standard ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) is
also a high resolution subspace-based algorithm, but in contrast to MUSIC, it works on the signal
subspace instead of the noise subspace.
The key to the application of ESPRIT is a translational shift invariance structure of the antenna
array. As shown in Fig. 4.3, the array must consist of m ≥ d pairs of sensors, so called doublets.
The two sensors of each doublet must have the same radiation pattern and there must be the same
displacement ∆ between the sensors of each pair. Therefore, the array can be partitioned into two
subarrays, where we can map one onto the other by a simple translational shift by ∆.

Fig. 4.3. Antenna array with M = 6 antennas, 3 non-overlapping doublets and a translational shift invariance structure: subarray 2 is obtained from subarray 1 by a shift of ∆. Different shapes for each antenna (triangles, squares and circles) represent different antenna patterns.

Fig. 4.4. Antenna array with M = 5 antennas, 3 overlapping doublets and a translational shift invariance structure. Different shapes for each antenna (squares and circles) represent different antenna patterns.

The overall number of antenna elements is 2m = M = 6 in the example of Fig. 4.3. Obviously,
we can be more efficient by working with overlapping subarrays as it is depicted in Fig. 4.4.
Furthermore, this is especially the case with a ULA as shown in Fig. 4.5.

Fig. 4.5. Revealing the translational shift invariant subarray structure of a ULA with M = m + 1: subarray 1 consists of sensors x₁, …, x_m and subarray 2 of sensors x₂, …, x_M.

The receive signal vector of such an antenna array can be split into two vectors with the aid of two selection matrices

J₁ = [1_m, 0] ∈ R^{m×M}   and   J₂ = [0, 1_m] ∈ R^{m×M},

i.e. J₁ selects the first m and J₂ the last m entries of x(t).

Starting from

x(t) = A · s(t) + n(t),   φ_i = e^{jµ_i},

where the i-th column of the (m+1) × d Vandermonde matrix A is [1, φ_i, φ_i², …, φ_i^{m−1}, φ_i^m]^T, the two subarray signals are

x₁(t) = J₁ x(t) = A′ s(t) + n′(t),   with the i-th column of A′ being [1, φ_i, …, φ_i^{m−1}]^T,
x₂(t) = J₂ x(t) = A″ s(t) + n″(t),   with the i-th column of A″ being [φ_i, φ_i², …, φ_i^m]^T.

We can see that A″ = A′ Φ with Φ = diag{φ_i}_{i=1}^{d}. Furthermore,

J₁ A = A′,   J₂ A = A″   ⇒   J₁ A Φ = J₂ A.

Since image{A} = image{U_S} = S, it follows that there exists a nonsingular T_A such that A = U_S T_A:

J₁ U_S T_A Φ = J₂ U_S T_A,   J₁ U_S T_A Φ T_A^{-1} = J₂ U_S,   and with   T_A Φ T_A^{-1} = Ψ

we obtain the so called invariance equation

J₁ U_S Ψ = J₂ U_S.

This is an overdetermined system of linear equations (J₁ U_S Ψ ∈ C^{m×d}: m · d equations for d² unknowns). Since we only know the estimated signal subspace with basis U_s, the equality sign in the invariance equation is never fulfilled exactly. Therefore, we look for a least squares solution of

J₁ U_s Ψ ≈ J₂ U_s.

With the solution Ψ we perform an EVD

Ψ = T_A Φ T_A^{-1},

and with µ_i = arg φ_i and θ_i = arcsin(−λ/(2π∆) · µ_i) we have estimates for the DoA's.

Let us now summarize the three steps of the Standard ESPRIT algorithm:

1) Signal Subspace Estimation: Compute U_s ∈ C^{M×d} either via the
• Square root approach (direct data approach): from the SVD of X = [U_s, U_o] · [Σ_s, 0; 0, Σ_o] · [V_s^H; V_o^H] ∈ C^{M×N}, N > M, as the d dominant left singular vectors of X; or via the
• Covariance approach: from the EVD of XX^H = [U_s, U_o] · [Λ_s, 0; 0, Λ_o] · [U_s^H; U_o^H] ∈ C^{M×M}, as the d dominant eigenvectors of XX^H.

2) Solution of the Invariance Equation:

J₁ U_s Ψ ≈ J₂ U_s,   J₁U_s, J₂U_s ∈ C^{m×d},

by means of Least Squares (LS), Total Least Squares (TLS) or Structured Least Squares (SLS).

3) Spatial Frequency (DoA) Estimation: Compute the eigenvalues of Ψ,

Ψ = T_A Φ T_A^{-1},   Φ = diag{φ_i}_{i=1}^{d},
µ_i = arg φ_i,   θ_i = arcsin(−λ/(2π∆) · µ_i).

Therefore, we have a closed form solution for the problem at hand without any numerical search as in spectral MUSIC.
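The three steps translate directly into a few lines of NumPy. The sketch below (not from the notes) uses the least squares option in step 2 and the same assumed synthetic ULA scenario as before, with maximum-overlap subarrays (J₁ and J₂ select rows 1, …, M−1 and 2, …, M).

import numpy as np

# Sketch: Standard ESPRIT (LS) on synthetic ULA data (assumed scenario, Delta = lambda/2).
rng = np.random.default_rng(3)
M, d, N = 8, 3, 100
theta_true = np.deg2rad([-20.0, 5.0, 35.0])
A = np.exp(1j * np.outer(np.arange(M), -np.pi * np.sin(theta_true)))
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
X = A @ S + 0.5 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# 1) signal subspace: d dominant left singular vectors of X
Us = np.linalg.svd(X)[0][:, :d]

# 2) LS solution of the invariance equation J1 Us Psi ~ J2 Us
Us1, Us2 = Us[:-1, :], Us[1:, :]                     # J1 Us and J2 Us for a ULA
Psi = np.linalg.lstsq(Us1, Us2, rcond=None)[0]

# 3) spatial frequencies and DoAs from the eigenvalues of Psi
phi = np.linalg.eigvals(Psi)
mu_est = np.angle(phi)
theta_est = np.degrees(np.arcsin(-mu_est / np.pi))   # Delta = lambda/2  =>  mu = -pi sin(theta)
print("estimated DoAs [deg]:", np.round(np.sort(theta_est), 2))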

4.4 Unitary ESPRIT: Real Valued Subspace Estimation


4.4.1 Centro-Hermitian Matrices
Definition: a matrix M is centro-Hermitian if

M ∈ C^{p×q}:   Π_p · M* · Π_q = M,   equivalently   Π_p · M · Π_q = M*,

holds. Π_p is a so-called exchange matrix, a square p × p matrix with ones on the antidiagonal and zeros elsewhere:

Π_p = [ 0 0 ⋯ 0 1
        0 0 ⋯ 1 0
        ⋮ ⋮ ⋰ ⋮ ⋮
        0 1 ⋯ 0 0
        1 0 ⋯ 0 0 ] ∈ {0, 1}^{p×p}.

Multiplying a given matrix A with Π from the left exchanges (reverses the order of) the rows of A; multiplying A with Π from the right exchanges the columns. Furthermore, note that Π_p^{-1} = Π_p.

4.4.2 Left Π-real Matrices


Definition: a matrix Q is left Π-real if

Q ∈ C^{p×q}:   Π_p · Q* = Q,   equivalently   Π_p · Q = Q*,

holds.

Theorem 4.4.1. Let Q_p and Q_q be left Π-real, square and nonsingular. Then

φ: M → Q_p^{-1} M Q_q = R = φ(M)

maps any centro-Hermitian matrix M bijectively onto a real matrix R = Q_p^{-1} M Q_q.

Proof. Take the conjugate of φ(M):

→:  (φ(M))* = (Q_p^{-1} M Q_q)* = Q_p^{-1,*} M* Q_q*
           = Q_p^{-1,*} (Π_p M Π_q) Q_q*          [M* = Π_p M Π_q]
           = (Q_p^{-1,*} Π_p) M (Π_q Q_q*)        [Q_p^{-1,*} Π_p = Q_p^{-1}, Π_q Q_q* = Q_q]
           = Q_p^{-1} M Q_q = φ(M) = R ∈ R^{p×q}.

←:  φ^{-1}(R) = Q_p R Q_q^{-1},

Π_p (φ^{-1}(R))* Π_q = Π_p Q_p* R Q_q^{-1,*} Π_q = (Π_p Q_p*) R (Q_q^{-1,*} Π_q) = Q_p R Q_q^{-1} = φ^{-1}(R) = M.   q.e.d.

Assume now that Q_p and Q_q are not only left Π-real, square and nonsingular, but also unitary. Then a real-valued SVD yields

φ(M) = R = Q_p^H M Q_q = E Σ_φ F^H   (SVD),

while a complex-valued SVD of M leads to

M = φ^{-1}(E Σ_φ F^H) = UΣV^H = Q_p E Σ_φ F^H Q_q^H = (Q_p E) · Σ_φ · (Q_q F)^H,

and therefore we have

U = Q_p E,   Σ = Σ_φ,   V = Q_q F.

More specifically, let us choose Q_p and Q_q as follows:



• for p, q even:

Q_{2n} = (1/√2) · [ 1_n   j·1_n
                    Π_n  −j·Π_n ],

• for p, q odd:

Q_{2n+1} = (1/√2) · [ 1_n   0   j·1_n
                      0^T   √2  0^T
                      Π_n   0  −j·Π_n ],

which are both left Π-real, square, nonsingular and unitary!
Next, let us transform our array measurement matrix X ∈ C^{M×N} into a centro-Hermitian matrix Z:

Z = [X, Π_M X* Π_N] ∈ C^{M×2N}.

Checking that it is indeed centro-Hermitian:

Π_M Z* Π_{2N} = [Π_M X*, Π_M (Π_M X Π_N)] · [0, Π_N; Π_N, 0] = [Π_M X*, X Π_N] · [0, Π_N; Π_N, 0] = [X, Π_M X* Π_N] = Z.

Now we apply our mapping φ to the centro-Hermitian array measurement Z:

φ(Z) = φ([X, Π_M X* Π_N]) = Q_M^H [X, Π_M X* Π_N] Q_{2N} = T(X),   T(X) ∈ R^{M×2N}.
Assume now that M is an even number, i.e. M = 2n, and let us partition X into two submatrices

X = [X₁; X₂] ∈ C^{2n×N},

with X₁ ∈ C^{n×N} and X₂ ∈ C^{n×N}. Now use our specific choice of Q_p and Q_q:

T(X) = Q_M^H [X, Π_M X* Π_N] Q_{2N}
     = (1/2) · [1_n, Π_n; −j1_n, jΠ_n] · [X₁, Π_n X₂* Π_N; X₂, Π_n X₁* Π_N] · [1_N, j1_N; Π_N, −jΠ_N]
     = (1/2) · [X₁ + Π_n X₂, Π_n X₂* Π_N + X₁* Π_N; −jX₁ + jΠ_n X₂, −jΠ_n X₂* Π_N + jX₁* Π_N] · [1_N, j1_N; Π_N, −jΠ_N].

Finally, we arrive at

T(X) = (1/2) · [X₁ + Π_n X₂ + Π_n X₂* + X₁*,  jX₁ + jΠ_n X₂ − jΠ_n X₂* − jX₁*;
                −jX₁ + jΠ_n X₂ − jΠ_n X₂* + jX₁*,  X₁ − Π_n X₂ − Π_n X₂* + X₁*]
     = [ Re{X₁ + Π_n X₂*},  −Im{X₁ − Π_n X₂*};
         Im{X₁ + Π_n X₂*},   Re{X₁ − Π_n X₂*} ] ∈ R^{M×2N}.

Therefore, after a lengthy derivation we have arrived at a simple and beautiful result: we have
a real-valued matrix, which we obtain from the complex-valued measurement by really simple
calculations (no actual multiplication needed!). Now we can perform the computationally simpler
real-valued SVD of T (X) and relate it to the complex valued SVD of Z:

T(X) = φ(Z) = E · Σ · F^H,
Z = U · Σ · V^H = Q_M · E · Σ · F^H · Q_{2N}^H
  = Q_M · [E_s, E_o] · [Σ_s, 0; 0, Σ_o] · [F_s^H; F_o^H] · Q_{2N}^H
  = [U_s, U_o] · [Σ_s, 0; 0, Σ_o] · [V_s^H; V_o^H]

⇒ U_s = Q_M E_s   or   E_s = Q_M^H U_s.
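A short NumPy sketch (not from the notes; the test matrix X is random and M, N are arbitrary assumptions) that builds the unitary left Π-real matrices Q_p, forms Z and T(X), and verifies numerically that T(X) is real and equals the closed-form real/imaginary block expression derived above:

import numpy as np

# Sketch: forward-backward mapping T(X) and its closed real-valued form (even M).
def Q(p):                                       # unitary left Pi-real matrix (even/odd p as above)
    n, I, Pi = p // 2, np.eye(p // 2), np.fliplr(np.eye(p // 2))
    if p % 2 == 0:
        return np.vstack([np.hstack([I, 1j * I]), np.hstack([Pi, -1j * Pi])]) / np.sqrt(2)
    z = np.zeros((n, 1))
    return np.vstack([np.hstack([I, z, 1j * I]),
                      np.hstack([z.T, [[np.sqrt(2)]], z.T]),
                      np.hstack([Pi, z, -1j * Pi])]) / np.sqrt(2)

rng = np.random.default_rng(4)
M, N = 6, 10                                    # even M so the block formula applies directly
X = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

PiM, PiN = np.fliplr(np.eye(M)), np.fliplr(np.eye(N))
Z = np.hstack([X, PiM @ X.conj() @ PiN])        # centro-Hermitian augmented data
T_def = Q(M).conj().T @ Z @ Q(2 * N)            # definition phi(Z)

X1, X2 = X[:M // 2], X[M // 2:]                 # partition as in the derivation
Pn = np.fliplr(np.eye(M // 2))
T_fast = np.block([[np.real(X1 + Pn @ X2.conj()), -np.imag(X1 - Pn @ X2.conj())],
                   [np.imag(X1 + Pn @ X2.conj()),  np.real(X1 - Pn @ X2.conj())]])

print("imaginary part of T_def:", np.abs(T_def.imag).max())
print("max |T_def - T_fast|  :", np.abs(T_def - T_fast).max())

Both quantities are zero up to rounding errors, confirming that T(X) can be formed from X by mere additions, sign changes and reorderings.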

4.4.3 Real Valued Invariance Equation


From Standard ESPRIT we know

X = AS + N = U_s Σ_s V_s^H + U_o Σ_o V_o^H,
J₁ A Φ ≈ J₂ A,
image{X} = image{U_s} = image{A},

therefore

A = U_s · T_A,

where T_A is square and nonsingular. Hence

J₁ U_s T_A Φ ≈ J₂ U_s T_A   ⇒   J₁ U_s Ψ ≈ J₂ U_s,   where   Ψ = T_A Φ T_A^{-1}.

Now let us define a matrix D:

D = Q_M^H A   →   A = Q_M D,
J₁ Q_M D Φ ≈ J₂ Q_M D   | → Q_m^H  (multiply from the left)
Q_m^H J₁ Q_M · DΦ ≈ Q_m^H J₂ Q_M · D,

where J₁, J₂ ∈ R^{m×M},

J₁ = [1_m, 0],   J₂ = [0, 1_m]   ⇒   J₁ = Π_m J₂ Π_M.

In addition, it is easy to show that

Q_m^H J₂ Q_M = Q_m^H Π_m (Π_m J₂ Π_M) Π_M Q_M = (Q_m^H Π_m) J₁ (Π_M Q_M) = Q_m^{*,H} J₁ Q_M* = (Q_m^H J₁ Q_M)*,

using Π_m Π_m = 1_m, Π_M Π_M = 1_M, Π_m J₂ Π_M = J₁, Q_m^H Π_m = Q_m^{*,H} and Π_M Q_M = Q_M*.
Let us split Q_m^H J₂ Q_M into its real and imaginary parts,

Q_m^H J₂ Q_M = (1/2)(K₁ + jK₂);

then it follows that

Q_m^H J₁ Q_M = (1/2)(K₁ − jK₂),

with

K₁ = Q_m^H (J₁ + J₂) Q_M = 2 Re{Q_m^H J₂ Q_M} ∈ R^{m×M},
K₂ = (1/j) Q_m^H (J₂ − J₁) Q_M = 2 Im{Q_m^H J₂ Q_M} ∈ R^{m×M}.

Therefore,

(K₁ − jK₂) D Φ ≈ (K₁ + jK₂) D
K₁ D (Φ − 1) ≈ j K₂ D (Φ + 1)
K₁ D · (1/j)(Φ − 1)(Φ + 1)^{-1} ≈ K₂ D,   with   Ω = (1/j)(Φ − 1)(Φ + 1)^{-1}.

The matrix Ω can be simplified (both factors are diagonal) to

Ω = diag{(1/j) · (φ_i − 1)/(φ_i + 1)}_{i=1}^{d}
  = diag{(e^{jµ_i} − 1) / (j(e^{jµ_i} + 1))}_{i=1}^{d}
  = diag{(e^{jµ_i/2} − e^{−jµ_i/2}) / (j(e^{jµ_i/2} + e^{−jµ_i/2}))}_{i=1}^{d}
  = diag{sin(µ_i/2) / cos(µ_i/2)}_{i=1}^{d}
  = diag{tan(µ_i/2)}_{i=1}^{d},

i.e. Ω is real and diagonal. Since D = Q_M^H A, E_s = Q_M^H U_s and A = U_s T_A, we have

D = Q_M^H U_s T_A = E_s T_A,
K₁ E_s T_A Ω ≈ K₂ E_s T_A,

and therefore

K₁ E_s (T_A Ω T_A^{-1}) ≈ K₂ E_s,   i.e.   K₁ E_s Υ ≈ K₂ E_s   with   Υ = T_A Ω T_A^{-1},

which is the real-valued invariance equation!
Let us look at a simple example of what the new selection matrices, which can be computed off-line, look like.

Example 4.4.2. Uniform Linear Array with M = 6 antennas and overlapping subarrays with m = 5 antennas each.
 
J₁ = [1₅, 0] and J₂ = [0, 1₅] select the first and the last five rows of x(t), respectively:

J₁ = [ 1 0 0 0 0 0
       0 1 0 0 0 0
       0 0 1 0 0 0
       0 0 0 1 0 0
       0 0 0 0 1 0 ],      J₂ = [ 0 1 0 0 0 0
                                   0 0 1 0 0 0
                                   0 0 0 1 0 0
                                   0 0 0 0 1 0
                                   0 0 0 0 0 1 ].

Writing out Q₅^H, (J₁ + J₂), (J₂ − J₁) and Q₆ explicitly and multiplying gives

K₁ = Q₅^H · (J₁ + J₂) · Q₆ = [ 1  1  0  0  0  0
                               0  1  1  0  0  0
                               0  0 √2  0  0  0
                               0  0  0  1  1  0
                               0  0  0  0  1  1 ],

and

K₂ = (1/j) Q₅^H · (J₂ − J₁) · Q₆ = [ 0  0  0 −1  1  0
                                     0  0  0  0 −1  1
                                     0  0  0  0  0 −√2
                                     1 −1  0  0  0  0
                                     0  1 −1  0  0  0 ].


There is still one question open: is the column space spanned by X the same as the one spanned by Z? Only then will our transformation of the array measurement X into the centro-Hermitian matrix Z leave the estimated directions of arrival (DoA's) unchanged.
To assure that, we have to use centro-symmetric arrays, which are characterized by the following equation:

Π_M A* = A∆,

where A is the steering matrix and ∆ is a diagonal and unitary matrix.
For a ULA this holds, as we will see:

Π_M A*_ULA = A_ULA ∆_ULA

with

∆_ULA = Φ^{−(M−1)} = diag{e^{−j(M−1)µ_i}}_{i=1}^{d}.

Therefore, we can compute

A_ULA ∆_ULA = [1, …, 1; e^{jµ₁}, …, e^{jµ_d}; …; e^{j(M−1)µ₁}, …, e^{j(M−1)µ_d}] · diag{e^{−j(M−1)µ_i}}_{i=1}^{d}
            = [ e^{−j(M−1)µ₁}  e^{−j(M−1)µ₂}  ⋯  e^{−j(M−1)µ_d}
                e^{−j(M−2)µ₁}  e^{−j(M−2)µ₂}  ⋯  e^{−j(M−2)µ_d}
                ⋮              ⋮              ⋱  ⋮
                e^{jµ₁}        e^{jµ₂}        ⋯  e^{jµ_d}
                1              1              ⋯  1              ]
            = Π_M A*_ULA.   q.e.d.

Therefore, ULAs are centro-symmetric. Now image{A} = image{A∆}, because ∆ is diagonal and nonsingular (unitary), and A∆ = Π_M A*. Hence

image{Π_M A*} = image{A} = image{U_s},

and the augmentation of X by Π_M X* Π_N has not changed the subspace!


Let us summarize, what we have achieved. We have replaced the complex-valued subspace
estimation of a M × N matrix by a real-valued one of a M × 2N matrix. This reduces the number
of real-valued multiplications by a factor of two. Since the subspace estimation is the most involved
step of ESPRIT, this is a substantial reduction. In addition, UNITARY ESPRIT incorporates what
is called forward-backward averaging (mapping from X to Z) which increases the accuracy of the
estimated DoA’s.

Just as for the Standard ESPRIT, let us now summarize the three steps of Unitary ESPRIT:

1) Signal Subspace Estimation: Compute E_s ∈ R^{M×d} either via the
• Square root approach (direct data approach): from the SVD of T(X) = [E_s, E_o] · [Σ_s, 0; 0, Σ_o] · [F_s^H; F_o^H] ∈ R^{M×2N}, N > M, by taking the d dominant left singular vectors of T(X); or via the
• Covariance approach: from the EVD of T(X)(T(X))^H = [E_s, E_o] · [Λ_s, 0; 0, Λ_o] · [E_s^H; E_o^H] ∈ R^{M×M}, as the d dominant eigenvectors of T(X)(T(X))^H.

2) Solution of the Invariance Equation: Solve

K₁ E_s Υ ≈ K₂ E_s,   K₁E_s, K₂E_s ∈ R^{m×d},

by means of Least Squares (LS), Total Least Squares (TLS) or Structured Least Squares (SLS).

3) Spatial Frequency (DoA) Estimation: Compute the eigenvalues of the resulting real-valued solution

Υ = T_A Ω T_A^{-1} ∈ R^{d×d},   with   Ω = diag{ω_i}_{i=1}^{d}.

Afterwards:
• Reliability Test: If all eigenvalues ω_i are real, the estimates are reliable. Otherwise, start again with more measurements.
• If all eigenvalues ω_i are real, then

µ_i = 2 arctan ω_i,   θ_i = arcsin(−λ/(2π∆) · µ_i).
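Putting the pieces together, the following NumPy sketch (not from the notes) runs all three Unitary ESPRIT steps on the same assumed synthetic ULA scenario used above (M = 8, maximum-overlap subarrays with m = 7, ∆ = λ/2); the function Q(p) is the same unitary left Π-real construction as in the earlier sketch.

import numpy as np

# Sketch: Unitary ESPRIT (LS) on synthetic ULA data (assumed scenario).
def Q(p):
    n, I, Pi = p // 2, np.eye(p // 2), np.fliplr(np.eye(p // 2))
    if p % 2 == 0:
        return np.vstack([np.hstack([I, 1j * I]), np.hstack([Pi, -1j * Pi])]) / np.sqrt(2)
    z = np.zeros((n, 1))
    return np.vstack([np.hstack([I, z, 1j * I]),
                      np.hstack([z.T, [[np.sqrt(2)]], z.T]),
                      np.hstack([Pi, z, -1j * Pi])]) / np.sqrt(2)

rng = np.random.default_rng(3)
M, d, N = 8, 3, 100
m = M - 1
theta_true = np.deg2rad([-20.0, 5.0, 35.0])
A = np.exp(1j * np.outer(np.arange(M), -np.pi * np.sin(theta_true)))
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
X = A @ S + 0.5 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# 1) real-valued signal subspace from T(X) (real up to rounding, so take the real part)
PiM, PiN = np.fliplr(np.eye(M)), np.fliplr(np.eye(N))
TX = np.real(Q(M).conj().T @ np.hstack([X, PiM @ X.conj() @ PiN]) @ Q(2 * N))
Es = np.linalg.svd(TX)[0][:, :d]

# 2) real-valued selection matrices and LS solution of K1 Es Upsilon ~ K2 Es
J2 = np.eye(M)[1:, :]                               # selects rows 2..M
K1 = 2 * np.real(Q(m).conj().T @ J2 @ Q(M))
K2 = 2 * np.imag(Q(m).conj().T @ J2 @ Q(M))
Upsilon = np.linalg.lstsq(K1 @ Es, K2 @ Es, rcond=None)[0]

# 3) DoAs from the eigenvalues of Upsilon (reliability test: eigenvalues should be real)
omega = np.linalg.eigvals(Upsilon)
mu_est = 2 * np.arctan(np.real(omega))
print("estimated DoAs [deg]:", np.round(np.sort(np.degrees(np.arcsin(-mu_est / np.pi))), 2))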
5. Signal Reconstruction

In many cases not the DoA’s but the wavefronts, i.e. their complex envelopes (signals), are of
interest. Consider the case when we obtain an estimate of the signals or wavefronts by applying a
linear filter WH ∈ Cd×M on the received data X, i.e. the estimate of the signals is given by
Ŝ = [ŝ(t1 ), ŝ(t2 ), · · · , ŝ(tN )] = WH · X,
and let us recall that the data model is
X = AS + N,
where A now is assumed to be known from the DoA estimation! To this end, let us now consider
different alternatives for computing the receive filter WH :
1) LS Solution (Least Squares)
2) MVDR Solution (Minimum Variance, Distortionless Response),
3) MMSE Solution (Minimum Mean Square Error),
4) MF Solution (Matched Filter).

5.1 LS Solution
We assume no statistical knowledge about S and N and that

X ≈ AS,

hence

Ŝ = argmin_S ‖X − AS‖²_F.

The solution to this optimization problem is obtained with a pseudo-inverse:

A^H X = A^H A Ŝ
(A^H A)^{-1} A^H X = (A^H A)^{-1} A^H A Ŝ
Ŝ = (A^H A)^{-1} A^H X = A^+ X = W^H X.

Therefore, the linear reconstruction filter that is optimal in the aforementioned sense is

W^H_LS = A^+,   with   W^H_LS A = 1_d.


The estimate of the signals is then

Ŝ = WH (AS + N)
= S + WH N,

which is an unbiased estimate of S.

5.2 MVDR Solution


Again let us impose the constraint

W^H_MVDR A = 1_d,

i.e. we desire a distortionless response leading to an unbiased estimate, but additionally we would also like to minimize the total output power, since that also minimizes the noise power (variance), hence the name minimum variance distortionless response. We consider the d individual outputs of our reconstruction filter by partitioning the filter matrix into beamforming vectors

W^H_MVDR = [w₁^H; w₂^H; …; w_d^H] ∈ C^{d×M}.

Let us set up our optimization problem:

w_{i,MVDR} = argmin_{w_i} E[|ŝ_i[n]|²]   s.t.   w_i^H · A = e_i^T,

where e_i ∈ {0, 1}^d is the i-th unit vector, i.e. all zeros with a 1 at the i-th entry.
Thus, we have

ŝ_i[n] = w_i^H · x[n] = w_i^H (A s[n] + n[n]) = e_i^T s[n] + w_i^H n[n] = s_i[n] + w_i^H n[n].

Therefore, we can find the MVDR solution from either one of two optimization problems:

(∗)   w_{i,MVDR} = argmin_{w_i} E[|w_i^H n[n]|²] = argmin_{w_i} w_i^H R_nn w_i,
(∗∗)  w_{i,MVDR} = argmin_{w_i} E[|w_i^H (A s[n] + n[n])|²] = argmin_{w_i} w_i^H R_xx w_i,

where

R_xx = A R_ss A^H + R_nn.

From (∗∗), we obtain with the Lagrangian function

L(w_i, λ_i) = w_i^H R_xx w_i + λ_i^H (A^H w_i − e_i) + (w_i^H A − e_i^T) λ_i,

∂L/∂w_i* = R_xx w_i + A λ_i = 0   ⇒   w_i = −R_xx^{-1} A λ_i,
∂L/∂λ_i* = A^H w_i − e_i = 0
⇒ −A^H R_xx^{-1} A λ_i = e_i
⇒ λ_i = −(A^H R_xx^{-1} A)^{-1} e_i
⇒ w_{i,MVDR} = R_xx^{-1} A (A^H R_xx^{-1} A)^{-1} e_i,
   w^H_{i,MVDR} = e_i^T (A^H R_xx^{-1} A)^{-1} A^H R_xx^{-1}.

Hence from (∗∗) we have

W^H_MVDR = (A^H R_xx^{-1} A)^{-1} A^H R_xx^{-1}.

Similarly, working with (∗) leads to

W^H_MVDR = (A^H R_nn^{-1} A)^{-1} A^H R_nn^{-1}.

MVDR is a zero forcing (ZF) solution, completely suppressing interference between impinging
wavefronts and minimizing the noise.
Example 5.2.1. Special case (i.i.d. noise): R_nn = σ_n² · 1_M. Then the MVDR solution is

W^H_MVDR = (A^H (1/σ_n²) 1_M A)^{-1} A^H (1/σ_n²) 1_M = (A^H A)^{-1} A^H = A^+ = W^H_LS.

Hence, if the noise is i.i.d., the MVDR solution and the LS solution are identical!

5.3 MMSE Solution


For 1 ≤ i ≤ d, we get for the mean square error (MSE), with e_i[n] = s_i[n] − ŝ_i[n] = s_i[n] − w_i^H x[n],

E[|e_i[n]|²] = E[(s_i[n] − w_i^H x[n])(s_i*[n] − x^H[n] w_i)]
            = E[|s_i[n]|²] − w_i^H E[x[n] s_i*[n]] − E[s_i[n] x^H[n]] w_i + w_i^H E[x[n] x^H[n]] w_i
            = σ_s² − w_i^H p_i − p_i^H w_i + w_i^H R_xx w_i = J(w_i).

Differentiating J(w_i) with respect to w_i* and setting the result to zero,

∂J(w_i)/∂w_i* = −p_i + R_xx w_i = 0   ⇒   R_xx w_i = p_i,

for each i = 1, ⋯, d. Collecting all of the previous expressions for i = 1, ⋯, d, we obtain

R_xx W = R_xs,

where

R_xs = [p₁, p₂, ⋯, p_d] ∈ C^{M×d},   W = [w₁, w₂, ⋯, w_d] ∈ C^{M×d}.

Hence,

W_MMSE = R_xx^{-1} R_xs.

Note that

R_xs = E[x[n] s^H[n]] = E[(A s[n] + n[n]) · s^H[n]] = A E[s[n] s^H[n]] = A R_ss.

Therefore, we can express the MMSE solution as

W^H_MMSE = R_ss A^H (A R_ss A^H + R_nn)^{-1}.

Applying the matrix inversion lemma, we obtain

W^H_MMSE = (A^H R_nn^{-1} A + R_ss^{-1})^{-1} A^H R_nn^{-1}.

In the high SNR regime, the MMSE solution converges to the MVDR solution.

5.4 MF Solution

ŝ_i = w_i^H x_i,   x_i = a(µ_i) s_i + n,   ŝ_i = w_i^H a(µ_i) s_i + w_i^H n,

E[|ŝ_i|²] = E[(w_i^H a(µ_i) s_i + w_i^H n)(w_i^H a(µ_i) s_i + w_i^H n)*]
          = w_i^H a(µ_i) a^H(µ_i) w_i · E[|s_i|²] + w_i^H R_nn w_i,

SNR_i = w_i^H a(µ_i) a^H(µ_i) w_i · E[|s_i|²] / (w_i^H R_nn w_i).

If

R_nn = σ_n² · 1   and   E[|s_i|²] = σ_s²,

then we have

SNR_i = (σ_s²/σ_n²) · w_i^H a(µ_i) a^H(µ_i) w_i / (w_i^H w_i).

The matched filter solution is computed from

w_{i,MF} = argmax_{w_i} SNR_i.

Hence,

w_{i,MF} = a(µ_i),   so   W^H = A^H.

Again, we see that in the low SNR regime (i.e. σ_s² ≪ σ_n²) the Wiener solution (MMSE) converges to the matched filter solution, while in the high SNR regime it converges to the MVDR (ZF) solution.
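A compact NumPy sketch (not from the notes) applying the four reconstruction filters of this chapter to synthetic ULA data with known steering matrix A. The scenario and noise level are assumptions, R_ss = 1 is assumed, and the matched filter is scaled by 1/M only for comparability of the mean square errors (the SNR criterion above is invariant to this scaling).

import numpy as np

# Sketch: LS, MVDR, MMSE and (scaled) MF reconstruction filters on assumed data.
rng = np.random.default_rng(5)
M, d, N = 8, 3, 200
mu = -np.pi * np.sin(np.deg2rad([-20.0, 5.0, 35.0]))
A = np.exp(1j * np.outer(np.arange(M), mu))
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)   # Rss = I
sigma_n2 = 0.5
Noise = np.sqrt(sigma_n2) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = A @ S + Noise

Rnn = sigma_n2 * np.eye(M)
Rxx = A @ A.conj().T + Rnn                     # A Rss A^H + Rnn with Rss = I

W_LS   = np.linalg.pinv(A).conj().T            # W^H = A^+
W_MVDR = np.linalg.solve(Rxx, A) @ np.linalg.inv(A.conj().T @ np.linalg.solve(Rxx, A))
W_MMSE = np.linalg.solve(Rxx, A)               # Rxx^{-1} Rxs with Rxs = A Rss = A
W_MF   = A / M                                 # W^H = A^H, scaled for a unit desired-signal gain

for name, W in [("LS", W_LS), ("MVDR", W_MVDR), ("MMSE", W_MMSE), ("MF", W_MF)]:
    S_hat = W.conj().T @ X
    print(f"{name:4s}: mean square error {np.mean(np.abs(S_hat - S) ** 2):.4f}")

As expected from the discussion above, the LS and MVDR errors coincide (i.i.d. noise), the MMSE error is the smallest, and the matched filter suffers from the residual interference between the wavefronts.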
6. Downlink Beamforming

Consider the downlink of a single cell with M transmit antennas at the base station and with K
single-antenna users as depicted in Fig. 6.1. We denote sk (t) as the signal of user k at time instance
t. Additionally, we assume that there are Qk number of paths from the base station to user k. The
spatial frequency, the gain and the delay of path qk of user k are denoted by µk,q , bk,q and τk,q .
Let us assume that the base station knows the angles of arrivals of the users after performing
angle of arrival estimation in the uplink, through MUSIC or ESPRIT for instance. Based on reci-
procity we can assume that the angles of departure θ from the base station to the users in the down-
link are the same as the angles of arrival. Through downlink beamforming, the beamforming vector
wk ∈ CM is employed to transmit to user k. In the following we employ ak,q = a(µk,q ) = a(θk,q )
for the array steering vector of the angle of departure θk,q .
From Fig. 6.1, the received signal x_k[n] of user k at time slot n is given by

x_k[n] = Σ_{q=1}^{Q_k} Σ_{l=1}^{K} w_l^H a_{k,q} · b_{k,q} · s_l(nT − τ_{k,q}) + n_k[n] + i_k[n],

where n_k[n] is the noise at user k at time slot n and i_k[n] is the intercell interference at user k at time slot n. We assume that s_k[n], n_k[n], and i_k[n] are mutually uncorrelated. Additionally, we denote the power of the noise and of the interference at user k as N_k and I_k, respectively, i.e.

E[|n_k[n]|²] = N_k,   E[|i_k[n]|²] = I_k[n].

Therefore,

E[|x_k[n]|²] = E[(Σ_{l=1}^{K} Σ_{q1=1}^{Q_k} w_l^H a_{k,q1} b_{k,q1} s_l(nT − τ_{k,q1})) · (Σ_{h=1}^{K} Σ_{q2=1}^{Q_k} w_h^H a_{k,q2} b_{k,q2} s_h(nT − τ_{k,q2}))*] + N_k + I_k[n].

Fig. 6.1. Downlink beamforming: the user signals s₁[t], …, s_K[t] are weighted by the beamforming vectors w₁*, …, w_K* and superimposed on the M transmit antennas; user k receives x_k[n] via Q_k paths with parameters (θ_{k,q}, µ_{k,q}, b_{k,q}, τ_{k,q}).

In the double sums Σ_{l=1}^{K} Σ_{q1=1}^{Q_k} and Σ_{h=1}^{K} Σ_{q2=1}^{Q_k}, only the terms with l = h contribute to the expectation, because s_l and s_h with l ≠ h (signals bound for different users) are uncorrelated:

E[|x_k[n]|²] = Σ_{l=1}^{K} w_l^H (Σ_{q1=1}^{Q_k} Σ_{q2=1}^{Q_k} E[s_l(nT − τ_{k,q1}) s_l*(nT − τ_{k,q2}) b_{k,q1} b*_{k,q2}] a_{k,q1} a^H_{k,q2}) w_l + N_k + I_k[n]
             = Σ_{l=1}^{K} w_l^H R_{kl} w_l + N_k + I_k[n],

with

R_{kl} = Σ_{q1=1}^{Q_k} Σ_{q2=1}^{Q_k} a_{k,q1} a^H_{k,q2} · E[s_l(nT − τ_{k,q1}) s_l*(nT − τ_{k,q2}) b_{k,q1} b*_{k,q2}].

Note that

E[s_l(nT − τ_{k,q1}) s_l*(nT − τ_{k,q2}) b_{k,q1} b*_{k,q2}] = 0   for q1 ≠ q2,   and   = σ_l² · E[|b_{k,q}|²]   for q1 = q2 = q,

since distinct path gains for a given user are uncorrelated. We assume that σ_l² = 1 ∀ l = 1, …, K. Then

R_{kl} = R_k = Σ_{q=1}^{Q_k} a_{k,q} a^H_{k,q} · E[|b_{k,q}|²],

i.e., it is independent of l.
The SINR for user k reads

SINR_k = w_k^H R_k w_k / (Σ_{l=1, l≠k}^{K} w_l^H R_k w_l + I_k + N_k),

where the sum is the intracell interference, I_k the intercell interference and N_k the noise. Note that the transmit power assigned to user k is given by P_k = ‖w_k‖²₂, since we have assumed σ_k² = 1.
The problem that we will consider in order to compute the beamforming vectors w_l for l = 1, …, K is to minimize the total transmit power P_T = Σ_{l=1}^{K} P_l subject to guaranteeing a specified SINR_k for every user k. Therefore, we consider

min_{w_k} Σ_{k=1}^{K} P_k = min_{w_k} Σ_{k=1}^{K} ‖w_k‖²₂   s.t.   SINR_k = w_k^H R_k w_k / (Σ_{l=1, l≠k}^{K} w_l^H R_k w_l + I_k + N_k),   k = 1, …, K,

i.e. we have K quadratic constraints

SINR_k · (Σ_{l=1, l≠k}^{K} w_l^H R_k w_l + I_k + N_k) = w_k^H R_k w_k,   k = 1, …, K.

This is a tough optimization problem: quadratic optimization with quadratic constraints!



We propose a linearized version (Linearized Power Minimizer, LPM):

1) First, choose beamforming vectors with unit length such that the power delivered towards user k is maximized, for every user k = 1, …, K, and
2) Second, adjust the norm of the beamforming vectors such that every user gets the SINR it needs (if possible).

6.1 First Step


Writing the first step mathematically, we have

max_{w_k, ‖w_k‖₂ = 1} w_k^H R_k w_k   ∀ k = 1, …, K.

The solution of this problem is obvious¹: the eigenvector of R_k which corresponds to the largest eigenvalue of R_k maximizes the given quadratic form.
Let us designate this eigenvector by u_k, with ‖u_k‖²₂ = 1.

6.2 Second Step


We obtain the beamforming vector w_k by scaling the eigenvector obtained in the first step by the power assigned to user k:

w_k = √(P_k) · u_k.

The K constraint equations

SINR_k · (Σ_{l=1, l≠k}^{K} w_l^H R_k w_l + I_k + N_k) = w_k^H R_k w_k,   k = 1, …, K,

can be rearranged in the following way:

P_k u_k^H R_k u_k / (SINR_k (N_k + I_k)) − (Σ_{l=1, l≠k}^{K} P_l u_l^H R_k u_l) / (N_k + I_k) = 1,   k = 1, …, K.

Hence we can set up the K constraint equations in the following matrix form:

Ψ · P = 1,   P = [P₁, P₂, …, P_K]^T ∈ R₊^K,   1 = [1, 1, …, 1]^T ∈ {1}^K,

where the real K × K matrix Ψ has the entries

[Ψ]_{kk} = u_k^H R_k u_k / (SINR_k (N_k + I_k)),   [Ψ]_{kl} = −u_l^H R_k u_l / (N_k + I_k)   for l ≠ k.
¹ We have already solved a similar problem in Section 2.6.

Hence we now have a problem with K linear equations and K unknowns, which we can solve as follows:

Ψ · P = 1,   P = Ψ^{-1} · 1 ∈ R₊^K.

However, this is a valid solution only if all components of P are positive! In addition, if P_T^max is the maximum available transmit power, the solution can only be implemented if

Σ_{l=1}^{K} P_l = ‖P‖₁ ≤ P_T^max.

If one of the two conditions is not fulfilled, then at least one of the K users has to be taken out. If the reduced problem then has a valid solution and Σ_{l=1}^{K−1} P_l < P_T^max, then the scheduling algorithm can try to admit another user, if there are some waiting for service.
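The two LPM steps are easy to prototype. The NumPy sketch below (not from the notes) uses an assumed small scenario: the per-user path directions, path gain powers, noise-plus-interference levels and SINR targets are all illustrative assumptions.

import numpy as np

# Sketch of the Linearized Power Minimizer: step 1 (dominant eigenvectors),
# step 2 (solve Psi P = 1 for the per-user powers).
rng = np.random.default_rng(6)
M, K = 8, 3
def a(theta_deg):
    return np.exp(-1j * np.pi * np.sin(np.deg2rad(theta_deg)) * np.arange(M))

# per-user spatial covariance R_k = sum_q E[|b_kq|^2] a_kq a_kq^H  (assumed path data)
paths = {0: [(-30.0, 1.0), (-25.0, 0.3)], 1: [(10.0, 1.0), (18.0, 0.2)], 2: [(45.0, 1.0)]}
R = []
for k in range(K):
    Rk = np.zeros((M, M), dtype=complex)
    for theta, gain_power in paths[k]:
        Rk += gain_power * np.outer(a(theta), a(theta).conj())
    R.append(Rk)

NI = np.full(K, 0.1)                        # assumed noise-plus-intercell-interference N_k + I_k
sinr_target = np.full(K, 10 ** (10 / 10))   # assumed 10 dB SINR target for every user

# Step 1: unit-norm directions u_k = dominant eigenvector of R_k
U = []
for Rk in R:
    eigval, eigvec = np.linalg.eigh(Rk)
    U.append(eigvec[:, -1])

# Step 2: solve Psi P = 1 for the per-user powers
Psi = np.zeros((K, K))
for k in range(K):
    for l in range(K):
        gain = np.real(U[l].conj() @ R[k] @ U[l])
        Psi[k, l] = gain / (sinr_target[k] * NI[k]) if l == k else -gain / NI[k]
P = np.linalg.solve(Psi, np.ones(K))
print("per-user powers:", np.round(P, 3), "-> feasible" if np.all(P > 0) else "-> infeasible")
w = [np.sqrt(Pk) * uk for Pk, uk in zip(P, U)]   # final beamforming vectors w_k = sqrt(P_k) u_k

If any component of P turns out negative, or the sum of the powers exceeds the power budget, a user would have to be dropped, exactly as discussed above.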
Appendix

A1 Subspaces of a Matrix

Fig. A1. The action of matrix A: the row space R(A^H) ⊂ R^n is mapped onto the column space R(A) ⊂ R^m (x_r → Ax_r = Ax), while the nullspace N(A) is mapped to 0 (x_n → Ax_n = 0); N(A^H) is the left nullspace. Figure taken from [2].

81
82 Appendix

Rm Rn

p = Pb x + = A+ p = A+ b x + = A+ b
column space row space
R(A) R(AH )
A+ b = p
b
0 0

left nullspace A+ (b − p) = 0 nullspace


N (AH ) b − p
N (A)

Fig. A2. The action of the pseudoinverse matrix A+ . Figure taken from [2].
Bibliography

[1] Simon Haykin. Adaptive Filter Theory. Prentice Hall, 1996.


[2] Gilbert Strang. Linear Algebra and its Applications. Harcourt Publishers Ltd., 1988.

