Geophysical Inversion

Vector Space:

A vector space is a mathematical structure formed by a collection of elements called 'vectors'.

Vectors may be added together and multiplied by scalars.

For example,
F = x + y

where F, x, y are vectors.

There are also vector spaces with scalar multiplication by complex numbers or
rational numbers.

From the linear algebra point of view, vector spaces are characterised by their
dimension. Dimension is defined as the number of independent directions in the
space.

Example:

If we record one seismic trace, one second in length at a sample rate of 1000
samples per second, and let each sample be defined by one byte, then we can
put these 1000 bytes of information in a 1000-dimensional vector

(s1, s2, s3, ..................., s1000)

where si is the ith sample, treated just like a 3-component physical vector. While
stacking seismic traces, we simply add these n-dimensional vectors component by
component, say,

s + t = (s1+t1, s2+t2, ............................., s1000+t1000)
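A minimal Python/numpy sketch of this idea (the traces here are synthetic random
numbers, used only for illustration):

import numpy as np

n = 1000                     # samples per trace (1 s at 1000 samples/s)
s = np.random.randn(n)       # one recorded trace as an n-dimensional vector
t = np.random.randn(n)       # a second trace of the same length

stack = s + t                # component-wise addition: (s1+t1, ..., sn+tn)
scaled = 0.5 * stack         # scalar multiplication, e.g. averaging the stack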

Note: The mathematical definition of a vector space is sufficiently general to
incorporate objects such as functions and matrices.

Hilbert space:

Hilbert space is named after David Hilbert. A Hilbert space is an abstract vector
space possessing the structure of an inner product that allows length and angle
to be measured. One of the most familiar examples of a Hilbert space is the
Euclidean space of three-dimensional vectors, denoted by R3, equipped with the
dot product.
Example:

The dot product takes two vectors x & y and produces a real number x . y

In Cartesian coordinates, the dot product is defined by

(x1 , x2 , x3) . (y1, y2, y3) = x1y1 + x2y2 + x3y3

The dot product satisfies the following properties:

(1). It is symmetric in x & y (i.e. x . y = y . x)

(2). It is linear in its first argument (i.e. (a x1 + b x2) . y = a x1 . y + b x2 . y, where a,
b are scalars and x1, x2, y are vectors)

(3). It is positive definite: for all vectors x, x . x ≥ 0, with equality if and only if
x = 0.

Note:

A product satisfying these three properties is known as an inner product. Every
finite-dimensional inner product space is also a Hilbert space.

In order to define the relative 'size' of vectors and matrices, a generalized concept of
length, called a norm, is introduced.

Norm:

A norm is a function from the space of vectors onto the scalars, denoted by ‖•‖,
satisfying the following properties for any two vectors u & v and any scalar α:

(1). ‖v‖ > 0 for any v ≠ 0, and ‖v‖ = 0 ↔ v = 0

(2). ‖αv‖ = |α| ‖v‖

(3). ‖v + u‖ ≤ ‖v‖ + ‖u‖ [triangle inequality]


The most useful class of norms for vectors in Rn is the lp norm, defined for p ≥ 1 as

‖x‖_p = ( Σ_{i=1}^{n} |x_i|^p )^(1/p)

For p = 2 this is just the ordinary Euclidean norm: ‖x‖_2 = √(xT x)

The limit of the lp norm as p → ∞ is called the l∞ norm:

‖x‖_∞ = max |x_i| ,  1 ≤ i ≤ n

Note:
A matrix norm that is not induced by any vector norm is the Frobenius norm,
defined for all A ∈ R^(n×m) as

‖A‖_F = ( Σ_i Σ_j A_ij² )^(1/2)

Interpretation:
Which p is best for optimization?
It is observed that p values near 1 are more stable than p values near 2. In
inversion, if our data have, say, a Gaussian distribution, the l2 norm is optimal. If our
data have a double-exponential distribution, the l1 norm is optimal.

[Figure: generalized Gaussian probability densities ρp(x) for different values of p]

ρ_p(x) = p^(1 − 1/p) / ( 2 σ_p Γ(1/p) ) · exp[ − (1/p) |x − x₀|^p / (σ_p)^p ]

Γ = Gamma function
ρ_p(x) = probability density
σ_p = dispersion, defined by

(σ_p)^p ≡ ∫_{−∞}^{∞} |x − x₀|^p ρ(x) dx
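The robustness contrast between p = 1 and p = 2 can be seen in a minimal sketch
(made-up numbers): fitting a single constant to data with one outlier, the l2 estimate
is the mean and the l1 estimate is the median.

import numpy as np

d = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 30.0])   # last value is an outlier

m_l2 = np.mean(d)     # minimizes sum((d - m)**2): pulled towards the outlier
m_l1 = np.median(d)   # minimizes sum(|d - m|): barely affected by the outlier

print(m_l2, m_l1)     # ~13.3 versus ~10.05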

Dimension:
The dimension of a vector space V is the cardinality (i.e. the number of vectors / size
of the set) of a basis of V.
A basis is a set of linearly independent vectors which can represent every vector in
a given space (or coordinate system).
Note:
In physics / mathematics the dimension of a space is defined as the minimum
number of coordinates needed to specify any point within it.

Gaussian distribution:

P(d) = 1 / (σ √(2π)) · exp[ − (d − <d>)² / (2σ²) ]

d = variable and <d> is the mean value.
Matrices:
A matrix is a rectangular array of numbers, symbols, or expressions, arranged in
rows and columns; the individual items in a matrix are called its elements or
entries.

    [ 2  5 ]
A = [ 3  8 ]
    [ 1  0 ]

The components are denoted by Aij. The transpose of a matrix is denoted by AT:

AT = [ 2  3  1 ]
     [ 5  8  0 ]

Exercise: prove that (AB)T = BT AT; note that in general AB ≠ BA.

Symmetric matrix:
A matrix which equals its transpose i.e. AT = A is said to be symmetric.
Skew – symmetric:
If AT = -A, the matrix is said to be skew- symmetric.
Split / Partition:
Any square matrix A can be partitioned into the sum of a symmetric and a skew-
symmetric part via

A = ½ (A + AT) + ½ (A − AT)
Hermitian matrix:
A Hermitian matrix is a square matrix with complex entries such that each entry
equals the complex conjugate of its transposed entry, Aij = conj(Aji).

Example (transpose and conjugate transpose of a complex matrix):

A = [ 3 + 2i    5      −2i      ]
    [ 2 − 2i    i      −7 − 13i ]

AT = [ 3 + 2i   2 − 2i   ]
     [   5        i      ]
     [  −2i    −7 − 13i  ]

A* (or AH) = [ 3 − 2i   2 + 2i   ]
             [   5        −i     ]
             [   2i     −7 + 13i ]

Note:
Hermitian: A = A*
Skew- Hermitian: A = -A*
Normal: A*A = AA*
Unitary: A* = A-1
Orthogonal matrix:
AT = A-1
i.e. ATA = AAT = I
I = identity matrix
Diagonal matrix:
If all entries outside the main diagonal are zero, A is called diagonal matrix.
    [ a11   0    0  ]
A = [  0   a22   0  ]
    [  0    0   a33 ]

Square matrix:
A square matrix is a matrix with the same number of rows and columns.
Lower triangular matrix:
Entries above the main diagonal are zero.
    [ a11   0    0  ]
A = [ a21  a22   0  ]
    [ a31  a32  a33 ]

Upper triangular matrix:
Entries below the main diagonal are zero.
    [ a11  a12  a13 ]
A = [  0   a22  a23 ]
    [  0    0   a33 ]

Invertible matrix:
A square matrix A is called invertible or non-singular if there exists a matrix B
such that AB = BA = In.
If B exists, it is unique and is called the inverse matrix of A, denoted by A-1.
Note:
A matrix is invertible if and only if its determinant is non-zero.

det [ a  b ] = ad − bc
    [ c  d ]

Rank of a matrix:
The rank of a matrix A is the maximum number of linearly independent row
vectors of the matrix, which is the same as the maximum number of linearly
independent column vectors.
Example:

    [  1    2   1 ]
A = [ −2   −3   1 ]
    [  3    5   0 ]

Rank = 2
The first two rows are linearly independent, so the rank is at least 2; but all
three rows are linearly dependent (the third is equal to the difference of the first &
second), so the rank must be less than 3.

A = [  1    1   0    2 ]   has rank 1.
    [ −1   −1   0   −2 ]

Eigen values and Eigenvectors:


An eigenvalue is a number λ and an eigenvector a non-zero vector x satisfying
A x = λ x
The number λ must be chosen such that (A − λI) has a null space, so that
det(A − λI) = 0 [we must then choose x so that it lies in that null space].
This determinant is a polynomial in λ, called the characteristic polynomial.
Example:

A = [ 5  3 ]
    [ 4  5 ]

The characteristic polynomial is
λ² − 10λ + 13 = 0
whose roots are λ = (5 + 2√3) & (5 − 2√3).

This leads to solving two homogeneous systems,

[ −2√3    3   ] [ x1 ] = 0
[   4   −2√3  ] [ x2 ]

[  2√3    3   ] [ x1 ] = 0
[   4    2√3  ] [ x2 ]

from which we arrive at the two eigenvectors,

[ √3/2 ]   [ −√3/2 ]
[  1   ] , [   1   ]

Moore – Penrose Matrix Inverse:
Given an m by n matrix A, the Moore–Penrose generalized matrix inverse is a
unique n by m pseudoinverse A+. This matrix was independently
defined by Moore (1920) and Penrose (1955), and is variously known as the
generalized inverse, pseudoinverse, or Moore–Penrose inverse.
The Moore–Penrose inverse satisfies
(1). AA+A = A
(2). A+AA+ = A+
(3). (AA+)H = AA+
(4). (A+A)H = A+A
where AH is the conjugate transpose.
It is also true that

y = A+ b

is the shortest least-squares solution to the problem Ay = b ................ (6)
If AHA is invertible, then A+ = (AHA)-1AH,
as can be seen by premultiplying both sides of (6) by AH to create a square
matrix which can then be inverted:
AHAy = AH b
giving y = (AHA)-1 AH b
         = A+ b
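A minimal numpy sketch of this least-squares property (made-up numbers; for a
full-column-rank A the pseudoinverse solution equals the normal-equations solution):

import numpy as np

# Overdetermined system A y = b (more equations than unknowns)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.5])

A_plus = np.linalg.pinv(A)        # Moore-Penrose pseudoinverse (computed via SVD)
y = A_plus @ b                    # shortest least-squares solution

y_check = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(y, y_check))    # True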
Orthogonal decomposition of a real symmetric matrix:
A real symmetric matrix G can be factored into
G = UQUT
with orthonormal eigenvectors in U and real eigenvalues in Q.
Note:
For dimensional reasons there is clearly no hope of the kind of eigenvector
decomposition discussed above being applied to a rectangular matrix, unless we
allow different orthogonal matrices on each side of G.

Geophysical Inversion:
Introduction:
In the geophysical and related sciences, experiments are performed under
controlled conditions (i.e. in a systematic manner); the outcome may be
numerical values that represent our observations at fixed (predetermined)
intervals. These observations of some physical properties of the physical
world are commonly referred to as experimental / observational data. In
order to explain the observational data, it is required to understand the
relationship between the distribution of properties of the physical system under
study (e.g. the earth) and the observable geophysical response.

The system of equations that describes this relationship is the FORWARD THEORY.

Inverse theory is an organised set of mathematical and statistical techniques (e.g.
calculus, matrix algebra, statistical estimation & inference etc.) for retrieving
useful information about the physical system (physical world) from controlled
observations on the system. It relates directly to:
(1). Analysis of experimental data.
(2). Fitting of mathematical models to estimate the model parameters.
(3). Optimal experimental design.
Examples of problems where inverse theory is used:
(i). Curve fitting.
(ii). Digital filter design / de-convolution of seismograms.
(iii). Determination of earth structure / ore deposits / energy resources from
geophysical information.
(iv). Determination of earthquake location from wave arrival times.
Geophysical processes & Systems:

Geophysical Processes:
Seismic and EM wave propagation through the earth & current or fluid flow in
(porous) rocks.
Geophysical systems:
(1). Density distribution within the Earth.
(2). Velocity distribution within the Earth.
(3). Temperature distribution within the Earth.
(4). Resistivity distribution within the Earth.
(5). Distribution of radioactive materials within the Earth.
(6). Magnetic susceptibility variation within the Earth.

Geophysical exploration philosophy & inverse theory:

Goal of exploration geophysics:
Understand / reconstruct the structure of the earth from data recorded above /
below the earth's surface.
Need for data:
Make observations on the various geophysical processes.
Often data may be (i) noisy,
(ii) incomplete,
(iii) insufficient.
We need to get something out of the data to proceed with processing.
Inverse theory provides a formalism by which many of the questions
fundamental to geophysics / geophysical data processing may be entertained
(e.g. the optimum sampling rate, or how many more data are needed for a desired
accuracy).
Theoretical modelling techniques help to improve the understanding of the relation
between the observed data (due to the earth's response to some excitation) & the
various subsurface physical property changes or discontinuities that may have
generated it.

Types of geophysical data:


(1). Mass / moment of inertia of the Earth.
(2). Measurements of travel times of seismic waves (from earthquakes and
explosions).
(3). Gravity / magnetic anomalies.
(4). Apparent resistivity of the ground.
(5). Well draw-down data.

Data → Field / Lab based.


Physical properties:
(i). Seismic velocity.
(ii). Bulk & shear moduli.
(iii). Ground resistivity.
(iv). Magnetic susceptibility.
Scaled-down physical models of the earth are useful when & where the
mathematical models are very complicated & difficult to work with.

Gathering data
Field experiment:
Controlled excitation → Unknown earth system → Observed response
(e.g. inductive excitation)                      (e.g. resistivity data)
Laboratory experiment:
Systematic input → Known physical scaled model → Observed response
                   (e.g. seismic model)           (seismic data)
Each datum di is a sample drawn from a set of equally likely events / values; the
experimental outcome may vary under some conditions (instrument error, human
error). We therefore have to find
(i). the mean of the data,
(ii). the STD / uncertainty.
Further applications of inverse theory include:
(iii). Modelling of the lithosphere's response to loading / strain rate variations in
sedimentary basins.
(iv). Well (pump) test analysis in hydrology.
(v). Factor analysis in geology.
(vi). Geochronology determination & geomagnetic reversal data.
(vii). Satellite navigation.
(viii). Optimal control of engineering systems.
(ix). Medical tomography.
(x). Decision making / operational research in management and mineral
economics.

(1). Curve fitting:

FORWARD THEORY = mathematical expression to represent or reproduce the
"data":

d_i = Σ_{j=1}^{p} G_ij m_j

FORWARD THEORY:  y = F(x)

where y_i = data ≈ a x_i² + b x_i + c
Let x = 1, ............., 10 km (1 by 10).
You have, say, y = 5, 20, 7, 15, ........ (1 by 10).
Directly determine a, b, c by least-squares data fitting / minimising the error
between the y predicted from the forward theory and the observed response:

s = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} ( y_i − a x_i² − b x_i − c )²

∂s/∂a = 0,  ∂s/∂b = 0,  ∂s/∂c = 0
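A minimal numpy sketch of this quadratic curve fit via the normal equations (the y
values below are made-up numbers, used only for illustration):

import numpy as np

x = np.arange(1.0, 11.0)                    # x = 1, ..., 10 km
y = np.array([5., 20., 7., 15., 14., 18., 25., 28., 33., 41.])  # made-up data

# Design matrix G with columns x^2, x, 1, so that y ~ G @ [a, b, c]
G = np.column_stack([x**2, x, np.ones_like(x)])

# Least-squares solution of the normal equations G^T G m = G^T y
a, b, c = np.linalg.solve(G.T @ G, G.T @ y)

y_pred = G @ np.array([a, b, c])
misfit = np.sum((y - y_pred)**2)            # s = sum of squared errors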
(2). Digital filter design / deconvolution of seismogram:
Let two signals a(t) & b(t) be related by a filter function f(t):

a(t) = f(t) ∗ b(t) = ∫ f(τ) b(t − τ) dτ

Given / knowing a(t) & b(t), finding f(t) is a standard problem in signal analysis.

If the time series is of length n and the filter function of length p, the integral
equation may be written as

a_i = Δt Σ_{j=1}^{p} f_j b_{i−j+1}

with b_i = 0 for i < 1 or i > n, and Δt = sampling interval.

The equation is linear and the unknown filter coefficients f_j can be recast in the
form
d = Gm
where m = sought filter and d = time series data,

    [ b1    0     0     0     0   ....   0  ]
    [ b2    b1    0     0     0   ....   .  ]
G = [ b3    b2    b1    0     0   ....   .  ]
    [ .     .     .     b1    0   ....   .  ]
    [ .     .     .     .     .   ....   .  ]
    [ bn   bn−1  bn−2  bn−3  bn−4 ....   bk ]

where k = n − p + 1. The above system is overdetermined.
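A minimal numpy sketch of building this convolution matrix and recovering a short
filter by least squares (the signals and the helper name convolution_matrix are
illustrative, not from any library):

import numpy as np

def convolution_matrix(b, p, dt=1.0):
    """Build the n-by-p matrix G so that a = G @ f reproduces
    a_i = dt * sum_j f_j * b_{i-j+1}, with b_i = 0 outside 1..n."""
    n = len(b)
    G = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            if 0 <= i - j < n:        # zero-based index of b_{i-j+1}
                G[i, j] = b[i - j]
    return dt * G

b = np.array([1.0, 0.5, -0.2, 0.1])   # known input signal (illustrative)
f_true = np.array([2.0, -1.0])        # filter to recover
G = convolution_matrix(b, p=len(f_true))
a = G @ f_true                        # "observed" output d = G m

# Overdetermined least-squares estimate of the filter
f_est, *_ = np.linalg.lstsq(G, a, rcond=None)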

Description / Characterization of geophysical processes: Mathematical
models:
Most geophysical processes can be described mathematically. As mentioned
earlier, the set of equations that characterises each process or geophysical
system is known as the FORWARD THEORY or model.
The word 'model' has various connotations in the geoscientific community. It may
refer to a conceptual model used by a geologist, or to physical (lab-scale) and
mathematical models as is common in geophysics. Here we will mostly use
mathematical models but may refer to conceptual models when required.
A number of geophysical processes can be expressed by an integral equation of the
form

d_i = ∫_0^z k_i(z) p(z) dz ...........................(1.1)

where d_i is the measurable or observable response of the system to the ith external
input or excitation (e.g. explosions or electrical current injection into the ground),
p(z) is a function related to the physical properties of the earth (e.g. the density /
velocity distribution of the earth, expressed as a function of depth for a laterally
homogeneous earth), i.e. the model parameter, and the k_i are called data kernels.
The data kernels describe the relation between the data & the earth model function
p(z). The model parameter may be a continuous function of radius & position.
Example: The travel time between a seismic source & receiver along a ray path for a
continuous velocity field v(x, z) is given by

t = ∫_ray ds / v(x, z) ...........................(1.2)

The mathematical description of a physical system is referred to as the forward
theory. Forward theory is developed to predict the data or observed response that
we would record over a hypothetical Earth-type structure. These data are
therefore variously called synthetic or predicted data.
Discretization & Linearization:
In many cases the earth model is a continuous function of depth or radius. Consider,
for example, the mass and moment of inertia of the earth. Both are related to the
density within the earth by the formulae

Mass = 4π ∫_0^R r² ρ(r) dr .............(1.3a)

MI = (8π/3) ∫_0^R r⁴ ρ(r) dr ............(1.3b)

where R is the earth's radius and ρ(r), which corresponds to p(z) in eq. (1.1), is the
density at radial distance r. Equations (1.3a & 1.3b) may be combined to give the
general expression

d_i = ∫_0^R k_i(r) ρ(r) dr .............(1.4)

This equation can be solved by digital computer using the computational formula

d_i = Σ_j G_ij m_j .............(1.5)

Here m_j is set to ρ(r_j) Δr and G_ij is set to k_i(r_j).
In this form we can say that the theoretical problem is discretized.
For technical reasons our field or experimental observations are recorded over
finite intervals (e.g., discrete frequencies or a fixed bandwidth) instead of over all of
the range [0, ∞] required to uniquely characterise the earth system. For
computational simplicity we often express the continuous distribution of the earth's
physical properties p(z) by a finite set of parameters,
e.g., a layered earth structure with each layer having a specific density and
thickness. This practice is referred to as parameterization. For convenience we
will be considering only discrete models & discrete parameters, which are easier
to handle than continuous distributions.

Let, t_i = Σ_{j=1}^{p} L_ij / v_j ............(1.6)

Notice that the travel time is not directly proportional to the model parameter v but
to its inverse. The relation is said to be non-linear in v.
However, if we define c = 1/v, where c is the slowness of the seismic wave, then

t_i = Σ_{j=1}^{p} L_ij c_j ..........(1.7)

which is of the form d = Gm. The relationship is said to be linear. Such a
transformation is called a "linearising parameterization".
Two layer earth model:

ρ_a(L) = ρ₁ [ 1 + 2L² ∫_0^∞ K(λ) J₁(λL) λ dλ ] ............(1.8)

L = AB/2, J₁ is the first-order Bessel function,
K(λ) is a function of ρ₁, ρ₂, t, and λ is the integration variable:

K(λ) = −K₁₂ exp(−2λt) / [ 1 + K₁₂ exp(−2λt) ] ....(1.9)

K₁₂ = (ρ₁ − ρ₂) / (ρ₁ + ρ₂) ....(2.0)

We cannot put equation (1.8) in the simple form d = Gm as we did in equation (1.2).
The resistivity depth sounding problem is highly nonlinear. The usual method of
dealing with such a problem involves using Taylor's theorem, a procedure termed
linearization.
Meaning of inverse problem:
Forward problem: T(z) = a + bz
Given: estimated values of the model parameters (a, b).
Determine: the theoretical response (data).

Model parameters → Numerical representation of the system dynamics → Computed response
(input)             (i.e. the forward theory; the operator)              (output)

Classification of inverse problem:


(a). Overdetermined problem:
The sought model consists of fewer parameters than the number of field data. It
is solved by a best fit to the data [least-squares fitting method].
(b). Underdetermined problem:
The sought model consists of more parameters than the number of field data. In
this case many solutions / models can explain the field data (non-unique solution).
It is then possible to use the methods originally devised for overdetermined
problems to derive a smooth model.
(c). Even / exactly determined problem:
The number of model parameters sought is exactly equal to the number of
observations in the field data.
Discretization and parameterization:
Geophysical measurements are usually made to determine subsurface
properties or structure. The properties could be uniquely determined only if the
measurements spanned the full observational bandwidth [0, ∞]. However, this is not
possible owing to technical limitations; we typically conduct our field
experiments over a finite observation interval. The outcomes are discrete
numerical values called field data, which are incomplete and often inconsistent.
For computational simplicity we often tend to seek the minimum set of
parameters that describe our observations or earth structure. The inverse
problem is therefore discretized and our hypothetical Earth model is
parameterized into a finite number of parameters. Geophysical inverse theory is
thus concerned with the approximation of otherwise continuous functions with a
finite number of parameters.
Weighted measure of length as a type of A priori solution:
There are many instances in which L = mT m is not a good measure of solution
simplicity. Consider the inverse problem of finding density
fluctuations in the ocean. One may not look for the solution that is smallest in the
sense of closest to zero, but one that is smallest in the sense that it is closest to
some other value, such as the average density of sea water. The obvious
generalized measure is,

L = (m − m2)T (m − m2)

where m2 is the a priori value of the model parameters / average model parameters.
Sometimes the whole idea of length as a measure of simplicity is not
appropriate. One may feel that a solution is simple if it is smooth or if it is in
some sense flat. This measure may be particularly appropriate when the model
parameters represent a discretized continuous function such as density; a
property such as the flatness of a continuous function can easily be quantified by
the norm of its first derivative.
The flatness of a vector m is

    [ 1  −1   0   .   .   .   0 ] [ m1 ]
    [ 0   1  −1   0   .   .   0 ] [ m2 ]
l = [ 0   0   1  −1   0   .   0 ] [ m3 ]  = D1 m
    [ .   .   .   .   .   .   . ] [ .  ]
    [ 0   0   .   .   .   1  −1 ] [ mM ]

where D1 = flatness / roughness matrix.
Similarly, solution roughness can be quantified by the second derivative,

     [ 1  −2   1   0   .   .   0 ]
     [ 0   1  −2   1   0   .   . ]
D2 = [ .   .   .   .   .   .   0 ]
     [ .   .   .   .   .   .   . ]
     [ 0   .   .   .   1  −2   1 ]

L = lT l = [Dm]T [Dm] = mT DT D m = mT Wm m

where Wm = DT D can be interpreted as a weighting matrix.
The generalised measure of solution length is then,

L = [m − m2]T Wm [m − m2]

By suitably choosing the a priori model m2 and the weighting matrix Wm, we can
quantify a wide variety of measures of simplicity.

Similarly, a weighted measure of prediction error can be written as,

E = eT We e

where We expresses the relative contribution of each individual error to the total
prediction error.
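A minimal numpy sketch of building first- and second-difference matrices and the
corresponding weighting matrix Wm (one common row convention; the sign and size
choices vary between texts):

import numpy as np

def first_difference(M):
    """(M-1) x M flatness matrix D1: rows of the form [.. 1 -1 ..]."""
    D = np.zeros((M - 1, M))
    for i in range(M - 1):
        D[i, i], D[i, i + 1] = 1.0, -1.0
    return D

def second_difference(M):
    """(M-2) x M roughness matrix D2: rows of the form [.. 1 -2 1 ..]."""
    D = np.zeros((M - 2, M))
    for i in range(M - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    return D

M = 6
D1 = first_difference(M)
D2 = second_difference(M)
Wm = D1.T @ D1                  # weighting matrix for a "flat" solution
m = np.linspace(1.0, 2.0, M)    # a smoothly varying test model
print(m @ Wm @ m)               # small value: the model is already flat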

Weighted least- square solution of over – determined problem:


Letting e = d − Gm,

q = eT We e
  = (d − Gm)T We (d − Gm)
  = (dT − mT GT) We (d − Gm)
  = dT We d − dT We Gm − mT GT We d + mT GT We Gm

Setting ∂q/∂mT = 0:

−2 GT We d + 2 GT We G m = 0

GT We G m = GT We d

m_est = (GT We G)−1 GT We d
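A minimal numpy sketch of this weighted least-squares estimate (all numbers are
made up; We is simply a diagonal matrix of weights, e.g. inverse data variances):

import numpy as np

G = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
d = np.array([2.1, 3.9, 6.2, 7.8])
We = np.diag([1.0, 1.0, 4.0, 4.0])   # trust the last two data more (smaller variance)

# m_est = (G^T We G)^{-1} G^T We d
m_est = np.linalg.solve(G.T @ We @ G, G.T @ We @ d)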

Weighted minimum length solution of the completely under-determined
problem:

L = (m − m2)T Wm (m − m2)
  = (mT − m2T) Wm (m − m2)
  = mT Wm m − mT Wm m2 − m2T Wm m + m2T Wm m2

Total error / misfit (with Lagrange multipliers λ enforcing the data equations):

q = L + λT [d − Gm]

∂q/∂mT = 2 Wm m − 2 Wm m2 − GT λ = 0

m = m2 + (1/2) Wm−1 GT λ

Now,

d = Gm = G m2 + (1/2) G Wm−1 GT λ

λ = 2 (G Wm−1 GT)−1 (d − G m2)

m_est = m2 + (1/2) Wm−1 GT λ
      = m2 + Wm−1 GT (G Wm−1 GT)−1 (d − G m2)

Weighted damped least-squares:

If the equation Gm = d is slightly under-determined, it can be solved by minimizing a
combination of prediction error and solution length,

E + β² L.

The solution is then

m_est = m2 + [GT We G + β² Wm]−1 GT We [d − G m2]

which is equivalent to

m_est = m2 + Wm−1 GT [G Wm−1 GT + β² We−1]−1 [d − G m2]

In both instances, one must take care to ascertain whether the inverse actually
exists. Depending on the choice of the weighting matrices, sufficient a priori
information may or may not have been added to damp the indeterminacy.

Inverse Problem

Given: field observations (the earth system response), T(z) data

T(z) = a + bz

Determine: the parameters of the earth model (a, b)

The inverse process:

Observed data → Mathematical tool / inverse theory → Model parameters
(input)          (operator)                           (output)

Generalised least square solution:

The linear problem is posed in matrix form d = Gm. We now want to solve for m.

For perfect data (no experimental errors):

m = G−1 d

However, Gauss (1809) suggested that due to experimental errors, practical data
di would not fit the model exactly, i.e., d = Gm + e.

The best way to get a unique solution for the model parameters is to minimize the
sum of squares of the residuals, i.e.,

q = eT e = Σ_{i=1}^{n} ( d_i − Σ_{j=1}^{p} G_ij m_j )² .............(1)

We can re-write the equation as

q = (d − Gm)T (d − Gm)

Expansion:

q = dT d − dT Gm − mT GT d + mT GT Gm

Setting ∂q/∂m_j = 0 gives

−2 GT d + 2 GT G m = 0

2 GT G m = 2 GT d

The generalised least-squares solution
(unconstrained solution):

m_est = [GT G]−1 GT d

The damped least-squares solution:

m_est = [GT G + β² I]−1 GT d

β is a damping factor (Lagrange multiplier).

Alternative (method):

q = (d − Gm)T (d − Gm)
  = dT d − dT Gm − mT GT d + mT GT Gm

∂q/∂mT = 0 − 0 − GT d + GT Gm = 0

GT G m = GT d

m_est = (GT G)−1 GT d

Orthogonal decomposition of a real symmetric matrix:

A real symmetric matrix G can be factored into

G = U Q UT

with orthonormal eigenvectors in U and real eigenvalues in Q.

But for dimensional reasons, for a non-symmetric / rectangular matrix this
eigenvector decomposition is not useful. In that case singular value decomposition
is the alternative.

Singular value decomposition (SVD):

Using SVD we can factorize an n by m Jacobian matrix G, such that

G = U Q LT

where n = number of data & m = number of model parameters.

U (n × m) & L (m × m)

are two orthogonal matrices, containing respectively the data-space and parameter-
space eigenvectors.

Q (m × m)

Q is a diagonal matrix containing at most r non-zero eigenvalues of G, with the
condition r ≤ m. The diagonal entries of Q (α1, α2, ........., αp) are called the
singular values of G.

Columns of U are eigenvectors of G GT, columns of L are eigenvectors of GT G,
and Q is a rectangular matrix with the singular values on its main diagonal.

Application:

Δm = (GT G + β² I)−1 GT Δd

Δm is the parameter correction vector and Δd is the data residual vector.

G = Jacobian matrix containing the partial derivatives of the data with respect to the
initial model parameters.

Substituting G = U Q LT:

Δm = (L Q² LT + β² I)−1 L Q UT Δd

Now,

(L Q² LT + β² I) = ( L diag{α_j²} LT + β² I ) = L diag(α_j² + β²) LT

and

[ L diag(α_j² + β²) LT ]−1 = L diag( 1 / (α_j² + β²) ) LT

Therefore,

Δm = L diag( 1 / (α_j² + β²) ) LT · L Q UT Δd

Δm = L diag( α_j / (α_j² + β²) ) UT Δd

This is the damped least-squares solution expressed through the SVD.
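A minimal numpy sketch of this damped SVD update (the Jacobian and residual
below are made-up numbers; damped_update is an illustrative helper name):

import numpy as np

def damped_update(G, dd, beta):
    """Delta m = L diag(alpha/(alpha^2 + beta^2)) U^T Delta d via the SVD of G."""
    U, alpha, LT = np.linalg.svd(G, full_matrices=False)   # G = U diag(alpha) L^T
    filt = alpha / (alpha**2 + beta**2)                    # damped filter factors
    return LT.T @ (filt * (U.T @ dd))

G = np.array([[1.0, 0.5],
              [0.3, 2.0],
              [0.7, 0.1]])            # illustrative Jacobian
dd = np.array([0.2, -0.1, 0.05])      # illustrative data residual

dm_damped = damped_update(G, dd, beta=0.5)
dm_undamped = damped_update(G, dd, beta=0.0)   # ordinary least-squares update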

Damped least-squares solution of VES (1-D) data:

Forward modelling:

Following Koefoed (1970), for an earth consisting of homogeneous, isotropic layers,

ρ_a(s) = s² ∫_0^∞ T(λ) J₁(λs) λ dλ .........(1)

s = AB/2, J₁ is the first-order Bessel function and λ is the integration variable.
T(λ) is the resistivity transform function, obtained from the recurrence relation

T_i(λ) = [ T_{i+1}(λ) + ρ_i tanh(λ h_i) ] / [ 1 + T_{i+1}(λ) tanh(λ h_i) / ρ_i ] .......(2)

n = number of layers,
ρ_i & h_i are the true resistivity and thickness of the ith layer.

Inversion:

Δm = (GT G + β² I)−1 GT Δd

Using SVD:

Δm = L diag( α_j / (α_j² + β²) ) UT Δd

Initially the damping factor is set to a large positive value, making full use of the
steepest-descent behaviour. Subsequently, at each iteration the damping factor is
multiplied by a factor less than unity so that the least-squares method dominates
near the solution.

Arinason & Hersir (1988)


ρ = α w ∆c
1
w

cr −1 − cr
∆cr =
cr −1
w= Test number.
Α = Parameter eigen value.
cr = Misfit value at current iteration.
cr-1 = Misfit value at previous iteration.
Years 1760 – 1810:
Boscovich & Laplace: minimizing the sum of the absolute values of the misfits —
the least-absolute-values method.
Legendre & Gauss: minimizing the sum of the squared values of the misfits —
the least-squares method.
The two methods follow two different hypotheses about the statistical distribution
of the errors in the data.

Laplacian distribution:
f(x) ≈ exp(−|x|)

Gaussian distribution:
f(x) ≈ exp(−x²)

The least-absolute-values method leads to linear programming (LPP).

The least-squares method leads to linear algebra.

The overwhelming popularity of the least-squares method is due to the use of linear
algebra, which is simpler than LPP.

It is widely recognized that the least-absolute-values criterion is less sensitive than
least squares to the presence of large uncontrolled errors in the data (a property
called robustness).
An inverse problem may be posed as an optimization problem:

min ‖Ax − y‖

(similarly ‖Gm − d‖)

It can be shown that for the lp family of norms, if this optimization problem has
a solution, then it is unique, provided the matrix has full column rank and p > 1.

For p = 1, the norm loses, in the technical jargon, strict convexity.

Let us consider a one-parameter linear system,

[ 1 ] x = [ 1 ]
[ λ ]     [ 0 ]

Let λ ≥ 0. If we solve the problem on the interval 0 ≤ x ≤ 1, the lp error function is

E_p(x) = [ |x − 1|^p + λ^p |x|^p ]^(1/p)

Remember λ is just a parameter.

This has a unique solution for any λ when p > 1.

For p = 1 the minimizer can be multi-valued (not a single point); the uniqueness
theorem is only valid for p > 1.

Describing & formulating inverse problem:

Key questions:

1. what is applicable parameterization?

(a) discrete (b) continuous ?


2. What is the number of the geophysical data?

≈ what are the errors in the observation?

3. Can we pose the problem mathematically? Ill posed / well posed??

4. Are there any physical constraints on the problem?

5. What types of solution to the problem are describe; to what accuracy?

Are we looking for exact / approximate solution?

6. Is the problem is non- linear / linear?

7. What are the confidence limits of the solution? Can it be appraised by other

means?

Types of solution to the inverse problem:

What do we ask of a given data set?

Depending on the problem at hand, we may want a variety of solution types.

If we are analysing a geophysical time series, we might be interested in finding

(a) the optimum sampling rate,
(b) a way of suppressing noise / unwanted signal,
(c) a way of removing the trend from the time series.

Alternatively we may deal with finding the physical property distribution of the
subsurface (layer resistivity, layer velocity, layer density, thickness etc.).

However, some of these solutions are inherently non-unique, e.g. the conductivity-
thickness product (σt): the geophysical technique can resolve the product (σt)
uniquely rather than the individual parameters. We may prefer slowness (1/v) to the
velocity (v) of acoustic waves, due to the advantage of such a parameterization. In
some situations it may be of interest to seek the non-uniqueness bounds defined by
our data, or a suite of extreme models that define a particular aspect of the model
or even the model space, rather than a single model for the subsurface.

Steepest-descent (gradient) method:

A linear inverse problem can be posed as an optimization problem where the cost
function is quadratic, i.e. the surface of the cost function is a paraboloid with a
single minimum. The cost function

f(m) = (1/2) mT G m − dT m

will have a single minimum with respect to the model parameters if the second-order
partial derivative matrix G is positive definite.

Though the negative gradient provides the direction of the maximum decrease
in the cost function, it does not provide the step size. One way is to keep the step
size constant. However, a large step size may miss the minimum point, so the
optimization becomes oscillatory. Instead, the step size may be calculated at every
iteration by computing the first-order derivative with respect to the step size and
equating it to zero.

Flowchart:
Initial model (m0)
→ Compute gradient
→ Search direction s = −gradient
→ Convergence achieved? If yes, exit
→ Optimum step size (α)
→ Update model: mk+1 = mk + α s
→ repeat

Code:

1. Given m0
2. Set k ← 0
3. While (convergence criteria not satisfied) do
4.   Sk = −∇f(mk)
5.   If Sk = 0 then stop
6.   End if
7.   αk ← arg min_α f(mk + α Sk)
8.   mk+1 ← mk + αk Sk
9.   k ← k + 1
10. End while

For a non-linear forward operator f(m), with misfit

q = (d − f(m))T (d − f(m)) ≈ ‖d − f(m)‖²

∂q/∂m = −2 (d − f(m)) · ∂f(m)/∂m ≡ −2 GT (d − f(m))

Δm = −k ∂q/∂m = [2k] GT (d − f(m))

mk+1 = mk + Δm

where k = a constant (step size).
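A minimal Python sketch of steepest descent for the quadratic cost
f(m) = (1/2) mT G m − dT m, using the optimum step size obtained by setting the
derivative with respect to α to zero (the SPD matrix G and data d are made up):

import numpy as np

def steepest_descent(G, d, m0, tol=1e-8, max_iter=500):
    """Minimize f(m) = 0.5 m^T G m - d^T m (G symmetric positive definite)."""
    m = m0.astype(float)
    for _ in range(max_iter):
        s = d - G @ m                      # negative gradient = search direction
        if np.linalg.norm(s) < tol:
            break
        alpha = (s @ s) / (s @ (G @ s))    # optimum step from df/dalpha = 0
        m = m + alpha * s                  # model update m_{k+1} = m_k + alpha s
    return m

G = np.array([[4.0, 1.0],
              [1.0, 3.0]])                 # SPD matrix (illustrative)
d = np.array([1.0, 2.0])
m = steepest_descent(G, d, np.zeros(2))
print(np.allclose(G @ m, d))               # True: gradient is zero at the minimum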

Conjugate gradient (CG) (Fletcher–Reeves method):

f(m) = (1/2) mT G m − dT m ..........(1)

is a quadratic function of m, where d & G are constant and G is positive definite.

The local gradient of the function is

∇f(m) = G m − d .........(2)

Let m0 be the initial model.

The direction of steepest descent is

S0 = −∇f0 = d − G m0 ............(3)

Minimization of equation (1) is achieved at the point m1 where

G m1 − d = 0 ..................(4)

Suppose we can find a set of w vectors (where w is the dimensionality of the
parameter space) which are mutually conjugate with respect to G, so that

SjT G Si = 0,  j ≠ i ...............(5)

Then it is easily shown that these vectors are linearly independent if G is
positive definite. Such vectors therefore form a complete, but non-orthogonal,
basis set in parameter space. Say we are starting from some point m0 and we wish
to get to the minimum m1 of the function. The difference between the vectors m0 &
m1 can be written

m1 − m0 = Σ_{i=1}^{w} αi Si ............(6)

and the jth partial sum is

mj = m0 + Σ_{i=1}^{j−1} αi Si ............(7)

In iterative form we can write

mj+1 = mj + αj Sj ............(8)

This represents a succession of steps parallel to the conjugate directions, with
step length controlled by the parameters αj.

If we multiply equation (6) by SjT G & use eqn (4),

SjT G m1 − SjT G m0 = Σ_{i=1}^{w} αi SjT G Si ...........(9)

SjT (d − G m0) = Σ_{i=1}^{w} αi SjT G Si

αj = SjT (d − G m0) / (SjT G Sj) ..................(10)

Similarly, multiplying equation (7) by SjT G,

SjT G mj = SjT G m0 + Σ_{i=1}^{j−1} αi SjT G Si

SjT G mj = SjT G m0 ...........(11)    [since SjT G Si = 0 for i ≠ j]

αj = [SjT d − SjT G m0] / (SjT G Sj)
   = [SjT d − SjT G mj] / (SjT G Sj)
   = SjT (d − G mj) / (SjT G Sj)
   = − SjT ∇fj / (SjT G Sj) ................(12)

For the general kth iteration we can write

αk = − SkT ∇fk / (SkT G Sk) ...........(13)

α0 = − S0T ∇f0 / (S0T G S0) .............(14)

Since the residual vector at the kth iteration is

rk = G mk − d = ∇fk .........(15)

so, αk = − SkT rk / (SkT G Sk) ............(16)

The current search direction is given as a linear combination of the previous
search direction and the current gradient vector.

The second search direction is

S1 = −∇f(m1) + β1 S0 ..........(17)

where the scalar β1 is chosen such that the search direction S1 & the previous search
direction S0 are conjugate, implying S0T G S1 = 0.

Multiplying equation (17) by S0T G,

S0T G S1 = S0T G {−∇f(m1) + β1 S0} = 0

Since
m1 = m0 + α0 S0, so S0 = (m1 − m0)/α0,
and
∇f(m1) − ∇f(m0) = G (m1 − m0),

we can write

{∇f(m1) − ∇f(m0)}T {∇f(m1) − β1 S0} = 0

∇f(m1)T ∇f(m1) − β1 ∇f(m1)T S0 − ∇f(m0)T ∇f(m1) + β1 ∇f(m0)T S0 = 0

β1 {∇f(m0)T S0 − ∇f(m1)T S0} = ∇f(m0)T ∇f(m1) − ∇f(m1)T ∇f(m1)

β1 = [∇f(m1)T ∇f(m1) − ∇f(m0)T ∇f(m1)] / [∇f(m1)T S0 − ∇f(m0)T S0]

Using S0 = −∇f(m0) (equation (3)) and the fact that at the exact line minimum
∇f(m1)T S0 = 0 and ∇f(m0)T ∇f(m1) = 0, this reduces to

β1 = − ∇f(m1)T ∇f(m1) / ∇f(m0)T S0

β1 = ∇f(m1)T ∇f(m1) / ∇f(m0)T ∇f(m0)
Similarly, we express the third search direction as a linear combination of the
current gradient & all past search directions:

S2 = −∇f(m2) + β2 S1 + γ2 S0

S2 is the current search direction at the updated model m2; β2 & γ2 are two
scalars that ensure conjugacy among the current and past search directions.

The condition of conjugacy between S0 and S2 requires that γ2 be zero. From the
condition of conjugacy between S1 & S2, we obtain

β2 = ∇f(m2)T ∇f(m2) / ∇f(m1)T ∇f(m1)

Hence the current search direction is given by,

S2 = −∇f(m2) + [ ∇f(m2)T ∇f(m2) / ∇f(m1)T ∇f(m1) ] S1

The generalised expression for the kth iteration is written as,

Sk = −∇f(mk) + βk Sk−1

where βk = ∇f(mk)T ∇f(mk) / ∇f(mk−1)T ∇f(mk−1)

The residual vector rk at the kth iteration is given by

rk = ∇f(mk)

so that

βk = rkT rk / (rk−1T rk−1)

Code:

1. Given m0
2. Set k ← 0; r0 ← G m0 − d; S0 ← −r0
3. While (convergence criteria not satisfied) do
4.   αk ← − SkT rk / (SkT G Sk)
5.   mk+1 ← mk + αk Sk
6.   rk+1 ← G mk+1 − d
7.   βk+1 ← rk+1T rk+1 / (rkT rk)
8.   Sk+1 ← −rk+1 + βk+1 Sk
9.   k ← k + 1
10. End while
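A minimal Python rendering of this linear CG pseudocode (a sketch, not a production
solver; the SPD matrix G and data d are made-up numbers):

import numpy as np

def conjugate_gradient(G, d, m0, tol=1e-10, max_iter=None):
    """Linear CG for G m = d with G symmetric positive definite."""
    m = m0.astype(float)
    r = G @ m - d                 # residual r0 = G m0 - d (= gradient of f)
    s = -r                        # first search direction
    max_iter = max_iter or len(d)
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Gs = G @ s
        alpha = -(s @ r) / (s @ Gs)
        m = m + alpha * s
        r_new = G @ m - d
        beta = (r_new @ r_new) / (r @ r)
        s = -r_new + beta * s
        r = r_new
    return m

G = np.array([[4.0, 1.0],
              [1.0, 3.0]])
d = np.array([1.0, 2.0])
print(conjugate_gradient(G, d, np.zeros(2)))   # converges in at most 2 steps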

Non-linear Conjugate Gradient:

If the function to be minimized is quadratic, for example

f(m) = (1/2) mT G m − dT m,

the step length αk along the direction Sk for which the function f(mk + αk Sk) is
minimum can be calculated analytically by minimizing f(mk + αk Sk) with respect
to αk.

However, for a non-linear function there is in general no analytic expression for
the optimum step length αk. The non-linear function is minimized along the
direction Sk by a line search, and the residual

rk (= ∇fk)

is replaced by the gradient of the non-linear function.

Code:

1. Given m0
2. Compute f0 ← f(m0); ∇f0 ← ∇f(m0)
3. Set k ← 0; S0 ← −∇f0
4. While (convergence criteria not satisfied) do
5.   Calculate αk by line search
6.   mk+1 ← mk + αk Sk
7.   Calculate ∇fk+1
8.   If (Fletcher–Reeves) then
9.     βk+1 ← ∇fk+1T ∇fk+1 / (∇fkT ∇fk)
10.  End if
11.  If (Polak–Ribiere) then
12.    βk+1 ← ∇fk+1T (∇fk+1 − ∇fk) / ‖∇fk‖²
13.  End if
14.  Sk+1 ← −∇fk+1 + βk+1 Sk
15.  k ← k + 1
16. End while

Numerical studies show that the Polak–Ribiere method is generally more robust
than the Fletcher–Reeves method (Nocedal & Wright, 1999).

Non-linear CG is the generalization of CG to the optimization of non-linear
functions, whereas linear CG aims at finding the solution of the linear equation
GT G m = GT d.

The success of non-linear CG lies in the fact that the paraboloid approximation of
the cost function in the vicinity of the initial model encompasses the global
minimum of the cost function. When this condition fails, non-linear CG may not
necessarily converge to the global minimum of the cost function.


Non-Linear Inverse problems:

Non-linear inverse problems belong to a class of inverse theory where there
exists non-linearity in the model-data relationship. A non-linear model-data
relationship leads to a non-quadratic cost function, as opposed to the linear model-
data relationship where the cost function is quadratic. The cost function
topology is likely to be multimodal in the case of a non-quadratic cost function.
Optimization of such a cost function is complex because of the presence of
several minima in the cost function surface.

Example:

Certain inverse problems can be made tractable via approximations of the forward
operator. For example, estimation of the earth's elastic parameters from AVO data
using a forward operator that involves computation of reflection coefficients
from the Zoeppritz equations is a non-linear problem. However, for a reasonable
angle of reflection, the Zoeppritz equations can be approximated to obtain a
linear form. The Aki–Richards equation (Aki and Richards, 1980) is one such linear
representation of the Zoeppritz equations.

Structure of an inverse problem:

The inverse solution of the equation d = Gm is

m = G−1 d

This leads to the following questions of interest.

(a). Existence:

Given observed data for the system, is there some value of the unknown
parameters that actually yields the observed data? If not, the inverse problem
has no solution.

(b). Uniqueness:

Can the unknown parameters in principle be uniquely determined from the
measured data? Or could two different sets of values for the unknown
parameters give rise to the same observations?

A solution is said to be unique if, on changing a model from m1 to m2, the data
also change from d1 to d2 such that d1 ≠ d2.

(c). Stability:

If the measured data contain small errors, will the errors in the resulting
estimates of the unknowns be correspondingly small? Or could small
measurement errors lead to huge errors in our estimates?

(d). Robustness:

Robustness indicates the level of insensitivity to the presence of large uncontrolled
errors (i.e. outliers in the data). An inverse problem for which existence,
uniqueness, and stability hold is said to be well-posed. The alternative is an ill-
posed inverse problem. Ill-posed problems tend to be the most common.


Steepest-Descent Algorithm:

A linear / linearized problem can be posed as an optimization problem where
the cost function is quadratic. This means that the surface of the cost function is
a paraboloid with a single minimum.

A quadratic function

f(m) = (1/2) mT G m − dT m

will have a single minimum with respect to the model parameters if the second-
order partial derivative matrix G is positive definite.

To find the minimum of the cost function, the algorithm proceeds along
the negative gradient direction calculated at each iteration.

Though the negative gradient provides the direction of maximum decrease in
the cost function, it does not provide the step size. One way to obtain the step
size is to keep it constant. However, if the step size is large, there is a possibility
that the algorithm may miss the minimum point and become oscillatory. In order
to avoid this, the step size is calculated at each iteration.

Newton Optimization method:

This extends the well-known Newton's method of root finding for a univariate
function. Let f(m) be a multivariate function whose Taylor series expansion at
m = mi is given by,

f(m) = f(mi) + ∇f(mi)T (m − mi) + (1/2) (m − mi)T Hi (m − mi)

where Hi = ∇² f(m) |_{m = mi}

is the Hessian matrix evaluated at m = mi. The first derivative is zero at the
point where the function is minimum. So, setting

∂f(m)/∂m = 0

we obtain

∇f(m) = ∇f(mi) + Hi (m − mi) = 0

Thus mi+1, the updated model, is given by,

mi+1 = mi − Hi−1 ∇f(mi)

This is analogous to the expression for Newton's root-finding method for a
univariate function, which is given by

xi+1 = xi − f′(xi) / f″(xi)

Convergence is achieved by Newton's method provided that the Hessian
matrix is non-singular. For a quadratic cost function Newton's method will find
the minimum in one step.
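A minimal Python sketch of this Newton iteration (the quadratic cost and its
gradient/Hessian below are made up; for a quadratic, one step reaches the minimum):

import numpy as np

def newton_minimize(grad, hess, m0, tol=1e-10, max_iter=50):
    """Newton iteration m_{i+1} = m_i - H_i^{-1} grad f(m_i); assumes the
    Hessian stays non-singular."""
    m = np.asarray(m0, dtype=float)
    for _ in range(max_iter):
        g = grad(m)
        if np.linalg.norm(g) < tol:
            break
        m = m - np.linalg.solve(hess(m), g)
    return m

# Quadratic cost f(m) = 0.5 m^T G m - d^T m: gradient G m - d, Hessian G
G = np.array([[4.0, 1.0],
              [1.0, 3.0]])
d = np.array([1.0, 2.0])
m_min = newton_minimize(lambda m: G @ m - d, lambda m: G, np.zeros(2))
print(np.allclose(G @ m_min, d))    # True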

Marquardt Optimization method:

Marquardt's method for optimization (Marquardt, 1963) combines the benefits of
the gradient-descent and Newton techniques to obtain faster convergence.
Marquardt's method modifies the diagonal terms of the Hessian matrix:

H̃i = Hi + λi I

where I is an identity matrix and λi is a scalar that modifies the diagonal elements
of the Hessian matrix.

It is evident from the equation that for a very large λi the term λi I dominates the
Hessian matrix Hi. In such a case

H̃i−1 = (Hi + λi I)−1 ≅ (1/λi) I

so in that condition the model is updated in a gradient-descent manner. It is
obvious that when λi is reduced to a small number, the model is updated with a
Newton-type approach.

The Marquardt algorithm provides faster convergence when λi is set to a
large value during the initial iterations and then gradually reduced to a
small number during the later iterations, as the updated model approaches the
optimum point.

Data Resolution matrix:

A linear inverse problem takes the form Gm = d. Using the generalized inverse,
we get an estimate of the model parameters,

m_est = G−g d_obs

[For the sake of simplicity we assume that there is no additive noise vector.]

d_pred = G m_est
d_pred = G [G−g d_obs]
d_pred = [G G−g] d_obs = N d_obs

Here the subscripts 'pred' and 'obs' mean predicted & observed respectively.
The (N by N) square matrix N = G G−g is called the data resolution matrix. The
data resolution matrix characterises whether the data can be independently
predicted / resolved.

Model Resolution Matrix:

We imagine that there is a true but unknown set of model parameters m_true that
solves G m_true = d_obs. Then

m_est = G−g d_obs
m_est = G−g [G m_true]
m_est = [G−g G] m_true = R m_true

where R is the (M by M) model resolution matrix. If R = I, each model
parameter is uniquely determined.
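A minimal numpy sketch computing both matrices for a damped least-squares
generalized inverse G−g = (GT G + β² I)−1 GT (the small G and β below are made up):

import numpy as np

G = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])      # 2 data, 3 model parameters (underdetermined)
beta = 0.1

Gg = np.linalg.solve(G.T @ G + beta**2 * np.eye(3), G.T)   # generalized inverse
N = G @ Gg        # data resolution matrix (N x N): how well data are predicted
R = Gg @ G        # model resolution matrix (M x M): R = I means perfect resolution

print(np.round(R, 3))   # off-diagonal leakage shows imperfect model resolution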

Determination of the damping factor in ridge regression / Marquardt–Levenberg:

For automated inversion, the common practice is to set β first to a large positive
value, thus taking advantage of the good initial convergence properties of the
steepest-descent method; thereafter β is multiplied by a factor less than unity
after each iteration, so that the linear least-squares method predominates near
the solution.

A variant of the procedure (Johansen, 1977) assumes as β the smallest eigenvalue
of the GTG matrix. If divergence occurs, it is replaced by the next largest
eigenvalue until the solution is obtained.

A more sophisticated method of damping has been developed by Meju (1988,
1992), which is in effect a hybrid of the two methods highlighted.

(1). The damping factor is determined empirically; it is linked to approximate
derivatives of a Lagrangian function (Herskovits, 1986) and is used in a
minimization sub-problem at each iteration.

Find the largest & smallest eigenvalues of GTG.

Operationally, the largest & smallest eigenvalues of the problem are
multiplied by 10 & 0.1 respectively, giving λl & λs, which are used to determine
the coefficients of a parabola from which ten sampled auxiliary factors λk are
obtained using the formula (Meju, 1992)

λk = [ {100λs − λl} + {λl − λs} k ] / 99 ,   where k = 1 .......10

hence,

βk = λk²

βk is required in a line search procedure.

Algorithm by Meju (1992) / ridge regression:

Steps:

1. Select a starting model m0.

2. Calculate the misfit,

q0 = Σ_{i=1}^{n} (w y)i²

where w y = w d − w f(m0)

3. If the misfit criterion is satisfied, stop the program.

4. Obtain the weighted partial derivatives WG = G*.

5. Calculate the SVD of G*.

6. Obtain the most feasible solution from a line search:

a. Calculate the damping factors

λk = [ {100λs − λl} + {λl − λs} k ] / 99 ,   where k = 1 .......10

hence,

βk = λk²

where λs & λl are the smallest and largest singular values of G*, multiplied
respectively by 0.1 & 10.

b. Set β0 = 0.

c. Perform the line search with ridge regression.

Loop (j = 1 to 11 & k = 11 − j)

Get Qd−1 with diagonal elements Qi / (Qi + βk)² ,  i = 1, p

calculate, mj = m0 + L Qd−1 UT y*

compute, qj = Σ_{i=1}^{n} ‖ w d − w f(mj) ‖²

If qj > qj−1,
set the optimal solution to mj−1 and quit, else
set the optimal solution to mj.

end loop

7. Set the optimal model from step 6 as the new iterate (i.e. m0).

8. Go to step 2.
8. Go to step 2.

GOODNESS OF FIT:

Assuming that our data di are normally distributed about their expected values
and with known uncertainties σi (experimental errors),

q = fit = Σ_{i=1}^{n} [ ( di_obs − Σ_{j=1}^{p} Gij mj ) / σi ]²

or, q = ‖ W d_obs − W G m ‖²

For n independent observations and p independent parameters,
q is distributed according to χ² (chi-square) with (n − p) degrees of freedom.

In geophysical inversion, we reject or accept the solution to the problem
being considered based on the value of q. The expected value of q is (n − p), so
the model is acceptable (from chi-square statistics) when q lies within about

(n − p) ± √(2(n − p))

If q ≪ (n − p), the model is said to be over-fit.
If q ≫ (n − p), the model is said to be under-fit.

If experimental errors are not available, an unbiased estimate of σ² is

Δ² = (dT d − mT GT G m) / (n − p)
   ≡ (sum of squares of residuals) / (n − p)

RMS = √[ (1/n) Σ_{i=1}^{n} ( di_obs − Σj Gij mj )² / σi² ]

or, for the weighted solution,

RMS = √[ (1/n) ‖ W d_obs − W G m ‖² ]

q = (d − Gm)T (d − Gm) + β² (Dm − h)T (Dm − h)

q = (dT − mT GT)(d − Gm) + β² (mT DT − hT)(Dm − h)

q = dT d − dT Gm − mT GT d + mT GT Gm + β² (mT DT Dm − mT DT h − hT Dm + hT h)

∂q/∂mT = −2 GT d + 2 GT G m + 2β² DT D m − 2β² DT h

Setting ∂q/∂mT = 0:

GT G m + β² DT D m = GT d + β² DT h

(GT G + β² DT D) m = GT d + β² DT h    ← normal equations

or, if D is the identity matrix,

(GT G + β² I) m = GT d + β² h

m_est = (GT G + β² I)−1 (GT d + β² h)

This is the constrained linear inversion formula. It is also known as the biased
linear estimation technique.
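A minimal numpy sketch of this biased (constrained) estimator for a straight-line fit
biased towards a prior model h (all numbers are made up for illustration):

import numpy as np

# m_est = (G^T G + beta^2 I)^{-1} (G^T d + beta^2 h), with d_i = m1 + m2 x_i
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([2.9, 5.1, 7.2, 8.8, 11.1])
G = np.column_stack([np.ones_like(x), x])

h = np.array([1.0, 2.0])                 # prior values for (m1, m2)
beta = 0.5                               # trade-off between data fit and the prior

lhs = G.T @ G + beta**2 * np.eye(2)
rhs = G.T @ d + beta**2 * h
m_est = np.linalg.solve(lhs, rhs)

m_unbiased = np.linalg.solve(G.T @ G, G.T @ d)   # beta = 0 recovers ordinary least squares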

INVERSION WITH PRIOR INFORMATION

Previous information about the sought model parameters can be incorporated in
our present formulation. The previous information may be information from
previous experiments or quantified expectations dictated by the physics of the
problem. Generally these external data help to single out a unique solution from
among all equivalent ones. The constraining equations (data) are arranged to form
an expression of the form

Dm = h

where D is a matrix (with all the off-diagonal elements equal to zero) that
operates on the model parameters m to yield or preserve the prior values of m
that are contained in the vector h.

Dm = h means that we are employing linear equality constraints that are to be
satisfied exactly. The mathematical development is straightforward: we wish to
bias mj towards hj.

APPLICATION OF CONSTRAINED INVERSION

Fitting a straight-line problem:

di = m1 + m2 xi

in the form d = Gm, where m = (m1, m2)T,
with data pairs ({xi, ti}, i = 1, 2 ...... n).

The a priori information is that the fitted line should pass through (xc, tc). So we
have one constraint (you may impose a number of constraints on the problem).

Dm = h takes the form,

[1  xc] [ m1 ] = [tc]
        [ m2 ]

with D = [1  xc],  m = (m1, m2)T,  h = [tc].

Let β = 1.0. The normal equations, augmented (bordered) by the constraint row,
become

[  n     Σxi    1  ]
[  Σxi   Σxi²   xc ]   ← augmented operator
[  1     xc     0  ]

[ Σti    ]
[ Σxi ti ]   ← augmented data
[ tc     ]

FORMULATING CONSTRAINING EQUATIONS

The equations Dm = h in the general form:

[ 1            ] [ m1 ]   [ h1 ]
[    1         ] [ m2 ]   [ h2 ]
[        .     ] [ .  ] = [ .  ]
[           1  ] [ mp ]   [ hp ]

If only one parameter is known, then we can write this as,

              [ m1 ]
              [ m2 ]
[1 0 0 ... 0] [ .  ]  = [h_known]   ← just a number, not a vector
              [ mp ]

If two parameters (say, the first & fourth) are known,

[ 1  0  0  0 ] [ m1 ]   [ h1 ]
[ 0  0  0  0 ] [ m2 ] = [ 0  ]
[ 0  0  0  0 ] [ m3 ]   [ 0  ]
[ 0  0  0  1 ] [ m4 ]   [ h4 ]

Operationally, we need to add the row [1 0 ... 0] onto the bottom of
the G matrix and the known values of the parameters [h_known] onto the bottom
of the actual field data d.

Where desired, both d and h are multiplied by β (usually chosen to be less than
or equal to unity).

The constrained least-squares solution for the straight line through (xc, tc) is
therefore,

     [ m1 ]   [  n     Σxi    1  ]−1 [ Σti    ]
mc = [ m2 ] = [  Σxi   Σxi²   xc ]   [ Σxi ti ]
     [  λ ]   [  1     xc     0  ]   [ tc     ]

(with β = 1; the first two components of the solution give m1 and m2).

Let xc = 8, tc = 14.9.

Example 1: Constrained seismic refraction time-term analysis.

The constraint that the first delay time is known takes the form

                 [ δ1   ]
                 [ δ2   ]
[1 0 0 0 0 0]    [ γ1   ]  = [δ1] = 0.433
                 [ γ2   ]
                 [ γ3   ]
                 [ 1/v1 ]

The augmented system is

    [ 1  0  1  0  0  6     ]
    [ 1  0  0  1  0  6.708 ]
    [ 1  0  0  0  1  8.485 ]
G = [ 0  1  1  0  0  7.616 ]
    [ 0  1  0  1  0  7.0   ]
    [ 0  1  0  0  1  7.616 ]
    [ 1  0  0  0  0  0     ]   ← βD

    [ 2.323 ]
    [ 2.543 ]
    [ 2.857 ]
d = [ 2.640 ]
    [ 2.529 ]
    [ 2.553 ]
    [ 0.433 ]   ← βh
 

INVERSION WITH SMOOTHNESS MEASURES

Suppose we do not have any known estimates of the delay times. The simplest &
cheapest remedy for such problems, or a prescription for indeterminacy or non-
uniqueness in inversion, is to ask for a SMOOTH model ← remedy for non-uniqueness.

Problem Formulation:

If it is desired that the model parameters vary slowly with position, say, we
may choose to minimize the differences between physically adjacent parameters
(m1 − m2), (m2 − m3), ... (m_{p−1} − m_p).

These first differences can be written as constraining equations in the form Dm = h:

[ 1  −1   .   .   .  ] [ m1 ]   [ 0 ]
[     1  −1   .   .  ] [ m2 ] = [ . ]
[  .   .   .   .   . ] [ .  ]   [ . ]
[  .       1  −1     ] [ mp ]   [ 0 ]

   D                      m        h

D = flatness / smoothness matrix of the solution vector m.

If the model parameters do not vary smoothly with position, then the use of
constraining equations of the form

[ 1           ] [ m1 ]   [ 0 ]
[    1        ] [ m2 ] = [ 0 ]
[       .     ] [ .  ]   [ . ]
[          1  ] [ mp ]   [ 0 ]

   D              m        h

is recommended. Then D → identity matrix, and we have biased estimation with
no prior information. The operation effectively damps the length of the solution
(by forcing it to conform to h), leading to a stable inverse process.

The quadratic q2(m) is given by,

q2(m) = (Dm − h)T (Dm − h) = (mT DT − hT)(Dm − h)
q2(m) = mT DT Dm − mT DT h − hT Dm + hT h
      ≅ mT DT D m
      ≅ mT H m
where H = DT D

q = (d − Gm)T (d − Gm) + β² (mT DT D m)

∴ q = q1 + β² q2

We minimize q.

Example:

    [ 1  0  1      0      0      6     ]
    [ 1  0  0      1      0      6.708 ]
    [ 1  0  0      0      1      8.485 ]
    [ 0  1  1      0      0      7.616 ]
G = [ 0  1  0      1      0      7.0   ]
    [ 0  1  0      0      1      7.616 ]
    [ 0  1  0      0      0      0     ]
    [ 0  0  0.01  −0.01   0      0     ]   ← βD
    [ 0  0  0      0     −0.01   0     ]
    [ 0  0  0      0      0.01  −0.01  ]

    [ 2.322 ]
    [ 2.543 ]
    [ 2.857 ]
    [ 2.640 ]
d = [ 2.529 ]
    [ 2.553 ]
    [ 0.0   ]
    [ 0.0   ]   ← βh
    [ 0.0   ]
    [ 0.0   ]

Input data structure with the first-difference operator for β = 0.01.
