Objective Functions For Full Waveform Inversion: William Symes

Objective functions for full waveform inversion
William Symes
The Rice Inversion Project
EAGE 2012
Workshop: From Kinematic to Waveform Inversion
A Tribute to Patrick Lailly
Agenda
Overview
Challenges for FWI
Extended modeling
Summary
Full Waveform Inversion
M = model space, D = data space
F : M → D forward model
Least squares inversion (“FWI”): given d ∈ D, find m ∈ M to minimize
$$J_{LS}[m] = \|F[m] - d\|^2 \; [+ \text{regularizing terms}]$$
($\|\cdot\|^2$ = mean square)
+ accommodates any modeling physics, data geometry, spatial variation on all scales (Bamberger, Chavent & Lailly 79, ...)
+ close relation to prestack migration via local optimization (Lailly 83, Tarantola 84)
+ gains in hardware/software, algorithm efficiency ⇒ feasible data processing method
++ some spectacular successes with 3D field data (keep listening!)
± with regularizations pioneered by Pratt and others, applicable to surface data if there is sufficient (i) low-frequency s/n and (ii) long offsets
- reflection data still a challenge
Why are
• low frequencies important?
• long offsets (diving waves, transmission) easier than short offsets (reflections)?
What alternatives to Standard FWI = output least squares?
• different error measures, domains: time vs. Fourier vs. Laplace, L1, logarithmic (other talks today; survey in Virieux & Operto 09)
• model extensions: migration velocity analysis as a linearization, nonlinear MVA
Nonlinear Challenges: Why low frequencies are important
Well-established observation, based on heuristic arguments (“cycle skipping”) and numerical evidence: the forward modeling operator is more linear [objective function is more quadratic] at lower frequencies
Leads to widely-used frequency continuation strategy (Kolb, Collino & Lailly 86)
Why?
Visualizing the shape of the objective: scan from model m0 to model m1
$$f(h) = J_{LS}[(1 - h)m_0 + h m_1]$$
Expl: data = simulation of Marmousi data (Versteeg & Gray 91), with bandpass filter source.
[Figure: two panels, bulk modulus (GPa) vs. offset (km) and depth (km). m0 = smoothed Marmousi, m1 = Marmousi (bulk modulus displayed)]
[Figure: MS error (normalized) vs. h (scan parameter), 0 ≤ h ≤ 1. Red: [2,5,40,50] Hz data. Blue: [2,4,8,12] Hz data]
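A minimal sketch of such a scan, on a toy problem rather than Marmousi. Assumptions made here for illustration only: the model is a single scalar arrival delay, the "forward map" delays a Ricker wavelet by that amount, and the two bands are stood in for by peak frequencies of 2 Hz and 20 Hz. All function names are hypothetical.

```python
import numpy as np

def ricker(t, f_peak):
    """Ricker wavelet with peak frequency f_peak (Hz), centered at t = 0."""
    a = (np.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def forward(delay, t, f_peak):
    """Toy forward map F[m]: one arrival delayed by the scalar model m (assumed)."""
    return ricker(t - delay, f_peak)

def j_ls(delay, data, t, f_peak):
    """Mean-square misfit J_LS[m] = ||F[m] - d||^2."""
    r = forward(delay, t, f_peak) - data
    return np.mean(r ** 2)

def scan(m0, m1, t, f_peak, nh=101):
    """Normalized scan f(h) = J_LS[(1 - h) m0 + h m1] / J_LS[m0]."""
    data = forward(m1, t, f_peak)          # "observed" data from the target model
    hs = np.linspace(0.0, 1.0, nh)
    vals = np.array([j_ls((1 - h) * m0 + h * m1, data, t, f_peak) for h in hs])
    return hs, vals / vals[0]

t = np.arange(0.0, 2.0, 0.001)             # time axis (s)
m0, m1 = 0.85, 1.00                        # start and target delays (s)
hs, f_low = scan(m0, m1, t, f_peak=2.0)    # low-frequency source
_, f_high = scan(m0, m1, t, f_peak=20.0)   # high-frequency source

# A scan is "monotone" if it never increases on the way from m0 to m1.
low_is_monotone = bool(np.all(np.diff(f_low) <= 1e-9))
high_is_monotone = bool(np.all(np.diff(f_high) <= 1e-9))
print(low_is_monotone, high_is_monotone)
```

In this toy setting the low-frequency scan decreases steadily toward h = 1, while the high-frequency scan has to climb over a bump before reaching the minimum: the scalar analogue of cycle skipping.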
Origin of this phenomenon in math of symmetric hyperbolic systems:
$$A \frac{\partial u}{\partial t} + P u = f$$
u = dynamical field vector, A = symm. positive operator, P = skew-symm. differential operator in space variables, f = source
Example: for acoustics, $u = (p, v)^T$, $A = \mathrm{diag}(1/\kappa, \rho)$, and
$$P = \begin{pmatrix} 0 & \mathrm{div} \\ \mathrm{grad} & 0 \end{pmatrix}$$
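Written out componentwise, the acoustic example reads as follows. The split of the source into $f = (f_p, f_v)^T$ is notation introduced here for illustration, not taken from the slide.

```latex
\frac{1}{\kappa}\,\frac{\partial p}{\partial t} + \nabla\cdot v = f_p,
\qquad
\rho\,\frac{\partial v}{\partial t} + \nabla p = f_v
```

P is (formally) skew-symmetric because grad is the negative adjoint of div, assuming boundary conditions under which the boundary terms from integration by parts vanish.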
Theoretical development, including non-smooth A: Blazek, Stolk & S. 08, Stolk 00, after Bamberger, Chavent & Lailly 79, Lions 68.
Sketch of linearization analysis - after Lavrientiev, Romanov, & Shishatski 79, also Ramm 86:
δu = perturbation in dynamical fields corresponding to perturbation δA in parameters
$$A \frac{\partial\,\delta u}{\partial t} + P\,\delta u = -\delta A\,\frac{\partial u}{\partial t}$$
and
$$\left(A \frac{\partial}{\partial t} + P\right) \frac{\partial u}{\partial t} = \frac{\partial f}{\partial t}$$
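Similarly for linearization error - h > 0, $u_h$ = fields corresponding to A + hδA,
$$e = \frac{u_h - u}{h} - \delta u$$
$$A \frac{\partial e}{\partial t} + P e = -\delta A\,\frac{\partial}{\partial t}(u_h - u)$$
$$\left(A \frac{\partial}{\partial t} + P\right) \frac{\partial}{\partial t}(u_h - u) = -h\,\delta A\,\frac{\partial^2 u_h}{\partial t^2}$$
$$\left((A + h\,\delta A) \frac{\partial}{\partial t} + P\right) \frac{\partial^2 u_h}{\partial t^2} = \frac{\partial^2 f}{\partial t^2}$$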
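Use causal Green’s (inverse) operator:
$$\delta u = -\left(A \frac{\partial}{\partial t} + P\right)^{-1} \delta A \left(A \frac{\partial}{\partial t} + P\right)^{-1} \frac{\partial f}{\partial t}$$
$$e = h \left(A \frac{\partial}{\partial t} + P\right)^{-1} \delta A \left(A \frac{\partial}{\partial t} + P\right)^{-1} \delta A \left((A + h\,\delta A) \frac{\partial}{\partial t} + P\right)^{-1} \frac{\partial^2 f}{\partial t^2}$$
pass to frequency domain ($\partial/\partial t \to -i\omega$):
$$\widehat{\delta u} = -[-i\omega A + P]^{-1}\,\delta A\,[-i\omega A + P]^{-1}\,(-i\omega)\hat f$$
$$\hat e = h\,[-i\omega A + P]^{-1}\,\delta A\,[-i\omega A + P]^{-1}\,\delta A\,[-i\omega A + P]^{-1}\,(-i\omega)^2 \hat f + O(h^2\omega^2)$$
So for small ω,
$$\widehat{\delta u} = i\omega\,P^{-1}\,\delta A\,P^{-1}\hat f + O(\omega^2)$$
$$\hat e = -h\omega^2\,P^{-1}\,\delta A\,P^{-1}\,\delta A\,P^{-1}\hat f + O(\omega^3)$$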

$\hat f(0) \neq 0$ ⇒ there exist δA for which
• $P^{-1}\,\delta A\,P^{-1}\hat f \neq 0$ - δA is resolved at zero frequency
⇒ for such δA
• (energy in e) < $O(\|\delta A\|\langle\omega\rangle)$ (energy in δu)
So: linearization error is small ⇒ $J_{LS}$ is near-quadratic, for sufficiently low frequency source and/or sufficiently small δA.
Further analysis: quadratic directions ∼ large-scale features
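The low-frequency scaling can be checked numerically in a finite-dimensional analogue. This is a sketch under assumptions of my own choosing: A and δA are small diagonal matrices, P is a skew-symmetric matrix built from a first-difference block (mimicking the div/grad structure), and the source spectrum is flat. None of these choices come from the talk.

```python
import numpy as np

n = 8
# D: invertible first-difference matrix (1 on the diagonal, -1 just below it).
D = np.eye(n) - np.eye(n, k=-1)
Z = np.zeros((n, n))
# Skew-symmetric P with block structure [[0, div], [grad, 0]], grad = -div^T.
P = np.block([[Z, D], [-D.T, Z]])
A = np.diag(np.linspace(1.0, 2.0, 2 * n))     # symmetric positive "parameters"
dA = np.diag(np.linspace(0.1, 0.3, 2 * n))    # parameter perturbation
f_hat = np.ones(2 * n)                        # flat source spectrum, f_hat(0) != 0

def fields(omega, A_mat):
    """Solve (-i*omega*A + P) u_hat = f_hat."""
    return np.linalg.solve(-1j * omega * A_mat + P, f_hat)

def error_ratio(omega, h=0.1):
    """Return ||e_hat|| / ||du_hat|| at angular frequency omega."""
    u = fields(omega, A)
    u_h = fields(omega, A + h * dA)
    # Linearization: (-i*omega*A + P) du_hat = -dA * (-i*omega) * u_hat
    du = np.linalg.solve(-1j * omega * A + P, 1j * omega * (dA @ u))
    e = (u_h - u) / h - du                    # linearization error
    return np.linalg.norm(e) / np.linalg.norm(du)

r_small = error_ratio(1e-4)
r_large = error_ratio(1e-3)
# If the relative error is proportional to omega, this quotient is close to 10.
print(r_large / r_small)
```

Raising the frequency by a factor of ten raises the relative linearization error by about the same factor in this toy system, consistent with error of order ω² against a perturbation of order ω.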

Linear Challenges: Why reflection is hard
Relative difficulty of reflection vs. transmission
• numerical examples: Gauthier, Virieux & Tarantola 86
• spectral analysis of layered traveltime tomography: Baek & Demanet 11
Spectral analysis of reflection per se: Virieux & Operto 09
Reproduction of “Camembert” Example (GVT 86) (thanks: Dong Sun)
Circular high-velocity zone in 1 km × 1 km square background - 2% ∆v.
Transmission configuration: 8 sources at corners and side midpoints, 400 receivers (100 per side) surround anomaly.
Reflection configuration: all 8 sources, 100 receivers on one side (“top”).
Modeling details: 50 Hz Ricker source pulse, density fixed and constant, staggered grid FD modeling, absorbing boundaries.
Transmission inversion, 2% anomaly: Initial MS resid = 2.56×10^7; Final after 5 LBFGS steps = 2.6×10^5
[Figure: two panels, bulk modulus (color scale 2.45-2.60 ×10^4 MPa) vs. x (km) and z (km). Bulk modulus: Left, model; Right, inverted]

Reflection configuration: initial MS resid = 3629; final after 5 LBFGS steps = 254
[Figure: two panels, bulk modulus (color scale 2.45-2.60 ×10^4 MPa) vs. x (km) and z (km). Bulk modulus: Left, model; Right, inverted]

Message: in reflection case, “the Camembert has melted”.
Small anomaly ⇒ linear phenomenon
Linear resolution analysis (e.g. Virieux & Operto 09): narrow aperture data does not resolve low spatial wavenumbers
Resolution analysis of phase (traveltime tomography) in layered case: Baek & Demanet 11
• model ↦ traveltime map = composition of (i) increasing rearrangement, (ii) invertible algebraic transformation, (iii) linear operator
• factor (iii) has singular values decaying like $n^{-1/2}$ for diving wave traveltimes, exponentially decaying for reflected wave traveltimes.
Putting it all together: “Large” Camembert (20% anomaly) with 0-60 Hz lowpass filter source. Continuation in frequency after Kolb, Collino & Lailly 86 - 5 stages, starting with 0-2 Hz:
[Figure: two panels, bulk modulus (color scale 2.2-2.7 ×10^4 MPa) vs. x (km) and z (km). Bulk modulus: Left, model; Right, inverted]
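The continuation idea can be sketched on a toy problem. Everything here is an assumption for illustration, not the Camembert setup: the model is one scalar arrival delay, the source is a Ricker wavelet whose peak frequency stands in for the band, and the "local optimizer" simply walks to a strictly better neighboring grid point until none exists.

```python
import numpy as np

def ricker(t, f_peak):
    """Ricker wavelet with peak frequency f_peak (Hz), centered at t = 0."""
    a = (np.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def misfit(delay, true_delay, t, f_peak):
    """Mean-square misfit between arrivals at `delay` and at `true_delay`."""
    r = ricker(t - delay, f_peak) - ricker(t - true_delay, f_peak)
    return np.mean(r ** 2)

def local_descent(k, grid, true_delay, t, f_peak):
    """Purely local search: move to a strictly better neighbor until stuck."""
    while True:
        best = k
        for nb in (k - 1, k + 1):
            if 0 <= nb < grid.size and (
                misfit(grid[nb], true_delay, t, f_peak)
                < misfit(grid[best], true_delay, t, f_peak)
            ):
                best = nb
        if best == k:
            return k
        k = best

t = np.arange(0.0, 2.0, 0.001)            # time axis (s)
grid = np.linspace(0.80, 1.20, 401)       # candidate delays, 1 ms spacing
true_delay = 1.0
k_start = 50                              # starting guess: 0.85 s

# Direct attempt at the highest frequency only.
k_direct = local_descent(k_start, grid, true_delay, t, 20.0)

# Continuation: warm-start each stage from the previous stage's result.
k_cont = k_start
for f_peak in (2.0, 5.0, 10.0, 20.0):
    k_cont = local_descent(k_cont, grid, true_delay, t, f_peak)

print(grid[k_direct], grid[k_cont])
```

The direct high-frequency search cannot cross the misfit barrier between the starting delay and the true one, whereas the staged search reaches the true delay in the first (lowest-frequency) stage and stays there as the frequency is raised.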

Extended Models and Differential Semblance
Inversion of reflection data: difficulty rel. transmission is linear in
origin, so look to migration velocity analysis for useful ideas
Prestack migration as approximate inversion: fits subsets of data with non-physical extended models (= image volume), so all data matched - no serendipitous local matches!
Transfer info from small to large scales by demanding coherence of extended models
Familiar concept from depth-domain migration velocity analysis - independent models (images) grouped together as image gathers, coherence ⇒ good velocity model
Exploit for automatic model estimation: residual moveout removal (Biondi & Sava 04, Biondi & Zhang 12), van Leeuwen & Mulder 08 (data domain VA), differential waveform inversion (Chauris, poster session), differential semblance (image domain VA) S. 86 ...
Differential semblance, version 1:
• group data d into gathers d(s) that can be fit perfectly (more or less), indexed by s ∈ S (source posn, offset, slowness, ...)
• extended models M̄ = {m̄ : S → M}
• extended modeling F̄ : M̄ → D by
$$\bar F[\bar m](s) = F[\bar m(s)]$$
• s finely sampled ⇒ coherence criterion is ∂m̄/∂s = 0.
The DS objective:
$$J_{DS} = \|\bar F[\bar m] - d\|^2 + \sigma^2 \left\|\frac{\partial \bar m}{\partial s}\right\|^2 + \ldots$$
Continuation method (σ : 0 → ∞) - theoretical justification Gockenbach, Tapia & S. ’95, limits to $J_{LS}$ as σ → ∞.
“Starting” problem: σ → 0, minimizing $J_{DS}$ equivalent to
$$\min_{\bar m} \left\|\frac{\partial \bar m}{\partial s}\right\|^2 \quad \text{subj to } \bar F[\bar m] \simeq d$$
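A minimal sketch of evaluating the DS objective on a discretized extended model. Assumptions made here: the extended model is stored as an array with one row per gather index s, the derivative in s is a first difference, both norms are mean squares, and `toy_forward` is a hypothetical stand-in for F, not a wave-equation solver.

```python
import numpy as np

def ds_objective(m_ext, data, forward, sigma, ds):
    """J_DS = ||Fbar[m_ext] - d||^2 + sigma^2 ||d m_ext / ds||^2 (mean squares)."""
    # Extended modeling: each gather s is modeled from its own model m_ext[s].
    pred = np.stack([forward(m_s) for m_s in m_ext])
    data_term = np.mean((pred - data) ** 2)
    # Differential semblance term: first difference along the gather index s.
    dm_ds = np.diff(m_ext, axis=0) / ds
    return data_term + sigma ** 2 * np.mean(dm_ds ** 2)

def toy_forward(m):
    """Hypothetical stand-in for F: running sum of the model samples."""
    return np.cumsum(m)

ns = 5
m_true = np.array([1.0, 2.0, 1.5, 3.0])
data = np.stack([toy_forward(m_true)] * ns)      # same physical model for every s

# Coherent extended model: identical for all s, so d/ds vanishes.
m_coherent = np.tile(m_true, (ns, 1))
j_coherent = ds_objective(m_coherent, data, toy_forward, sigma=1.0, ds=0.1)

# Incoherent extended model: drifts with s, so both terms are penalized.
m_incoherent = m_coherent + 0.1 * np.arange(ns)[:, None]
j_incoherent = ds_objective(m_incoherent, data, toy_forward, sigma=1.0, ds=0.1)

print(j_coherent, j_incoherent)
```

A physical model, repeated for every s, makes the semblance term vanish; the weight σ controls how strongly departures from coherence are penalized relative to data misfit.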
Relation to MVA:
• separate scales: m0 = macro velocity model (physical), δm = short scale reflectivity model
• linearize: $\bar m = m_0 + \delta\bar m$, $\bar F[\bar m] \simeq F[m_0] + D\bar F[m_0]\,\delta\bar m$
• approximate inversion of δd = d − F[m0] by migration:
$$\delta\bar m = D\bar F[m_0]^{-1}(d - F[m_0]) \simeq D\bar F[m_0]^T (d - F[m_0])$$
⇒ MVA via optimization:
$$\min_{m_0} \left\|\frac{\partial}{\partial s}\left[D\bar F[m_0]^T (d - F[m_0])\right]\right\|^2$$
Many implementations with various approximations of $D\bar F^T$, choices of s: S. & collaborators early 90’s - present, Chauris-Noble 01, Mulder-Plessix 02, de Hoop & collaborators 03-07.
Bottom line: works well when hypotheses are satisfied: linearization (no multiples), scale separation (no salt), simple kinematics (no multipathing)
Nonlinear DS with LF control
Drop scale separation, linearization assumptions
Cannot use independent long-scale model as control, as in MVA: “low spatial frequency” not well defined, depends on velocity.
However, temporal passband is well-defined, and lacks very low frequency energy (0-3, 0-5, ... Hz) with good s/n
Generally, inversion is unambiguous if data d is not band-limited (good s/n to 0 Hz) - F̄ is nearly one-to-one - extended models m̄ fitting same data d differ by tradeoff between params, controllable by DS term
So: find a way to supply the low-frequency data, as ersatz for long-scale model - in fact, generate from auxiliary model!
Define low-frequency source complementary to data passband, low-frequency (extended) modeling op $F_l$ ($\bar F_l$)
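Given low frequency control model $m_l$ ∈ M, define extended model $\bar m = \bar m[d, m_l]$ by minimizing over m̄
$$J_{DS}[\bar m; d, m_l] = \|\bar F[\bar m] + \bar F_l[\bar m] - (d + F_l[m_l])\|^2 + \sigma^2 \left\|\frac{\partial \bar m}{\partial s}\right\|^2$$
Determine $m_l$ ⇒ minimize
$$J_{LF}[d, m_l] = \left\|\frac{\partial}{\partial s}\,\bar m[d, m_l]\right\|^2$$
(NB: nested optimizations!)
$$\min_{m_l} J_{LF}[d, m_l] = \left\|\frac{\partial}{\partial s}\,\bar m[d, m_l]\right\|^2$$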
$m_l$ plays same role as migration velocity model, but no linearization, scale separation assumed
$\bar m[d, m_l]$ analogous to prestack migrated image volume
Initial exploration: Dong Sun PhD thesis, SEG 12, plane wave 2D modeling, simple layered examples, steepest descent with quadratic backtrack.
Greatest challenge: efficient and accurate computation of gradient = solution of auxiliary LS problem
Example: DS Inversion with LF control, free surface
[Figure: bulk modulus (color scale 0.6-1.2 ×10^4 MPa) vs. x (km) and z (km). Three layer bulk modulus model. Top surface pressure free, other boundaries absorbing]
[Figure: time (s) vs. sign(p)*p^2 (s^2/km^2). Plane wave data, free surface case]

[Figure: gather panel over z (km), color scale in MPa. Extended model LS gradient at homog initial model (prestack image volume)]
[Figure: gather panel over z (km), color scale 0.2-1.2 ×10^4 MPa. Inverted gather m̄[d, ml], ml = homogeneous model, x = 1.5 km]
[Figure: Inverted gather m̄[d, ml], 3rd DS iteration, x = 1.5 km]

Standard FWI using stack of optimal DS m̄ as initial data (one-step homotopy σ = 0 → ∞)
153 L-BFGS iterations, final RMS error = 6%, final gradient norm < 1% of original
Space Shift DS
Defect in version 1 of DS already known in MVA context:
Image gathers generated from individual surface data bins may not be flat, even when migration velocity is optimally chosen (Nolan & S. 97, Stolk & S. 04)
Source of kinematic artifacts obstructing flatness: multiple ray paths connecting sources, receivers with reflection points.
Therefore version 1 of DS only suitable for mild lateral heterogeneity. Must use something else to identify complex refracting structures
For MVA, remedy is known: use space-shift image gathers δm̄ (de Hoop, Stolk & S. 09)
Claerbout’s imaging principle (71): velocity is correct if energy in δm̄(x, h) is focused at h = 0 (h = subsurface offset)
Quantitative measure of focus: choose P(h) so that P(0) = 0, P(h) > 0 if h ≠ 0, minimize
$$\sum_{x,h} |P(h)\,\delta\bar m[m_0](x, h)|^2$$
(e.g. P(h) = |h|).
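The focusing measure is simple to evaluate once a space-shift image volume is available. In this sketch the volume is a synthetic array indexed by (x, h), used only to show the behavior of the penalty with P(h) = |h|; the function name and array layout are assumptions.

```python
import numpy as np

def focus_penalty(image, offsets):
    """Sum over x and h of |P(h) * image(x, h)|^2 with P(h) = |h|."""
    weights = np.abs(offsets)                 # P(0) = 0, P(h) > 0 for h != 0
    return float(np.sum((weights[None, :] * image) ** 2))

offsets = 0.05 * np.arange(-4, 5)             # subsurface offsets h (km); h = 0 at index 4
nx = 4

# Perfectly focused volume: all energy sits at h = 0.
focused = np.zeros((nx, offsets.size))
focused[:, offsets.size // 2] = 1.0

# Defocused volume: the same amplitude spread evenly over all offsets.
smeared = np.ones((nx, offsets.size)) / offsets.size

p_focused = focus_penalty(focused, offsets)
p_smeared = focus_penalty(smeared, offsets)
print(p_focused, p_smeared)
```

Energy at h = 0 is annihilated by the weight, so the penalty is zero exactly when the volume is focused and grows as energy leaks to nonzero offset.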
MVA based on this principle by Shen, Stolk, & S. 03, Shen et al.
05, Albertin 06, 11, Kubir et al. 07, Fei & Williamson 09, 10, Tang
& Biondi 11, others - survey in Shen & S 08. Gradient issues: Fei
& Williamson 09, Vyas 09.
Extension to nonlinear problems - how is δm̄[x, h] the output of an adjoint derivative?
Answer: Replace coefficients m in wave equation with operators m̄: e.g. $\bar\kappa[u](x) = \int dh\,\bar\kappa(x, h)\,u(x + h)$. Physical case: multiplication operators $\bar\kappa(x, h) = \kappa(x)\delta(h)$. Then
$$\delta\bar m[m_0] = D\bar F[\bar m_0]^T (d - F[m_0])$$
for resulting extended fwd map F̄
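A discrete sketch of an operator coefficient, in one space dimension. Assumptions made here: the integral over h is replaced by a sum over integer grid shifts with the quadrature weight folded into the kernel, and field samples outside the grid are treated as zero. The function name and array layout are hypothetical.

```python
import numpy as np

def apply_extended(kappa_bar, u, shifts):
    """Discrete kappa_bar[u](x) = sum_h kappa_bar(x, h) * u(x + h).

    kappa_bar has shape (nx, nh); shifts holds the integer grid shifts h.
    Samples of u outside the grid are treated as zero.
    """
    nx = u.size
    out = np.zeros(nx)
    for j, h in enumerate(shifts):
        for x in range(nx):
            if 0 <= x + h < nx:
                out[x] += kappa_bar[x, j] * u[x + h]
    return out

shifts = np.arange(-2, 3)                     # h = -2, ..., 2 (grid units)
u = np.array([1.0, -2.0, 0.5, 3.0, 4.0, -1.0])
kappa = np.linspace(2.0, 3.0, u.size)

# Physical case: kernel concentrated at h = 0, a discrete kappa(x) * delta(h).
kappa_bar = np.zeros((u.size, shifts.size))
kappa_bar[:, 2] = kappa                       # column 2 corresponds to h = 0
out_physical = apply_extended(kappa_bar, u, shifts)

# Non-physical case: unit kernel at h = +1 acts as a shift, not a multiplication.
shift_kernel = np.zeros((u.size, shifts.size))
shift_kernel[:, 3] = 1.0                      # column 3 corresponds to h = +1
out_shifted = apply_extended(shift_kernel, u, shifts)

print(out_physical, out_shifted)
```

The physical case reduces to pointwise multiplication by κ(x); any kernel with weight away from h = 0 couples the field at separated points, which is the action at a distance the extension allows and the DS penalty suppresses.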
⇒ Version 2 of nonlinear DS. Physical case = no-action-at-a-distance principle of continuum mechanics = nonlinear version of Claerbout’s imaging principle (S. 08).
Mathematical foundation: Blazek, Stolk & S. 08.
Summary
• restriction to low frequency data makes FWI objective more quadratic, just like you always thought
• transmission inversion is easier than reflection for linear reasons, so MVA seems like a good place to look for reflection inversion approaches
• extended modeling provides a formalism for expressing MVA objectives that extend naturally to nonlinear FWI, via continuation - provision of starting models, route to FWI solution
• positive early experience with “gather flattening” nonlinear differential semblance
• “survey sinking” NDS involves wave equations with operator coefficients
• Patrick’s fingerprints are all over this subject
Thanks to...
• Florence Delprat and other organizers, EAGE
• my students and postdocs, particularly Dong Sun, Peng Shen, Chris Stolk, Kirk Blazek, Joakim Blanch, Cliff Nolan, Sue Minkoff, Mark Gockenbach, Roelof Versteeg, Michel Kern
• National Science Foundation
• Sponsors of The Rice Inversion Project
• Patrick Lailly, for inspired, inspiring, and fundamental contributions to this field
