Spatial and Temporal Linear Filters: 6.8300/6.8301 Advances in Computer Vision
Spring 2024
Sara Beery, Kaiming He, Vincent Sitzmann, Mina Konaković Luković
Remember, an image is just an array of numbers
(Figure: what we see, the image, versus what the machine gets, an array of pixel values.)
Some visual areas…
From M. Lewicki
A system f is linear if it satisfies:
f(αx) = αf(x) (homogeneity)
f(x + y) = f(x) + f(y) (additivity)
We need translation invariance
Now we also want translation-invariant operations. These can be linear or non-linear; we will focus on linear translation-invariant systems.
(Figure: a classifier is applied at every image location; it outputs “Bird” at the bird’s position and “Sky” everywhere else. The same operation f should apply at every translation of the input.)
One of Fourier’s original examples of sine series is the expansion of the ramp signal:
x/2 = sin(x) − sin(2x)/2 + sin(3x)/3 − …
The Discrete Fourier Transform
Discrete Fourier Transform (DFT) transforms a signal f[n] into F[u] as:

F[u] = ∑_{n=0}^{N−1} f[n] exp(−2πj un / N)

with the inverse transform

f[n] = (1/N) ∑_{u=0}^{N−1} F[u] exp(2πj un / N)

In two dimensions:

F[u, v] = ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} f[n, m] exp(−2πj (un/N + vm/M))

F[u, v] is complex-valued and is visualized via its amplitude and phase.
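As a sanity check on the definition, here is a minimal sketch of the 1-D DFT written directly from the sum above, compared against NumPy's FFT (the function name `dft` is ours, not from any library):

```python
import numpy as np

def dft(f):
    """Naive 1-D DFT: F[u] = sum_n f[n] * exp(-2*pi*j*u*n / N)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    n = np.arange(N)
    u = n.reshape(-1, 1)  # one row of the exponent matrix per frequency u
    return (f * np.exp(-2j * np.pi * u * n / N)).sum(axis=1)

f = np.array([1.0, 2.0, 3.0, 4.0])
F = dft(f)
print(np.allclose(F, np.fft.fft(f)))  # True: same convention as np.fft
```

Note that `np.fft.fft` uses exactly this sign convention, so the two agree to floating-point precision.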
Visualizing the image Fourier transform
(Figure: an image f[n, m] and the amplitude and phase of its Fourier transform F[u, v].)
Location information goes into the phase; strength goes into the magnitude.
Simple Fourier transforms
Images are 64×64 pixels. The wave is a cosine, so the DFT phase is zero.
Box filter h[n]: a constant filter with 2N+1 taps (for N = 1, h = [1 1 1]). In 2D, a (2N+1) × (2M+1) array of ones.
Box filter
(Figure: convolving a 256×256 image with a 21×21 box filter replaces each pixel by the local mean, blurring the image.)
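A minimal 1-D sketch of the box (mean) filter, normalized so the output is a local average (the helper name `box_filter_1d` is ours):

```python
import numpy as np

def box_filter_1d(f, N=1):
    """(2N+1)-tap box (mean) filter, 'same'-length output."""
    h = np.ones(2 * N + 1) / (2 * N + 1)
    return np.convolve(f, h, mode='same')

print(box_filter_1d([0.0, 0.0, 3.0, 0.0, 0.0]))  # [0. 1. 1. 1. 0.]
```

An isolated spike of height 3 is spread into three samples of its local 3-tap mean.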
Frequency responses of [1 1 1] and [1 2 1]; for the 3-tap box filter the response is 1 + 2 cos(2πu/20).
h1[n] vs b2[n]
Gaussian filter
In the continuous domain:
g(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
Scale
Gaussian filter for low-pass filtering
Dali
Properties of the Gaussian filter
Binomial filter
b1 = [1 1]
b2 = [1 1] ∗ [1 1] = [1 2 1]
b3 = [1 1] ∗ [1 1] ∗ [1 1] = [1 3 3 1]
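The construction above, repeated convolution of [1 1] with itself, can be sketched in a few lines (the function name `binomial` is ours):

```python
import numpy as np

def binomial(order):
    """b_order: convolve [1, 1] with itself order-1 times (b1 = [1 1])."""
    out = np.array([1.0, 1.0])
    for _ in range(order - 1):
        out = np.convolve(out, [1.0, 1.0])
    return out

print(binomial(2))  # [1. 2. 1.]
print(binomial(3))  # [1. 3. 3. 1.]
```

The coefficients are exactly the rows of Pascal's triangle, which is why these are called binomial filters.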
Properties of binomial filters
Note: these properties are analogous to those of the Gaussian in the continuous domain (but the binomial filter is different from a discretization of a Gaussian).
B2[n]
Laplacian filter
(Figure: center-surround weights, a positive center flanked by negative lobes.)
Sharpening
(Figure: a sharpening filter combines a Laplacian filter with a Gaussian filter; adding the Laplacian response back boosts the high frequencies.)
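A minimal 1-D sketch of sharpening by adding back a Laplacian response (an unsharp-masking style example; the name `sharpen_1d` and the choice of `alpha` are ours):

```python
import numpy as np

def sharpen_1d(f, alpha=1.0):
    """Boost edges by subtracting the discrete Laplacian response [1 -2 1]."""
    lap = np.convolve(f, [1.0, -2.0, 1.0], mode='same')
    return np.asarray(f, dtype=float) - alpha * lap

step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(sharpen_1d(step))  # overshoot/undershoot appears on either side of the edge
```

On a step edge the output dips below 0 just before the edge and overshoots above 1 just after it, which is what makes edges look crisper.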
Contrast sensitivity
(Figure: human contrast sensitivity vs. spatial frequency, 0.1 to 100 cycles/degree on a log axis. The curve separates visible from invisible contrasts, with maximum sensitivity at roughly 6 cycles per degree of visual angle.)
When you are far away you see only the low spatial frequencies; when you are close you see the high spatial frequencies.
Hybrid Images
Oliva & Schyns
We start by taking two pictures; we isolate the details and contours in one (here, the woman) and blur the second face. As you can see, the man seems to go out of focus and the details of the woman are superimposed.
Hybrid Images
(Figure: high-pass image + low-pass image = hybrid image.)
Hybrid Images
http://cvcl.mit.edu/hybrid_gallery/gallery.html
High-pass filters
Finding edges in the image
Image gradient: ∇f = (∂f/∂x, ∂f/∂y)
Edge strength: ‖∇f‖ = ((∂f/∂x)² + (∂f/∂y)²)^(1/2)
Edge orientation: θ = arctan((∂f/∂y) / (∂f/∂x))
Edge normal: n = ∇f / ‖∇f‖
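A minimal sketch of these quantities using NumPy's finite-difference gradient (the helper name `edge_features` is ours):

```python
import numpy as np

def edge_features(img):
    """Gradient magnitude (edge strength) and orientation of a 2-D image."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))  # d/drow, d/dcol
    strength = np.hypot(gx, gy)      # ||grad f||
    orientation = np.arctan2(gy, gx) # angle of the edge normal
    return strength, orientation

img = np.zeros((5, 5))
img[:, 3:] = 1.0  # vertical step edge
strength, orientation = edge_features(img)
```

For a vertical step edge the strength is largest along the edge column and the orientation there is 0 (the gradient points horizontally).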
Image derivative: convolving with h = [−1, 1]
(Figure: g[m, n] = f[m, n] ∗ h, the horizontal derivative.)
Convolving with h = [−1, 1]ᵀ
(Figure: g[m, n] = f[m, n] ∗ h, the vertical derivative.)
Discrete derivatives
This centered version gives better localization; with the two-tap filter [−1, 1], the edge would fall at a half-pixel position.
Derivatives
We want to compute the image derivative:
∂/∂x (f ∗ g)
But derivatives and convolutions are linear, so we can move them around:
∂/∂x (f ∗ g) = f ∗ (∂g/∂x)
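This identity is why, instead of differentiating the blurred image, we can convolve once with a sampled derivative-of-Gaussian filter. A minimal sketch (the name `gaussian_deriv` and the 3σ truncation radius are our choices):

```python
import numpy as np

def gaussian_deriv(sigma=1.0, radius=None):
    """Sampled derivative of a 1-D Gaussian: g'(x) = -(x / sigma**2) * g(x)."""
    if radius is None:
        radius = int(3 * sigma)  # truncate the tails at 3 sigma
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                 # normalize the Gaussian first
    return -(x / sigma**2) * g

d = gaussian_deriv(1.0)
```

As a derivative filter should, it is antisymmetric and its taps sum to zero, so it gives no response on constant regions.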
Gaussian derivatives
Gaussian Scale
Edges at a different spatial scale: not picking up the stripes, just the edges of the body.
Orientation
Sampling
You should always blur your image before you subsample it!
Sampling
(Figure: the continuous world is sampled onto a grid of pixels.)
The question is: how many samples do we need to capture the continuous signal well?
Aliasing
Let’s start with this continuous image (it is not really continuous…)
Aliasing
(Figure sequence: the red curve is the signal, a sinusoid plus a constant. Sampling replicates the baseband spectrum at multiples of the sampling frequency; as the sampling rate drops, the replicated spectra overlap the baseband and create aliased components.)
A high frequency appears as the low frequency of another signal. To avoid this, apply a low-pass filter first, so that you capture at least 2 samples per period of your sinusoid.
Aliasing (spatial domain and frequency domain): sampling repeats copies of the spectrum at a spacing given by the sampling frequency.
Antialiasing filtering
Before sampling, apply a low-pass filter to remove the frequencies that would produce aliasing: “blur before you subsample.”
(Figure: sampling without vs. with the antialiasing filter.)
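A minimal 1-D sketch of “blur before you subsample,” using the small binomial filter from earlier as the low-pass stage (the function name `subsample` is ours):

```python
import numpy as np

def subsample(f, factor=2, blur=True):
    """Subsample a 1-D signal, optionally low-pass filtering first."""
    f = np.asarray(f, dtype=float)
    if blur:
        h = np.array([1.0, 2.0, 1.0]) / 4.0  # small binomial low-pass
        f = np.convolve(f, h, mode='same')
    return f[::factor]

f = np.array([1.0, -1.0] * 4)       # the highest representable frequency
print(subsample(f, blur=False))     # aliases to a constant: [1. 1. 1. 1.]
print(subsample(f, blur=True))      # near zero: the frequency was removed
```

Without the filter, the fastest oscillation aliases into a constant signal; with it, that frequency is suppressed before sampling, so almost nothing (spurious) survives.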
• Temporal filtering
• A motion illusion, involving aliasing, that addresses whether humans match spatial patterns or use temporal filters to measure motion.
Temporal filtering
Sequences: a video is a signal over space and time.
Global constant motion: the whole frame translates with velocity (vx, vy).
The sampled signal is the original signal multiplied by a delta train. The Fourier transform of a box is a sinc function (sin x / x). The Fourier domain is just a different representation that makes certain operations simpler; for example, to characterize motion, we can measure the orientation of the signal’s spectrum to track the direction of motion.
Temporal Gaussian
Filters remove some frequencies we don’t like and keep the ones we like.
This filter blurs more in time than in space.
Spatio-temporal Gaussian
How could we create a filter that keeps sharp objects that
move at some velocity (vx , vy) while blurring the rest?
With u0 = 0.1:

ψc(x, y) = exp(−(x² + y²) / (2σ²)) cos(2πu0 x)
ψs(x, y) = exp(−(x² + y²) / (2σ²)) sin(2πu0 x)

Adding a temporal phase φt makes the filter spatio-temporal:

ψc(x, y, t) = exp(−(x² + y²) / (2σ²)) cos(2πu0 x + φt)
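A minimal sketch of the spatial cosine/sine (Gabor) pair defined above, sampled on a grid (the function name `gabor_pair` and the default `sigma`/`size` values are our choices):

```python
import numpy as np

def gabor_pair(sigma=4.0, u0=0.1, size=21):
    """Even (cosine) and odd (sine) Gabor filters from the formulas above."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    psi_c = envelope * np.cos(2 * np.pi * u0 * x)
    psi_s = envelope * np.sin(2 * np.pi * u0 * x)
    return psi_c, psi_s

psi_c, psi_s = gabor_pair()
```

The cosine filter is even-symmetric in x and the sine filter is odd-symmetric, which is what makes the pair useful for measuring oriented energy regardless of phase.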
Space-time plot of a slice through the spatio-temporal filter of the previous slide:

ψc(x, y, t) = exp(−(x² + y²) / (2σ²)) cos(2πu0 x + φt)
Spatio-temporal sampling illusion, due to
Edward Adelson and Jim Bergen
Evidence for filter-based analysis of motion in the
human visual system shown via spatio-temporal
visual illusion based on sampling
Two potential theories for how humans compute our motion perceptions:
(a) We match the pattern in the image that we see at one moment and compare
it with what we see at subsequent times.
(b) We use spatio-temporal filters to measure spatio-temporal energy in order to measure local motion.
Visual signal: a static sinusoid (this “video” is static).
(Figure: space-time plot and its spatial/temporal frequency spectrum. Demo parameters: alpha: 1, squareFlag: 0, offset: 0.)
Start with a picture of a sine wave: a static wave that is not moving in time.
Moving sinusoid
(Figure: space-time plot and its spatial/temporal frequency spectrum.)
A square wave is an infinite sum of sinusoids; we can use spatio-temporal filters to analyze its motion.
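The square-wave decomposition can be made concrete with its Fourier sine series, 4/π times the sum of sin(kx)/k over odd k (the helper name `square_wave_partial` is ours):

```python
import numpy as np

def square_wave_partial(x, n_terms):
    """Partial Fourier sum of a unit square wave:
    (4/pi) * sum over odd k of sin(k*x)/k."""
    s = np.zeros_like(x, dtype=float)
    for k in range(1, 2 * n_terms, 2):  # odd harmonics 1, 3, 5, ...
        s += np.sin(k * x) / k
    return 4.0 / np.pi * s

val = square_wave_partial(np.array([np.pi / 2]), 200)
```

With enough terms, the partial sum at the middle of the “high” half-period approaches the square wave's value of 1.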
Moving square waves
(Figure: space-time plot and its spatial/temporal frequency spectrum.)
Jittered square wave vs. blurred square wave
(Figure: space-time plots and their spatial/temporal frequency spectra. Demo parameters: alpha: 0, squareFlag: 1, offset: period/4.)
(Figure: the visual signal in space-time and in the frequency domain. Demo parameters: alpha: 0, squareFlag: 1, offset: period/4.)
(Demo sequence: blend between the two conditions, varying the fraction of square wave vs. the fundamental frequency, at faster display speeds. Parameters shown: alpha: 1 or 0, squareFlag: 1, offset: period/4; finally, the fast blended condition.)
Lecture summary
• We have “inverted U shaped” sensitivity to spatial
frequencies, peaking at 6 cycles per degree.
• We discussed ways to filter out different spatial frequency
components of an image.
• Aliasing: “blur before you subsample”.
• Spatio-temporal filtering enables motion analysis.
• The motion illusion gives evidence that temporal filtering mechanisms are involved in our motion processing.