Video Denoising Using Sparse and Redundant Representations
(IJARTET) Vol. 1, Issue 3, November 2014
Assistant Professor
Department of Electronics and Communication Engineering,
Sri Ramakrishna Engineering College / Sri Eashwar Engineering College, Coimbatore, Tamil Nadu, India
Abstract: The quality of video sequences is often degraded by noise, usually assumed to be white and Gaussian, superimposed on the sequence. When denoising image sequences, rather than a single image, the temporal dimension can be exploited to improve both the denoising performance and the speed of the algorithms. Such temporal correlations are further strengthened by a motion compensation process, for which a Fourier-domain noise-robust cross correlation algorithm is proposed for motion estimation. The denoising scheme itself relies on sparse and redundant representations of small patches in the images. Three different extensions are offered, and all are tested and found to lead to substantial benefits in both denoising quality and algorithm complexity, compared to running the single-image algorithm frame by frame.
Keywords: Cross Correlation (CC), Motion estimation, K-SVD, OMP, Sparse representations, Video denoising.
I. INTRODUCTION
Video signals are often contaminated by noise
during acquisition and transmission. Removing or reducing
noise in video signals (video denoising) is highly
desirable, as it can enhance perceived image quality,
increase compression effectiveness, facilitate transmission
bandwidth reduction, and improve the accuracy of
subsequent processes such as feature extraction,
object detection, motion tracking and pattern classification.
Video denoising algorithms may be roughly classified based
on two different criteria: whether they are implemented in
the spatial domain or transform domain and whether motion
information is directly incorporated. The high degree of
correlation between adjacent frames is a mixed blessing for
signal restoration. On the one hand, since additional
information is available from nearby frames, a better
estimate of the original signal is expected. On the other
hand, the process is complicated by the presence of motion
between frames. Motion estimation is itself a complex
problem, and it is further complicated by the presence of
noise. One suggested approach that utilizes the
temporal redundancy is motion estimation [4]. The estimated
trajectories are used to filter along the temporal dimension,
either in the wavelet domain [5] or the signal domain [4].
Spatial filtering may also be used, with stronger emphasis in
areas in which the motion estimation is not as reliable.
Following [1], single-image denoising is posed as the minimization of the functional

$$\{\hat{\alpha}_{ij},\hat{X}\} \;=\; \arg\min_{\alpha_{ij},\,X}\; \lambda\|X-Y\|_2^2 \;+\; \sum_{ij}\|D\alpha_{ij}-R_{ij}X\|_2^2 \;+\; \sum_{ij}\mu_{ij}\|\alpha_{ij}\|_0. \qquad (2)$$
The first term demands proximity between the
measured image Y and its denoised (and unknown) version
X. The second term demands that each patch from the
reconstructed image (denoted by R_ij X) can be represented up
to a bounded error by a dictionary D, with coefficients α_ij.
The third term demands that the number of coefficients
required to represent any patch is small. The values μ_ij are
patch-specific weights. Minimizing this functional with
respect to its unknowns yields the denoising algorithm.
The choice of D is of great importance to the
performance of the algorithm. In [1] it is shown that training
can be done by minimizing (2) with respect to D as well (in
addition to X and α_ij). The proposed algorithm in [1] is an
iterative block-coordinate relaxation method that fixes all
the unknowns apart from the one to be updated, and
alternates between the following update stages.
1) Update of the sparse representations {α_ij}: Assuming that
D and X are fixed, we solve a set of problems of the form
$$\hat{\alpha}_{ij} \;=\; \arg\min_{\alpha}\; \mu_{ij}\|\alpha\|_{0} \;+\; \|D\alpha - R_{ij}X\|_{2}^{2} \qquad (3)$$
for each location [i, j]. This means that we seek, for each
patch in the image, the sparsest vector that describes it using
atoms from D. In [1], the orthogonal matching pursuit
(OMP) algorithm is used for this task.
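For concreteness, a minimal numpy sketch of such a greedy pursuit is given below; the function name, the stopping threshold err_threshold, and the cap max_atoms are illustrative choices rather than the exact settings of [1].

import numpy as np

def omp(D, x, err_threshold, max_atoms):
    """Greedy orthogonal matching pursuit: build a sparse coefficient vector
    alpha so that D @ alpha approximates the patch x (D has unit-norm columns)."""
    residual = x.astype(float).copy()
    support, coeffs = [], np.zeros(0)
    while np.linalg.norm(residual) > err_threshold and len(support) < max_atoms:
        # pick the atom most correlated with the current residual
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # jointly re-fit all selected atoms by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    alpha = np.zeros(D.shape[1])
    alpha[support] = coeffs
    return alpha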
2) Update the dictionary D: In this stage, we assume that X
is fixed, and we update one atom at a time in D, while also
updating the coefficients in {α_ij} that use it. This is done
via a rank-one approximation of a residual matrix, as
described in [16].
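This rank-one update can be sketched as follows (a simplified illustration of the K-SVD atom update, not the authors' code); here P is the matrix whose columns are the training patches and A is the matrix of sparse coefficient vectors α_ij.

import numpy as np

def update_atom(D, A, P, j):
    """K-SVD style update of atom j via a rank-one SVD approximation of the
    residual of the patches that currently use this atom.
    D: (patch_dim, n_atoms), A: (n_atoms, n_patches), P: (patch_dim, n_patches)."""
    users = np.nonzero(A[j, :])[0]            # patches whose representation uses atom j
    if users.size == 0:
        return D, A                           # unused atom: leave it unchanged
    # representation error of those patches when atom j is removed
    E = P[:, users] - D @ A[:, users] + np.outer(D[:, j], A[j, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                         # best rank-one left factor becomes the new atom
    A[j, users] = s[0] * Vt[0, :]             # matching coefficients for the patches that use it
    return D, A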
3) Update the estimated image X: After several rounds of
updates of {α_ij} and D, the final output image is computed
by fixing these unknowns and minimizing (2) with respect to
X. This leads to the quadratic problem
$$\hat{X} \;=\; \arg\min_{X}\; \lambda\|X-Y\|_2^2 \;+\; \sum_{ij}\|D\hat{\alpha}_{ij}-R_{ij}X\|_2^2, \qquad (6)$$

whose closed-form solution is a per-pixel weighted average of the noisy image and the denoised patches returned to their locations:

$$\hat{X} \;=\; \Big(\lambda I + \sum_{ij} R_{ij}^{T}R_{ij}\Big)^{-1}\Big(\lambda Y + \sum_{ij} R_{ij}^{T}D\hat{\alpha}_{ij}\Big).$$
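A sketch of this averaging step is shown below; positions and patches_hat are illustrative names for the patch locations and the denoised patches D·α̂_ij.

import numpy as np

def update_image(Y, patches_hat, positions, p, lam):
    """Closed-form image update: per-pixel weighted average of the noisy image Y
    and every denoised patch reprojected to its location.
    patches_hat[k] is the (p*p,) denoised patch whose top-left corner is positions[k]."""
    num = lam * Y.astype(float)
    den = lam * np.ones_like(Y, dtype=float)
    for (i, j), patch in zip(positions, patches_hat):
        num[i:i + p, j:j + p] += patch.reshape(p, p)
        den[i:i + p, j:j + p] += 1.0
    return num / den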
In the video extension, the functional is defined for each frame t = 1, 2, ..., T, and the patches are taken from three frames (i.e., they are spatio-temporal).
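As a rough illustration of such spatio-temporal patches, the numpy sketch below gathers p × p × 3 blocks around frame t; the fully dense sampling, the patch size p = 8, and the function name are assumptions for illustration only.

import numpy as np

def spatiotemporal_patches(frames, t, p=8):
    """Collect p x p x 3 patches centred on frame t (frames t-1, t, t+1),
    flattened into columns; frames is a (T, H, W) array."""
    stack = frames[t - 1:t + 2].astype(float)      # the three consecutive frames
    H, W = stack.shape[1], stack.shape[2]
    cols = []
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            cols.append(stack[:, i:i + p, j:j + p].ravel())
    return np.column_stack(cols)                   # shape (3*p*p, num_patches)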
To gauge the possible speed-up and improvement
achieved by a temporally adaptive and propagated dictionary,
we test the number of iterations required to obtain results
similar to the non-propagation alternative. Several options for
the number of training iterations that follow the dictionary
propagation are compared to the non-propagation option
(which uses 15 training iterations per image). In the propagation
scheme, the dictionary is propagated (from image #10) and then
trained with only 4 training iterations per image. This comparison shows
that propagation of the dictionary leads to a cleaner version
with clearer and sharper texture atoms. These benefits are
attributed to the memory induced by the propagation.
Indeed, when handling longer sequences, we expect this
memory feature of our algorithm to further benefit the
denoising performance.
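The propagation itself amounts to warm-starting the training of frame t with the dictionary of frame t − 1. The sketch below illustrates the idea using scikit-learn's DictionaryLearning as a stand-in for K-SVD; the atom count, the iteration numbers, and the helper name are assumptions.

import numpy as np
from sklearn.decomposition import DictionaryLearning

def train_frame_dictionary(patches, dict_init=None, n_iter=4, n_atoms=256):
    """One frame's dictionary training step; patches is (num_patches, patch_dim)."""
    learner = DictionaryLearning(n_components=n_atoms,
                                 transform_algorithm='omp',
                                 max_iter=n_iter,
                                 dict_init=dict_init)
    learner.fit(patches)
    return learner.components_      # propagated to the next frame as its initialization

# Usage idea (frame_patches is a hypothetical list of per-frame patch matrices):
# the first frame gets a long training run, later frames reuse the previous
# dictionary and need only a few iterations (e.g. 15 vs 4).
# D = train_frame_dictionary(frame_patches[0], n_iter=15)
# for patches in frame_patches[1:]:
#     D = train_frame_dictionary(patches, dict_init=D, n_iter=4)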
At high noise levels, the redundancy factor (the
ratio between the number of atoms in the dictionary and the
size of an atom) should be smaller: obtaining a clean dictionary
then requires averaging a large number of patches for each atom,
which is why only a relatively small number of atoms is used.
At low noise levels, many details in the image need to be
represented by the dictionary, and noise averaging takes a more
minor role. This calls for a large number of atoms, so that they can
represent the wealth of details in the image.
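As an illustrative example (the sizes here are not specific settings of the paper), with 8 × 8 patches the atoms are 64-dimensional, so a dictionary of 256 atoms has a redundancy factor of 256/64 = 4; shrinking it to 128 atoms halves the redundancy factor to 2 and lets each atom be averaged over more noisy patches.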
IV. NOISE-ROBUST MOTION ESTIMATION
One of the challenges in the implementation of the
above algorithm is to estimate motion in the presence of
noise. Here, we propose a simple but reliable noise-robust
CC (NRCC) method [3] for global motion estimation at integer-pixel
precision. The limitation of using global motion estimation
is that it cannot account for rotation, zooming and local
motion. Let F1(v) and F2(v) represent two image frames,
where v is a spatial integer index vector for the underlying 2D rectangular image lattice. A classical approach to
estimating a global motion vector between the two frames is
the cross correlation (CC) method, which is based on the
observation that when F2(v) is a shifted version of F1(v), the
position of the peak in the CC function between F1(v) and
F2(v) corresponds to the motion vector. Despite the
simplicity of the idea, the computation of the CC function is
often costly. An equivalent but more efficient approach is to
use the Fourier transform method. Let F(ω) denote the 2-D Fourier
transform of an image frame f(v). Then the CC function can be computed as

$$c(v) \;=\; \mathcal{F}^{-1}\big\{F_1(\omega)\,F_2^{*}(\omega)\big\},$$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform and * denotes complex conjugation. In the phase correlation (PC) method, the cross power spectrum F1(ω)F2*(ω) is further normalized in the frequency domain to have unit energy across all frequencies. The phase correlation function is given by

$$p(v) \;=\; \mathcal{F}^{-1}\left\{\frac{F_1(\omega)\,F_2^{*}(\omega)}{\left|F_1(\omega)\,F_2^{*}(\omega)\right|}\right\}. \qquad (10)$$

To have a closer look, let us assume that F2(v) is simply a shifted version of F1(v), i.e., F2(v) = F1(v − Δv). Based on the shifting property of the Fourier transform, we have F2(ω) = F1(ω) exp{−jω^TΔv} and Y(ω) = F1(ω)F2*(ω) = |F1(ω)|² exp{jω^TΔv}, and thus

$$p(v) \;=\; \mathcal{F}^{-1}\big\{\exp\{j\omega^{T}\Delta v\}\big\}, \qquad (11)$$

an ideal impulse whose peak location directly reveals the motion vector Δv. The normalization in (10), however, amplifies frequency components that carry little signal energy, which makes PC sensitive to noise. The proposed noise-robust CC therefore regularizes the normalization with the noise power spectrum,

$$c_{NR}(v) \;=\; \mathcal{F}^{-1}\left\{\frac{F_1(\omega)\,F_2^{*}(\omega)}{\left|F_1(\omega)\,F_2^{*}(\omega)\right| + \left|N(\omega)\right|^{2}}\right\}, \qquad (12)$$

where |N(ω)|² is the noise power spectrum (in the
case of white noise, |N(ω)|² is a constant). To better
understand this, it is useful to formulate the three approaches
(CC, PC, and NRCC) using a unified framework.
In particular, each method can be viewed as applying a specific
weighting scheme in the Fourier domain,

$$\hat{c}(v) \;=\; \mathcal{F}^{-1}\big\{W(\omega)\,F_1(\omega)\,F_2^{*}(\omega)\big\}, \qquad (13)$$

where the differences lie in the definition of the weighting function W(ω):

$$W_{CC}(\omega) = 1, \qquad W_{PC}(\omega) = \frac{1}{\left|F_1(\omega)F_2^{*}(\omega)\right|}, \qquad W_{NRCC}(\omega) = \frac{1}{\left|F_1(\omega)F_2^{*}(\omega)\right| + \left|N(\omega)\right|^{2}}.$$

The estimated motion vector is given by

$$\Delta\hat{v} \;=\; \arg\max_{v}\; \hat{c}(v). \qquad (14)$$
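To make the procedure concrete, the following numpy sketch estimates a global integer-pixel motion vector with an NRCC-style weighting (a simplified illustration, not the authors' implementation); noise_power stands in for |N(ω)|² and is assumed constant, as for white noise.

import numpy as np

def nrcc_motion(f1, f2, noise_power):
    """Global integer-pixel motion estimate via noise-robust cross correlation:
    weight the cross power spectrum, invert the FFT, and locate the peak."""
    F1 = np.fft.fft2(f1.astype(float))
    F2 = np.fft.fft2(f2.astype(float))
    cross = F1 * np.conj(F2)                               # F1(w) * conj(F2(w))
    corr = np.fft.ifft2(cross / (np.abs(cross) + noise_power)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # convert wrapped FFT indices to signed displacements; the overall sign
    # depends on which frame is taken as the reference
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))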
V. EXPERIMENTAL RESULTS

The denoising quality is evaluated using the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index. The PSNR is defined as

$$\mathrm{PSNR} \;=\; 10\log_{10}\!\left(\frac{L^{2}}{\mathrm{MSE}}\right), \qquad (15)$$

where L is the dynamic range of the image (for 8 bits/pixel
images, L = 255) and MSE is the mean squared error
between the original and distorted images.
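For illustration, a direct numpy implementation of (15), with L defaulting to 255 for 8-bit images:

import numpy as np

def psnr(original, distorted, L=255.0):
    """Peak signal-to-noise ratio in dB, following (15)."""
    mse = np.mean((original.astype(float) - distorted.astype(float)) ** 2)
    return 10.0 * np.log10(L ** 2 / mse)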
The SSIM index is first calculated within local windows using

$$\mathrm{SSIM}(x,y) \;=\; \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2}+\mu_y^{2}+C_1)(\sigma_x^{2}+\sigma_y^{2}+C_2)}, \qquad (16)$$

where x and y are the image patches extracted from the same local
window of the original and distorted images, respectively;
μ_x, σ_x², and σ_xy are the mean, variance, and cross-correlation
computed within the local window; and C_1, C_2 are small constants
that stabilize the division. The overall SSIM score of a video frame
is computed as the average of the local SSIM scores.
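In practice the per-window computation and averaging can be delegated to an existing implementation; the sketch below uses scikit-image's structural_similarity as one such option (the window size 11 is an assumed choice, and data_range plays the role of L).

from skimage.metrics import structural_similarity

def mean_ssim(original, distorted, L=255):
    """Average of local SSIM scores over a frame."""
    return structural_similarity(original, distorted, data_range=L, win_size=11)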
Moreover, the running time of this method can meet the
requirements of real-time processing. To demonstrate the
advantage of our method, a comparison between the proposed
method and the K-SVD method is shown in Fig. 1.
[Fig. 1. Comparison between the proposed method and the K-SVD method.]
TABLE I
PSNR (dB) COMPARISONS WITH LATEST VIDEO DENOISING ALGORITHMS

Noise std (σ) | K-SVD [1] | K-SVD [2] | VBM3D | Proposed
      10      |   35.89   |   37.95   | 38.33 |   38.01
      15      |   33.30   |   35.17   | 36.60 |   35.98
      20      |   31.80   |   33.33   | 35.12 |   34.23
      50      |   24.12   |   26.03   | 28.49 |   27.31
VI. CONCLUSION