Real Image Denoising With Feature Attention



Saeed Anwar ∗, Nick Barnes †


Data61, CSIRO and The Australian National University, Australia.

Abstract

Deep convolutional neural networks perform better on images containing spatially invariant noise (synthetic noise); however, their performance is limited on real-noisy photographs and requires multiple stage network modeling. To advance the practicability of denoising algorithms, this paper proposes a novel single-stage blind real image denoising network (RIDNet) by employing a modular architecture. We use a residual on the residual structure to ease the flow of low-frequency information and apply feature attention to exploit the channel dependencies. Furthermore, the evaluation in terms of quantitative metrics and visual quality on three synthetic and four real noisy datasets against 19 state-of-the-art algorithms demonstrates the superiority of our RIDNet.

[Figure 1 panels: Noisy; CBDNet [31]; RIDNet (Ours)]
Figure 1. A real noisy face image from RNI15 dataset [38]. Unlike CBDNet [31], RIDNet does not have over-smoothing or over-contrasting artifacts (Best viewed in color on high-resolution display).

∗ saeed.anwar@csiro.au
† nick.barnes@csiro.au

1. Introduction

Image denoising is a low-level vision task that is essential in a number of ways. First of all, during image acquisition, some noise corruption is inevitable and can downgrade the visual quality considerably; therefore, removing noise from the acquired image is a key step for many computer vision and image analysis applications [28]. Secondly, denoising is a unique testing ground for evaluating image prior and optimization methods from a Bayesian perspective [30, 67]. Furthermore, many image restoration tasks can be solved in the unrolled inference through variable splitting methods by a set of denoising subtasks, which further widens the applicability of image denoising [3, 33, 51, 64].

Generally, denoising algorithms can be categorized as model-based and learning-based. Model-based algorithms include non-local self-similarity (NSS) [18, 13, 20], sparsity [30, 48], gradient methods [46, 56, 54], Markov random field models [52], and external denoising priors [9, 61, 42]. The model-based algorithms are computationally expensive, time-consuming, and unable to directly suppress spatially variant noise or characterize complex image textures.

On the other hand, discriminative learning aims to model the image prior from a set of noisy and ground-truth image sets. One technique is to learn the prior in steps in the context of truncated inference [17] while another approach is to employ brute force learning, for example, MLP [14] and CNN methods [63, 64]. CNN models [65, 31] improved denoising performance, due to their modeling capacity, network training, and design. However, the performance of the current learning models is limited and tailored for a specific level of noise.

A practical denoising algorithm should be efficient, flexible, perform denoising using a single model and handle both spatially variant and invariant noise when the noise standard-deviation is known or unknown. Unfortunately, the current state-of-the-art algorithms are far from achieving all of these aims. We present a CNN model which is efficient and capable of handling synthetic as well as real-noise present in images. We summarize the contributions of this work in the following paragraphs.

1.1. Contributions

• Present CNN based approaches for real image denoising employ two-stage models; we present the first model that provides state-of-the-art results using only one stage.

• To the best of our knowledge, our model is the first to incorporate feature attention in denoising.

• Most current models connect the weight layers consecutively; and so increasing the depth will not help improve performance [21, 41]. Also, such networks
can suffer from vanishing gradients [11]. We present a modular network, where increasing the number of modules helps improve performance.

• We experiment on three synthetic image datasets and four real-image noise datasets to show that our model achieves state-of-the-art results on synthetic and real images quantitatively and qualitatively.

2. Related Works

In this section, we present and discuss recent trends in image denoising. Two notable denoising algorithms, NLM [13] and BM3D [18], use self-similar patches. Due to their success, many variants were proposed, including SADCT [27], SAPCA [20], NLB [37], and INLM [29] which seek self-similar patches in different transform domains. Dictionary-based methods [25, 43, 22] enforce sparsity by employing self-similar patches and learning over-complete dictionaries from clean images. Many algorithms [67, 26, 59] investigated the maximum likelihood algorithm to learn a statistical prior, e.g. the Gaussian Mixture Model of natural patches or patch groups for patch restoration. Furthermore, Levin et al. [40] and Chatterjee et al. [16] motivated external denoising [9, 7, 42, 62] by showing that an image can be recovered with negligible error by selecting reference patches from a clean external database. However, all of the external algorithms are class-specific.

Recently, Schmidt et al. [53] introduced a cascade of shrinkage fields (CSF) which integrated half-quadratic optimization and random-fields. Shrinkage aims to suppress smaller values (noise values) and learn mappings discriminatively. The CSF assumes the data fidelity term to be quadratic and that it has a discrete Fourier transform based closed-form solution.

Currently, due to the popularity of convolutional neural networks (CNNs), image denoising algorithms [63, 64, 39, 14, 53, 8] have achieved a performance boost. Notable denoising neural networks, DnCNN [63] and IrCNN [64], predict the residue present in the image instead of the denoised image, as the input to the loss function is ground truth noise as compared to the original clean image. Both networks achieved better results despite having a simple architecture where repeated blocks of convolutional, batch normalization and ReLU activations are used. Furthermore, IrCNN [64] and DnCNN [63] are dependent on blindly predicted noise i.e. without taking into account the underlying structures and textures of the noisy image.

Another essential image restoration framework is Trainable Nonlinear Reaction-Diffusion (TNRD) [17] which uses a field-of-experts prior [52] into the deep neural network for a specific number of inference steps by extending the non-linear diffusion paradigm into a profoundly trainable parametrized linear filters and the influence functions. Although the results of TNRD are favorable, the model requires a significant amount of data to learn the parameters and influence functions as well as overall fine-tuning, hyper-parameter determination, and stage-wise training. Similarly, non-local color net (NLNet) [39] was motivated by non-local self-similar (NSS) priors which employ non-local self-similarity coupled with discriminative learning. NLNet improved upon the traditional methods, but it lags in performance compared to most of the CNNs [64, 63] due to the adaptation of NSS priors, as it is unable to find the analogs for all the patches in the image.

Building upon the success of DnCNN [63], Jiao et al. proposed a network consisting of two stacked subnets, named “FormattingNet” and “DiffResNet” respectively. The architecture of both networks is similar, and the difference lies in the loss layers used. The first subnet employs total variational and perceptual loss while the second one uses ℓ2 loss. The overall model is named FormResNet and improves upon [64, 63] by a small margin. Lately, Bae et al. [10] employed persistent homology analysis [24] via wavelet transformed domain to learn the features in CNN denoising. The performance of the model is marginally better compared to [63, 35], which can be attributed to a large number of feature maps employed rather than the model itself. Recently, Anwar et al. introduced CIMM, a deep denoising CNN architecture, composed of identity mapping modules [8]. The network learns features in cascaded identity modules using dilated kernels and uses self-ensemble to boost performance. CIMM improved upon all the previous CNN models [63, 35].

Recently, many algorithms focused on blind denoising on real-noisy images [50, 31, 12]. The algorithms [64, 63, 35] benefitted from the modeling capacity of CNNs and have shown the ability to learn a single-blind denoising model; however, the denoising performance is limited, and the results are not satisfactory on real photographs. Generally speaking, real-noisy image denoising is a two-step process: the first involves noise estimation while the second addresses non-blind denoising. Noise clinic (NC) [38] estimates the noise model dependent on signal and frequency followed by denoising the image using non-local Bayes (NLB). In comparison, Zhang et al. [65] proposed a non-blind Gaussian denoising network, termed FFDNet, that can produce satisfying results on some of the real noisy images; however, it requires manual intervention to select high noise-level.

Very recently, CBDNet [31] trains a blind denoising model for real photographs. CBDNet [31] is composed of two subnetworks: noise estimation and non-blind denoising. CBDNet [31] also incorporated multiple losses, is engineered to train on real-synthetic noise and real-image noise and enforces a higher noise standard deviation for low noise images. Furthermore, [31, 65] may require manual intervention to improve results. On the other hand, we present an end-to-end architecture that learns the noise and produces results on real noisy images without requiring separate subnets or manual intervention.

[Figure 2: network diagram with four cascaded EAM blocks. Legend: convolutional layer followed by ReLU; σ, sigmoid function; element-wise addition; element-wise multiplication; global pooling.]
Figure 2. The architecture of the proposed network. Different green colors of the conv layers denote different dilations while the smaller size of the conv layer means the kernel is 1 × 1. The second row shows the architecture of each EAM.

3. CNN Denoiser

3.1. Network Architecture

Our model is composed of three main modules i.e. feature extraction, feature learning residual on the residual module, and reconstruction, as shown in Figure 2. Let us consider $x$ is a noisy input image and $\hat{y}$ is the denoised output image. Our feature extraction module is composed of only one convolutional layer to extract initial features $f_0$ from the noisy input:

$$f_0 = M_e(x), \tag{1}$$

where $M_e(\cdot)$ performs convolution on the noisy input image. Next, $f_0$ is passed on to the feature learning residual on the residual module, termed as $M_{fl}$,

$$f_r = M_{fl}(f_0), \tag{2}$$

where $f_r$ are the learned features and $M_{fl}(\cdot)$ is the main feature learning residual on the residual component, composed of enhancement attention modules (EAM) that are cascaded together as shown in Figure 2. Our network has small depth, but provides a wide receptive field through kernel dilation in the initial two branch convolutions of each EAM. The output features of the final layer are fed to the reconstruction module, which is again composed of one convolutional layer:

$$\hat{y} = M_r(f_r), \tag{3}$$

where $M_r(\cdot)$ denotes the reconstruction layer.

There are several choices available as loss function for optimization such as ℓ2 [63, 64, 8], perceptual loss [35, 31], total variation loss [35] and asymmetric loss [31]. Some networks [35, 31] employ more than one loss to optimize the model; contrary to earlier networks, we only employ one loss i.e. ℓ1. Now, given a batch of $N$ training pairs, $\{x_i, y_i\}_{i=1}^{N}$, where $x$ is the noisy input and $y$ is the ground truth, the aim is to minimize the ℓ1 loss function as

$$L(\mathcal{W}) = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathrm{RIDNet}(x_i) - y_i \rVert_1, \tag{4}$$

where $\mathrm{RIDNet}(\cdot)$ is our network and $\mathcal{W}$ denotes the set of all the network parameters learned. Our feature extraction $M_e$ and reconstruction module $M_r$ resemble the previous algorithms [21, 8]. We now focus on the feature learning residual on the residual block, and feature attention.

3.2. Feature learning Residual on the Residual

In this section, we provide more details on the enhancement attention modules that use a Residual on the Residual structure with local skip and short skip connections. Each EAM is further composed of D blocks followed by feature attention. Due to the residual on the residual architecture, very deep networks are now possible that improve denoising performance; however, we restrict our model to four EAM modules only. The first part of EAM covers the full receptive field of input features, followed by learning on the features; then the features are compressed for speed, and finally a feature attention module enhances the weights of important features from the maps. The first part of EAM is realized using a novel merge-and-run unit as shown in Figure 2 second row. The input features are branched and passed through two dilated convolutions, then concatenated and passed through another convolution. Next, the features are learned using a residual block of two convolutions while compression is achieved by an enhanced residual block (ERB) of three convolutional layers. The last layer of ERB flattens the features by applying a 1×1 kernel. Finally, the output of the feature attention unit is added to the input of EAM.
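The pipeline of Eqs. (1) to (4), together with the skip connections that this section goes on to formalize, can be illustrated with a toy sketch. This is not the authors' implementation: feature maps are plain Python lists, and `extract`, `reconstruct`, and the "EAM" callables are hypothetical stand-ins for the convolutional modules, chosen only so that the arithmetic can be checked by hand.

```python
def add(a, b):
    """Element-wise addition of two equally sized feature vectors."""
    return [u + v for u, v in zip(a, b)]


def ridnet_forward(x, extract, eams, reconstruct):
    """Toy forward pass: feature extraction, cascaded EAMs, reconstruction."""
    f0 = extract(x)                 # Eq. (1): initial features
    f = f0
    for eam in eams:                # Eq. (2): cascaded EAMs
        f = add(eam(f), f)          # each EAM output is added to its own input
    fg = add(f0, f)                 # long skip connection around the EAM stack
    return add(reconstruct(fg), x)  # input image added to the network output


def l1_loss(predictions, targets):
    """Eq. (4): batch mean of the l1 norm between prediction and ground truth."""
    total = 0.0
    for pred, target in zip(predictions, targets):
        total += sum(abs(p - t) for p, t in zip(pred, target))
    return total / len(predictions)


# Hypothetical stand-ins (assumptions, not the real modules): identity feature
# extraction and reconstruction, and four "EAMs" whose residual branch simply
# returns half of its input.
identity = lambda v: list(v)
half = lambda v: [0.5 * u for u in v]

batch_x = [[1.0], [2.0]]
batch_y = [[7.0], [14.0]]
batch_pred = [ridnet_forward(x, identity, [half] * 4, identity) for x in batch_x]
loss = l1_loss(batch_pred, batch_y)
```

With these stand-ins each EAM multiplies its input by 1.5, so the example only demonstrates how the additions compose and how the batch-averaged ℓ1 objective is evaluated; a real model would replace the callables with learned convolutional blocks.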

[Figure 3: diagram of the feature attention unit]
Figure 3. The feature attention mechanism for selecting the essential features.

In image recognition, residual blocks [32] are stacked together to construct a network of more than 1000 layers. Similarly, in image superresolution, EDSR [41] stacked the residual blocks and used long skip connections (LSC) to form a very deep network. However, to date, very deep networks have not been investigated for denoising. Motivated by the success of [66], we introduce the residual on the residual as a basic module for our network to construct deeper systems. Now consider the m-th module of the EAM is given as

$$f_m = EAM_m(EAM_{m-1}(\cdots(M_0(f_0))\cdots)), \tag{5}$$

where $f_m$ is the output of the $EAM_m$ feature learning module, in other words $f_m = EAM_m(f_{m-1})$. The output of each EAM is added to the input of the group as $f_m = f_m + f_{m-1}$. We have observed that simply cascading the residual modules will not achieve better performance; instead we add the input of the feature extractor module to the final output of the stacked modules as

$$f_g = f_0 + M_{fl}(W_{w,b}), \tag{6}$$

where $W_{w,b}$ are the weights and biases learned in the group. This addition i.e. LSC, eases the flow of information across groups. $f_g$ is passed to the reconstruction layer to output the same number of channels as the input of the network. Furthermore, we use another long skip connection to add the input image to the network output i.e. $\hat{y} = M_r(f_g) + x$, in order to learn the residual (noise) rather than the denoised image, as this technique helps in faster learning as compared to learning the original image due to the sparse representation of the noise.

3.2.1 Feature Attention

This section provides information about the feature attention mechanism. Attention [60] has been around for some time; however, it has not been employed in image denoising. Channel features in image denoising methods are treated equally, which is not appropriate for many cases. To exploit and learn the critical content of the image, we focus attention on the relationship between the channel features; hence the name: feature attention (see Figure 3).

An important question here is how to generate attention differently for each channel-wise feature. Images generally can be considered as having low-frequency regions (smooth or flat areas), and high-frequency regions (e.g., lines, edges and texture). As convolutional layers exploit local information only and are unable to utilize global contextual information, we first employ global average pooling to express the statistics denoting the whole image; other options for aggregation of the features can also be explored to represent the image descriptor. Let $f_c$ be the output features of the last convolutional layer having $c$ feature maps of size $h \times w$; global average pooling will reduce the size from $h \times w \times c$ to $1 \times 1 \times c$ as:

$$g_p = \frac{1}{h \times w} \sum_{i=1}^{h} \sum_{j=1}^{w} f_c(i, j), \tag{7}$$

where $f_c(i, j)$ is the feature value at position $(i, j)$ in the feature maps.

Furthermore, as investigated in [34], we propose a self-gating mechanism to capture the channel dependencies from the descriptor retrieved by global average pooling. According to [34], the mentioned mechanism must learn the nonlinear synergies between channels as well as mutually-exclusive relationships. Here, we employ soft-shrinkage and sigmoid functions to implement the gating mechanism. Let us consider $\delta$ and $\alpha$ are the soft-shrinkage and sigmoid operators, respectively. Then the gating mechanism is

$$r_c = \alpha(H_U(\delta(H_D(g_p)))), \tag{8}$$

where $H_D$ and $H_U$ are the channel reduction and channel upsampling operators, respectively. The output of the global pooling layer $g_p$ is convolved with a downsampling Conv layer, activated by the soft-shrinkage function. To differentiate the channel features, the output is then fed into an upsampling Conv layer followed by sigmoid activation. Moreover, to compute the statistics, the output of the sigmoid ($r_c$) is adaptively rescaled by the input $f_c$ of the channel features as

$$\hat{f}_c = r_c \times f_c \tag{9}$$

3.3. Implementation

Our proposed model contains four EAM blocks. The kernel size for each convolutional layer is set to 3 × 3, except the last Conv layer in the enhanced residual block and those of the feature attention units, where the kernel size is 1 × 1. Zero padding is used for 3 × 3 convolutions so that the output feature maps keep the same size. The number of channels for each convolutional layer is fixed at 64, except for feature attention downscaling. These Conv layers are reduced by a factor of 16, hence having only four feature maps. The final convolutional layer outputs either three or one feature maps depending on the input. As for running time, our method takes about 0.2 second to process a 512 × 512 image.
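The feature attention of Eqs. (7) to (9) can be traced on a tiny example. The sketch below is illustrative only and assumes several things not given in the text: the 1 × 1 Conv layers $H_D$ and $H_U$ are modeled as plain matrix-vector products on the pooled descriptor, the weights are hand-picked, the soft-shrinkage threshold `lam` is a hypothetical value, and the toy uses 2 channels reduced to 1 rather than the 64 channels reduced to 4 described above.

```python
import math


def global_average_pool(features):
    """Eq. (7): one mean per channel; features[c][i][j] is a c x h x w tensor."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in features]


def linear(weights, bias, v):
    """A 1x1 convolution applied to a 1x1xC descriptor is a matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def soft_shrink(v, lam):
    """Soft-shrinkage: shrink every value towards zero by lam (assumed threshold)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def feature_attention(features, w_down, b_down, w_up, b_up, lam=0.5):
    g = global_average_pool(features)                   # Eq. (7)
    reduced = soft_shrink(linear(w_down, b_down, g), lam)
    r = sigmoid(linear(w_up, b_up, reduced))            # Eq. (8)
    scaled = [[[r_c * value for value in row] for row in fmap]
              for r_c, fmap in zip(r, features)]        # Eq. (9)
    return scaled, r


# Toy input: 2 channels of size 2 x 2, reduced to 1 channel and expanded back.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 0.0], [0.0, 0.0]]]
w_down, b_down = [[1.0, 1.0]], [0.0]           # hypothetical H_D weights
w_up, b_up = [[0.0], [1.0]], [0.0, 0.0]        # hypothetical H_U weights
scaled, r = feature_attention(features, w_down, b_down, w_up, b_up)
```

Here the pooled descriptor is [2.5, 0.0], the reduced value after shrinkage is 2.0, and the gates are sigmoid(0) = 0.5 and sigmoid(2), so the first channel is halved. Every gate lies strictly between 0 and 1, which is what lets the unit down-weight less informative channels without changing the spatial layout.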

Long skip connection (LSC) X X X X
Short skip connection (SSC) X X X X X
Long connection (LC) X X X
Feature attention (FA) X X X X X
PSNR (in dB) 28.45 28.77 28.81 28.86 28.52 28.85 28.86 28.90 28.96
Table 1. Investigation of skip connections and feature attention. The best PSNR (dB) on BSD68 [52] within 2 × 10^5 iterations is presented.
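Since every table in this paper reports PSNR in dB, a minimal sketch of the metric may help when reading the numbers. The code below is not the authors' evaluation script: it assumes 8-bit images with a peak value of 255, treats an image as a flat list of pixel values, and does not clip or quantize the noisy values. The helper `add_awgn` is a hypothetical illustration of corrupting a clean image with additive white Gaussian noise of a chosen standard deviation.

```python
import math
import random


def add_awgn(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma to each pixel."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [p + rng.gauss(0.0, sigma) for p in pixels]


def psnr(clean, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    mse = sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# A flat gray toy "image" corrupted at sigma = 25.
clean = [128.0] * 20000
noisy = add_awgn(clean, sigma=25.0)
psnr_noisy = psnr(clean, noisy)

# A hand-checkable case: every pixel is off by 10, so the MSE is exactly 100.
psnr_known = psnr([0.0, 255.0, 128.0, 64.0], [10.0, 245.0, 138.0, 54.0])
```

Under these assumptions a noisy input at σ = 25 sits near 20·log10(255/25) ≈ 20.2 dB before any denoising, which gives a rough baseline for reading the PSNR values reported in the tables.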
4. Experiments

4.1. Training settings

To generate noisy synthetic images, we employ BSD500 [44], DIV2K [4], and MIT-Adobe FiveK [15], resulting in 4k images, while for real noisy images, we use cropped patches of 512 × 512 from SSID [1], Poly [55], and RENOIR [6]. Data augmentation is performed on training images, which includes random rotations of 90°, 180°, 270° and flipping horizontally. In each training batch, 32 patches are extracted as inputs with a size of 80 × 80. Adam [36] is used as the optimizer with default parameters. The learning rate is initially set to 10^−4 and then halved after 10^5 iterations. The network is implemented in the Pytorch [47] framework and trained with an Nvidia Tesla V100 GPU. Furthermore, we use PSNR as evaluation metric.

4.2. Ablation Studies

4.2.1 Influence of the skip connections

Skip connections play a crucial role in our network. Here, we demonstrate the effectiveness of the skip connections. Our model is composed of three basic types of connections, which include long skip connection (LSC), short skip connections (SSC), and local connections (LC). Table 1 shows the average PSNR for the BSD68 [52] dataset. The highest performance is obtained when all the skip connections are available while the performance is lower when any connection is absent. We also observed that increasing the depth of the network in the absence of skip connections does not benefit performance.

4.2.2 Feature-attention

Another important aspect of our network is feature attention. Table 1 compares the PSNR values of the networks with and without feature attention. The results support our claim about the benefit of using feature attention. Since the inception of DnCNN [63], the CNN models have matured, and further performance improvement requires the careful design of blocks and rescaling of the feature maps. The two mentioned characteristics are present in our model in the form of feature-attention and the skip connections.

4.3. Comparisons

We evaluate our algorithm using the Peak Signal-to-Noise Ratio (PSNR) index as the error metric and compare against many state-of-the-art competitive algorithms which include traditional methods i.e. CBM3D [19], WNNM [30], EPLL [67], CSF [53] and CNN-based denoisers i.e. MLP [14], TNRD [17], DnCNN [63], IrCNN [64], CNLNet [39], FFDNet [65] and CBDNet [31]. To be fair in comparison, we use the default setting of the traditional methods provided by the corresponding authors.

4.3.1 Test Datasets

In the experiments, we test four noisy real-world datasets i.e. RNI15 [38], DND [49], Nam [45] and SSID [1]. Furthermore, we prepare three synthetic noisy datasets from the widely used 12 classical images, BSD68 [52] color and gray 68 images for testing. We corrupt the clean images by additive white Gaussian noise with noise standard deviations of 15, 25 and 50.

• RNI15 [38] provides 15 real-world noisy images. Unfortunately, the clean images are not given for this dataset; therefore, only the qualitative comparison is presented for this dataset.

• Nam [45] comprises 11 static scenes and the corresponding noise-free images obtained by the mean of 500 noisy images of the same scene. The size of the images is enormous; hence, we cropped the images in 512 × 512 patches and randomly selected 110 from those for testing.

• DnD was recently proposed by Plotz et al. [49] and originally contains 50 pairs of real-world noisy and noise-free scenes. The scenes are further cropped into patches of size 512 × 512 by the providers of the dataset, which resulted in 1000 smaller images. The near noise-free images are not publicly available, and the results (PSNR/SSIM) can only be obtained through the online system introduced by [49].

• SSID [1] (Smartphone Image Denoising Dataset) is recently introduced. The authors have collected 30k real noisy images and their corresponding clean images; however, only 320 images are released for training and 1280 image pairs for validation, as testing images are
not released yet. We will use the validation images for testing our algorithm and the competitive methods.

Noise level  BM3D   WNNM   EPLL   TNRD   DenoiseNet  DnCNN  IrCNN  NLNet  FFDNet  Ours
15           31.08  31.32  31.19  31.42  31.44       31.73  31.63  31.52  31.63   31.81
25           28.57  28.83  28.68  28.92  29.04       29.23  29.15  29.03  29.23   29.34
50           25.62  25.83  25.67  26.01  26.06       26.23  26.19  26.07  26.29   26.40
Table 2. The similarity between the denoised and the clean images of BSD68 dataset [52] for our method and competing methods, measured in terms of average PSNR for σ = 15, 25, and 50 on grayscale images.

Methods       σ = 15  σ = 25  σ = 50
BM3D [18]     32.37   29.97   26.72
WNNM [30]     32.70   30.26   27.05
EPLL [67]     32.14   29.69   26.47
MLP [14]      -       30.03   26.78
CSF [53]      32.32   29.84   -
TNRD [17]     32.50   30.06   26.81
DnCNN [63]    32.86   30.44   27.18
IrCNN [64]    32.77   30.38   27.14
FFDNet [65]   32.75   30.43   27.32
Ours          32.91   30.60   27.43
Table 3. The quantitative comparison between denoising algorithms on 12 classical images (in terms of PSNR). The best results are highlighted as bold.

[Figure 4 panels: Noisy; BM3D [19]; IRCNN [64]; DnCNN [63]; Ours; GT. PSNR labels shown in the figure: 31.68dB, 32.21dB, 32.33dB, 32.84dB]
Figure 4. Denoising performance of our RIDNet versus state-of-the-art methods on a color image from [52] for σn = 50.

4.3.2 Grayscale noisy images

In this subsection, we evaluate our model on the noisy grayscale images corrupted by spatially invariant additive white Gaussian noise. We compare against nonlocal self-similarity representative models i.e. BM3D [18] and WNNM [30], learning based methods i.e. EPLL, TNRD [17], MLP [14], DnCNN [63], IrCNN [64], and CSF [53]. In Tables 3 and 2, we present the PSNR values on Set12 and BSD68. It is to be remembered here that BSD500 [44] and BSD68 [52] are two disjoint sets. Our method outperforms all the competitive algorithms on both datasets for all noise levels; this may be due to the larger receptive field as well as better modeling capacity.

4.3.3 Color noisy images

Next, for noisy color image denoising, we keep all the parameters of the network similar to the grayscale model, except the first and last layer are changed to input and output three channels rather than one. Figure 4 presents the visual comparison and Table 4 reports the PSNR numbers between our methods and the alternative algorithms. Our algorithm consistently outperforms all the other techniques published in Table 4 for CBSD68 dataset [52]. Similarly, our network produces the best perceptual quality images as shown in Figure 4. A closer inspection on the vase reveals that our network generates textures closest to the ground-truth with fewer artifacts and more details.

Noise level  CBM3D [19]  MLP [14]  TNRD [17]  DnCNN [63]  IrCNN [64]  CNLNet [39]  FFDNet [65]  Ours
15           33.50       -         31.37      33.89       33.86       33.69        33.87        34.01
25           30.69       28.92     28.88      31.33       31.16       30.96        31.21        31.37
50           27.37       26.00     25.94      27.97       27.86       27.64        27.96        28.14
Table 4. Performance comparison between our network and existing state-of-the-art algorithms on the color version of the BSD68 dataset [52].

4.3.4 Real-World noisy images

To further assess the practicality of our model, we employ a real noise dataset. The evaluation is difficult because of the unknown level of noise, the various noise sources such as shot noise, quantization noise etc., imaging pipeline i.e. image resizing, lossy compression etc. Furthermore, the noise is spatially variant (non-Gaussian) and also signal dependent; hence, the assumption that noise is spatially invariant, employed by many algorithms, does not hold for real image noise. Therefore, real-noisy images evaluation determines the success of the algorithms in real-world applications.

DnD: Table 5 presents the quantitative results (PSNR/SSIM) on the sRGB data for competitive algorithms and our method obtained from the online DnD benchmark website available publicly. The blind Gaussian denoiser DnCNN [63] performs inefficiently and is unable to achieve better results than BM3D [18] and WNNM [30] due to the poor generalization of the noise during training. Similarly, the non-blind Gaussian traditional denoisers are able to report limited performance, although the noise standard-deviation is provided. This may be due to the fact that these denoisers [18, 30, 67] are tailored for AWGN only and real-noise is different in characteristics to synthetic noise. Incorporating feature attention and capturing the appropriate characteristics of the noise through a novel module means our algorithm leads by a large margin i.e. 1.17dB PSNR compared to the second-best performing method, CBDNet [31]. Furthermore, our algorithm only employs real-noisy images for training using only ℓ1 loss while CBDNet [31] uses many techniques such as multiple losses (i.e. total variation, ℓ2 and asymmetric learning) and both real-noise as well as synthetically generated real-noise. As reported by the author of CBDNet [31], it is able to achieve 37.72 dB with real-noise images only. Noise Clinic (NC) [38] and Neat Image (NI) [2] are the other two state-of-the-art blind denoisers other than [31]. NI [2] is commercially available as a part of Photoshop and Corel PaintShop. Our network is able to achieve 3.82dB and 4.14dB more PSNR from NC [38] and NI [2], respectively.

Method          Blind/Non-blind  PSNR   SSIM
CDnCNNB [63]    Blind            32.43  0.7900
EPLL [67]       Non-blind        33.51  0.8244
TNRD [17]       Non-blind        33.65  0.8306
NCSR [23]       Non-blind        34.05  0.8351
MLP [14]        Non-blind        34.23  0.8331
FFDNet [65]     Non-blind        34.40  0.8474
BM3D [18]       Non-blind        34.51  0.8507
FoE [52]        Non-blind        34.62  0.8845
WNNM [30]       Non-blind        34.67  0.8646
NC [38]         Blind            35.43  0.8841
NI [2]          Blind            35.11  0.8778
CIMM [8]        Non-blind        36.04  0.9136
KSVD [5]        Non-blind        36.49  0.8978
MCWNNM [58]     Non-blind        37.38  0.9294
TWSC [57]       Non-blind        37.96  0.9416
FFDNet+ [65]    Non-blind        37.61  0.9415
CBDNet [31]     Blind            38.06  0.9421
RIDNet (Ours)   Blind            39.23  0.9526
Table 5. The Mean PSNR and SSIM denoising results of state-of-the-art algorithms evaluated on the DnD sRGB images [49].

Next, we visually compare the result of our method with the competing methods on the denoised images provided by the online system of Plotz et al. [49] in Figure 5. The PSNR and SSIM values are also taken from the website. From Figure 5, it is clear that the methods of [31, 65, 63] perform poorly in removing the noise from the star and in some cases the image is over-smoothed; on the other hand, our algorithm can eliminate the noise while preserving the finer details and structures in the star image.

[Figure 5 panels, first row: Noisy; CBM3D [18]; WNNM [30]; NC [38]; TWSC [57]. PSNR labels shown: 30.896dB, 29.98dB, 30.73dB, 29.42dB. Second row: Noisy Image; MCWNNM [58]; NI [2]; FFDNet [65]; CBDNet [31]; RIDNet (Ours). PSNR labels shown: 30.88dB, 28.43dB, 31.37dB, 31.06dB, 32.31dB]
Figure 5. A real noisy example from DND dataset [49] for comparison of our method against the state-of-the-art algorithms.

[Figure 6 panels: Noisy; FFDNet; CBDNet; RIDNet]
Figure 6. Comparison of our method against the other methods on a real image from RNI15 [38] benchmark containing spatially variant noise.

Noisy DnCNN FFDNet Ours Noisy CBM3D (39.13) IRCNN (33.73)
Figure 7. A real high noise example from RNI15 dataset [38]. Our
method is able to remove the noise in textured and smooth areas
without introducing artifacts.
Methods
Datasets BM3D DnCNN FFDNet CBDNet Ours
Nam [45] 37.30 35.55 38.7 39.01 39.09
SSID [1] 30.88 26.21 29.20 30.78 38.71
Table 6. The quantitative results (in PSNR (dB)) for the SSID [1] and Nam [45] datasets.

RNI15: On RNI15 [38], we provide qualitative images only, as the ground-truth images are not available. Figure 6 presents the denoising results on a low noise intensity image. FFDNet [65] and CBDNet [31] are unable to remove the noise in its totality, as can be seen near the bottom left of the handle and on the body of the cup image. On the contrary, our method is able to remove the noise without the introduction of any artifacts. We present another example from the RNI15 dataset [38] with high noise in Figure 7. CDnCNN [63] and FFDNet [65] produce limited results, as some noise remains visible near the eye and gloves of the Dog image. In comparison, our algorithm recovers the actual texture and structures without compromising on the removal of noise from the images.

Nam: We present the average PSNR scores of the resultant denoised images in Table 6. Unlike CBDNet [31], which is trained on Nam [45] to specifically deal with the JPEG compression, we use the same network to denoise the Nam images [45] and achieve favorable PSNR numbers. Our performance in terms of PSNR is higher than any of the current state-of-the-art algorithms. Furthermore, our claim is supported by the visual quality of the images produced by our model, as shown in Figure 8. The amount of noise present after denoising by our method is negligible compared to CDnCNN and other counterparts.

Figure 8. An image from Nam dataset [45] with JPEG compression. CBDNet is trained explicitly on JPEG compressed images; still, our method performed better. Panel labels: DnCNN (37.56), CBDNet (40.40), Ours (40.50).

SSID: As the last dataset, we employ the SSID real noise dataset, which has the highest number of test (validation) images available. The results in terms of PSNR are shown in the second row of Table 6. Again, it is clear that our method outperforms FFDNet [65] and CBDNet [31] by a margin of 9.5 dB and 7.93 dB, respectively. In Figure 9, we show the denoised results of a challenging image by different algorithms. Our technique recovers the true colors, which are closer to the original pixel values, while competing methods are unable to restore original colors and in specific regions induce false colors.

Figure 9. A challenging example from SSID dataset [1]. Our method can remove noise and restore true colors. Panels: Noisy, CBM3D, IRCNN, DnCNN, FFDNet, CBDNet, Ours, GT. PSNR values as printed: 25.75 dB, 21.97 dB, 20.76 dB (top row); 19.70 dB, 28.84 dB, 35.57 dB (bottom row).

5. Conclusion

In this paper, we present a new CNN denoising model for synthetic noise and real noisy photographs. Unlike previous algorithms, our model is a single-stage blind denoising network for real noisy images. We propose a novel restoration module to learn the features and, to enhance the capability of the network further, we adopt feature attention to rescale the channel-wise features by taking into account the dependencies between the channels. We also use LSC, SSC, and SC to allow low-frequency information to bypass so the network can focus on residual learning. Extensive experiments on three synthetic and four real-noise datasets demonstrate the effectiveness of our proposed model.

This work was supported in part by NH&MRC Project grant # 1082358.
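
The channel-wise rescaling summarized in the conclusion can be illustrated with a short sketch. This is not the authors' implementation: it assumes a squeeze-and-excitation style gate [34] (global average pooling, a bottleneck with ReLU, then a sigmoid), and the names `feature_attention`, `w_down`, `w_up`, and `attended_residual`, along with the weight shapes and the absence of biases, are hypothetical choices made for illustration only.

```python
import numpy as np


def feature_attention(features, w_down, w_up):
    """Rescale each channel of a (C, H, W) feature map by a learned gate.

    w_down has shape (C // r, C) and w_up has shape (C, C // r); both stand in
    for 1x1 convolutions without bias (an assumption of this sketch).
    """
    # Squeeze: one descriptor per channel via global average pooling -> (C,)
    squeezed = features.mean(axis=(1, 2))
    # Bottleneck with ReLU models dependencies between channels
    hidden = np.maximum(w_down @ squeezed, 0.0)
    # Sigmoid gate in (0, 1), one value per channel
    gates = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))
    # Excite: broadcast the gate over the spatial dimensions
    return features * gates[:, None, None]


def attended_residual(features, w_down, w_up):
    """Add a skip connection so the input can bypass the attended branch."""
    return features + feature_attention(features, w_down, w_up)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))        # 8 channels, 16x16 map
    w_down = 0.1 * rng.standard_normal((2, 8))  # hypothetical reduction ratio r = 4
    w_up = 0.1 * rng.standard_normal((8, 2))
    y = attended_residual(x, w_down, w_up)
    print(y.shape)
```

Because the gate lies strictly between 0 and 1, the attended branch can only attenuate channels; the skip connection is what lets the unmodified input pass through, which is the role the text assigns to the skip connections.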

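The PSNR figures quoted for the Nam and SSID datasets follow the standard peak signal-to-noise ratio, PSNR = 10 log10(peak^2 / MSE). The sketch below shows that definition and a per-dataset average; it assumes 8-bit images with a peak value of 255, and the function names `psnr` and `average_psnr` are hypothetical, not taken from the authors' evaluation code.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


def average_psnr(pairs, peak=255.0):
    """Mean PSNR over an iterable of (reference, estimate) image pairs."""
    return float(np.mean([psnr(ref, est, peak) for ref, est in pairs]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)
    # Synthetic stand-in for a denoised output: clean image plus small error
    denoised = np.clip(clean + rng.normal(0.0, 2.0, clean.shape), 0, 255)
    print(f"{psnr(clean, denoised):.2f} dB")
```

Since PSNR is logarithmic in the mean squared error, a gain of roughly 10 dB corresponds to a tenfold reduction in MSE, which gives a sense of scale for the margins reported on SSID.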
References

[1] Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. In CVPR, 2018. 5, 8
[2] ABSoft. Neat image. 7
[3] Manya V Afonso, José M Bioucas-Dias, and Mário AT Figueiredo. Fast image recovery using variable splitting and constrained optimization. TIP, 2010. 1
[4] Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPR Workshops, 2017. 5
[5] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. TIP, 2006. 7
[6] Josue Anaya and Adrian Barbu. RENOIR–a dataset for real low-light image noise reduction. Journal of Visual Communication and Image Representation, 2018. 5
[7] Saeed Anwar, C Huynh, and Fatih Porikli. Combined internal and external category-specific image denoising. In BMVC, 2017. 2
[8] Saeed Anwar, Cong Phuoc Huynh, and Fatih Porikli. Chaining identity mapping modules for image denoising. arXiv preprint arXiv:1712.02933, 2017. 2, 3, 7
[9] Saeed Anwar, Fatih Porikli, and Cong Phuoc Huynh. Category-specific object image denoising. TIP, 2017. 1, 2
[10] Woong Bae, Jaejun Yoo, and Jong Chul Ye. Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification. In CVPR Workshops, 2017. 2
[11] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. TNN, 1994. 2
[12] Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, and Jonathan T Barron. Unprocessing images for learned raw denoising. In CVPR, 2019. 2
[13] Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. A non-local algorithm for image denoising. In CVPR, 2005. 1, 2
[14] Harold Christopher Burger, Christian J Schuler, and Stefan Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR, 2012. 1, 2, 5, 6, 7
[15] Vladimir Bychkovsky, Sylvain Paris, Eric Chan, and Frédo Durand. Learning photographic global tonal adjustment with a database of input/output image pairs. In CVPR, 2011. 5
[16] P. Chatterjee and P. Milanfar. Is denoising dead? TIP, 2010. 2
[17] Yunjin Chen and Thomas Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. TPAMI, 2017. 1, 2, 5, 6, 7
[18] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. 2007. 1, 2, 6, 7
[19] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Color image denoising via sparse 3-D collaborative filtering with grouping constraint in luminance-chrominance space. In ICIP, 2007. 5, 6, 7
[20] K. Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. BM3D image denoising with shape-adaptive principal component analysis. In Signal Processing with Adaptive Sparse Structured Representations, 2009. 1, 2
[21] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. TPAMI, 2016. 1, 3
[22] Weisheng Dong, Xin Li, D. Zhang, and Guangming Shi. Sparsity-based image denoising via dictionary learning and structural clustering. In CVPR, 2011. 2
[23] Weisheng Dong, Lei Zhang, Guangming Shi, and Xin Li. Nonlocally centralized sparse representation for image restoration. TIP, 2012. 7
[24] Herbert Edelsbrunner and John Harer. Persistent homology-a survey. Contemporary mathematics, 2008. 2
[25] Michael Elad and Dmitry Datsenko. Example-based regularization deployed to super-resolution reconstruction of a single image. Comput. J., 2009. 2
[26] F. Chen, L. Zhang, and H. Yu. External patch prior guided internal clustering for image denoising. In ICCV, 2015. 2
[27] Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. TIP, 2007. 2
[28] Rafael C Gonzalez and Paul Wintz. Digital image processing (book). Reading, Mass., Addison-Wesley Publishing Co., Inc. (Applied Mathematics and Computation), 1977. 1
[29] Bart Goossens, Hiêp Luong, Aleksandra Pizurica, and Wilfried Philips. An improved non-local denoising algorithm. In IP, 2008. 2
[30] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR, 2014. 1, 5, 6, 7
[31] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and Lei Zhang. Toward convolutional blind denoising of real photographs. arXiv preprint arXiv:1807.04686, 2018. 1, 2, 3, 5, 7, 8
[32] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016. 4
[33] Felix Heide, Markus Steinberger, Yun-Ta Tsai, Mushfiqur Rouf, Dawid Pajak, Dikpal Reddy, Orazio Gallo, Jing Liu, Wolfgang Heidrich, Karen Egiazarian, et al. FlexISP: A flexible camera image processing framework. TOG, 2014. 1
[34] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, 2018. 4
[35] Jianbo Jiao, Wei-Chih Tu, Shengfeng He, and Rynson WH Lau. FormResNet: Formatted residual learning for image restoration. In CVPR Workshops, 2017. 2, 3
[36] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. 5
[37] M Lebrun, Antoni Buades, and Jean-Michel Morel. A non-local Bayesian image denoising algorithm. SIAM Journal on Imaging Sciences, 2013. 2
[38] Marc Lebrun, Miguel Colom, and Jean-Michel Morel. The noise clinic: a blind image denoising algorithm. IPOL, 2015. 1, 2, 5, 7, 8

[39] Stamatios Lefkimmiatis. Non-local color image denoising with convolutional neural networks. CVPR, 2016. 2, 5, 7
[40] A. Levin and B. Nadler. Natural image denoising: Optimality and inherent bounds. In CVPR, 2011. 2
[41] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In CVPR Workshops, 2017. 1, 4
[42] Enming Luo, Stanley H Chan, and Truong Q Nguyen. Adaptive image denoising by targeted databases. TIP, 2015. 1, 2
[43] Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Non-local sparse models for image restoration. In ICCV, 2009. 2
[44] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001. 5, 6
[45] Seonghyeon Nam, Youngbae Hwang, Yasuyuki Matsushita, and Seon Joo Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In CVPR, 2016. 5, 8
[46] Stanley Osher, Martin Burger, Donald Goldfarb, Jinjun Xu, and Wotao Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation, 2005. 1
[47] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017. 5
[48] Yigang Peng, Arvind Ganesh, John Wright, Wenli Xu, and Yi Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. TPAMI, 2012. 1
[49] Tobias Plötz and Stefan Roth. Benchmarking denoising algorithms with real photographs. arXiv preprint arXiv:1707.01313, 2017. 5, 7
[50] Tobias Plötz and Stefan Roth. Neural nearest neighbors networks. In NIPS, 2018. 2
[51] Yaniv Romano, Michael Elad, and Peyman Milanfar. The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences, 2017. 1
[52] Stefan Roth and Michael J Black. Fields of experts. IJCV, 2009. 1, 2, 5, 6, 7
[53] Uwe Schmidt and Stefan Roth. Shrinkage fields for effective image restoration. In CVPR, 2014. 2, 5, 6
[54] Yair Weiss and William T Freeman. What makes a good model of natural images? In CVPR, 2007. 1
[55] Jun Xu, Hui Li, Zhetong Liang, David Zhang, and Lei Zhang. Real-world noisy image denoising: A new benchmark. arXiv preprint arXiv:1804.02603, 2018. 5
[56] Jinjun Xu and Stanley Osher. Iterative regularization and nonlinear inverse scale space applied to wavelet-based denoising. TIP, 2007. 1
[57] Jun Xu, Lei Zhang, and David Zhang. A trilateral weighted sparse coding scheme for real-world image denoising. In ECCV, 2018. 7
[58] Jun Xu, Lei Zhang, David Zhang, and Xiangchu Feng. Multi-channel weighted nuclear norm minimization for real color image denoising. In ICCV, 2017. 7
[59] Jun Xu, Lei Zhang, Wangmeng Zuo, David Zhang, and Xiangchu Feng. Patch group based nonlocal self-similarity prior learning for image denoising. In ICCV, 2015. 2
[60] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015. 4
[61] H. Yue, X. Sun, J. Yang, and F. Wu. CID: Combined image denoising in spatial and frequency domains using web images. In CVPR, June 2014. 1
[62] H. Yue, X. Sun, J. Yang, and F. Wu. Image denoising by exploring external and internal correlations. TIP, 2015. 2
[63] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. TIP, 2017. 1, 2, 3, 5, 6, 7, 8
[64] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior for image restoration. CVPR, 2017. 1, 2, 3, 5, 6, 7
[65] Kai Zhang, Wangmeng Zuo, and Lei Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. TIP, 2018. 1, 2, 5, 6, 7, 8
[66] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. arXiv preprint arXiv:1807.02758, 2018. 4
[67] Daniel Zoran and Yair Weiss. From learning models of natural image patches to whole image restoration. In ICCV, 2011. 1, 2, 5, 6, 7

