can treat the plain discriminative learning methods as a general case of Eqn. (3). It can be seen that one obvious difference between the model-based optimization method and the discriminative learning method is that the former is flexible to handle various IR tasks by specifying the degradation matrix H, whereas the latter needs to use training data with certain degradation matrices to learn the model. As a consequence, different from model-based optimization methods, which have the flexibility to handle different IR tasks, discriminative learning methods are usually restricted to specialized tasks. For example, model-based optimization methods such as NCSR [22] are flexible to handle denoising, super-resolution and deblurring, whereas the discriminative learning methods MLP [8], SRCNN [21] and DCNN [62] are designed for those three tasks, respectively. Even for a specific task such as denoising, model-based optimization methods (e.g., BM3D [17] and WNNM [29]) can handle different noise levels, whereas the discriminative learning method of [34] separately trains a different model for each level.

With the sacrifice of flexibility, however, discriminative learning methods not only enjoy a fast testing speed but also tend to deliver promising performance due to joint optimization and end-to-end training. On the contrary, model-based optimization methods are usually time-consuming with sophisticated priors for the purpose of good performance [27]. As a result, the two kinds of methods have their respective merits and drawbacks, and it would thus be attractive to investigate an integration which leverages their respective merits. Fortunately, with the aid of variable splitting techniques, such as the alternating direction method of multipliers (ADMM) [5] and the half-quadratic splitting (HQS) method [28], it is possible to deal with the fidelity term and the regularization term separately [44]; in particular, the regularization term only corresponds to a denoising subproblem [18, 31, 61]. Consequently, this enables an integration of any discriminative denoiser into model-based optimization methods. However, to the best of our knowledge, the study of such an integration with discriminative denoisers is still lacking.

This paper aims to train a set of fast and effective discriminative denoisers and integrate them into model-based optimization methods to solve other inverse problems. Rather than learning MAP inference guided discriminative models, we instead adopt plain convolutional neural networks (CNN) to learn the denoisers, so as to take advantage of recent progress in CNN as well as the merit of GPU computation. In particular, several CNN techniques, including rectified linear units (ReLU) [37], batch normalization [32], Adam [36] and dilated convolution [63], are adopted into the network design or training. As well as providing good performance for image denoising, the learned set of denoisers is plugged into a model-based optimization method to tackle various inverse problems.

The contribution of this work is summarized as follows:

• We trained a set of fast and effective CNN denoisers. With the variable splitting technique, the powerful denoisers can bring a strong image prior into model-based optimization methods.

• The learned set of CNN denoisers is plugged in as a modular part of model-based optimization methods to tackle other inverse problems. Extensive experiments on classical IR problems, including deblurring and super-resolution, have demonstrated the merits of integrating flexible model-based optimization methods and fast CNN-based discriminative learning methods.

2. Background

2.1. Image Restoration with Denoiser Prior

There have been several attempts to incorporate a denoiser prior into model-based optimization methods to tackle other inverse problems. In [19], the authors used a Nash equilibrium to derive an iterative decoupled deblurring BM3D (IDDBM3D) method for image deblurring. In [24], a similar method equipped with a CBM3D denoiser prior was proposed for single image super-resolution (SISR). By iteratively updating a back-projection step and a CBM3D denoising step, the method delivers encouraging performance in terms of its PSNR improvement over SRCNN [21]. In [18], the augmented Lagrangian method was adopted to fuse the BM3D denoiser into an image deblurring scheme. With an iterative scheme similar to [19], a plug-and-play priors framework based on the ADMM method was proposed in [61]. Here we note that, prior to [61], a similar plug-and-play idea was also mentioned in [66], where a half quadratic splitting (HQS) method was proposed for image denoising, deblurring and inpainting. In [31], the authors used an alternative to ADMM and HQS, i.e., the primal-dual algorithm [11], to decouple the fidelity term and the regularization term. Some of the other related work can be found in [6, 12, 48, 49, 54, 58]. All the above methods have shown that decoupling the fidelity term and the regularization term enables a wide variety of existing denoising models to solve different image restoration tasks.

We can see that the denoiser prior can be plugged into an iterative scheme in various ways. The common idea behind those ways is to decouple the fidelity term and the regularization term. For this reason, their iterative schemes generally involve a fidelity term related subproblem and a denoising subproblem. In the next subsection, we will use the HQS method as an example due to its simplicity. It should be noted that although HQS can be viewed as a general way to handle different image restoration tasks, one can also incorporate the denoiser prior into other convenient and proper optimization methods for a specific application.
2.2. Half Quadratic Splitting (HQS) Method

Basically, to plug the denoiser prior into the optimization procedure of Eqn. (2), the variable splitting technique is usually adopted to decouple the fidelity term and the regularization term. In the half quadratic splitting method, by introducing an auxiliary variable z, Eqn. (2) can be reformulated as a constrained optimization problem, which is given by

$$\hat{x} = \arg\min_x \frac{1}{2}\|y - Hx\|^2 + \lambda\Phi(z) \quad \text{s.t.} \quad z = x \qquad (4)$$

Then, the HQS method tries to solve the following problem,

$$\mathcal{L}_\mu(x, z) = \frac{1}{2}\|y - Hx\|^2 + \lambda\Phi(z) + \frac{\mu}{2}\|z - x\|^2 \qquad (5)$$

where µ is a penalty parameter which varies iteratively in a non-descending order. Eqn. (5) can be solved via the following iterative scheme,

$$x_{k+1} = \arg\min_x \|y - Hx\|^2 + \mu\|x - z_k\|^2 \qquad (6a)$$

$$z_{k+1} = \arg\min_z \frac{\mu}{2}\|z - x_{k+1}\|^2 + \lambda\Phi(z) \qquad (6b)$$

As one can see, the fidelity term and the regularization term are decoupled into two individual subproblems. Specifically, the fidelity term is associated with a quadratic regularized least-squares problem (i.e., Eqn. (6a)), which has various fast solutions for different degradation matrices. A direct solution is given by

$$x_{k+1} = (H^T H + \mu I)^{-1}(H^T y + \mu z_k) \qquad (7)$$

The regularization term is involved in Eqn. (6b), which can be rewritten as

$$z_{k+1} = \arg\min_z \frac{1}{2(\sqrt{\lambda/\mu})^2}\|x_{k+1} - z\|^2 + \Phi(z) \qquad (8)$$

According to Bayesian probability, Eqn. (8) corresponds to denoising the image x_{k+1} by a Gaussian denoiser with noise level √(λ/µ). As a consequence, any Gaussian denoiser can act as a modular part to solve Eqn. (2). To make this explicit, we rewrite Eqn. (8) as

$$z_{k+1} = \text{Denoiser}(x_{k+1}, \sqrt{\lambda/\mu}) \qquad (9)$$

It is worth noting that, according to Eqns. (8) and (9), the image prior Φ(·) can be implicitly replaced by a denoiser prior. Such a promising property actually offers several advantages. First, it enables the use of any gray or color denoiser to solve a variety of inverse problems. Second, the explicit image prior Φ(·) can be unknown in solving Eqn. (2). Third, several complementary denoisers which exploit different image priors can be jointly utilized to solve one specific problem. Note that this property can also be employed in other optimization methods (e.g., the iterative shrinkage/thresholding algorithms ISTA [4, 14] and FISTA [3]) as long as a denoising subproblem is involved.
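To make the scheme concrete, the following is a minimal sketch of the HQS loop of Eqns. (6a)–(9) in Python/NumPy (the paper's own implementation is in Matlab). The dense degradation matrix H, the explicit linear solve, and the `denoiser(image, noise_level)` callable are illustrative assumptions; in practice the fidelity step uses a fast solver specialized to H, as in Sections 4.2 and 4.3.

```python
import numpy as np

def hqs_plug_and_play(y, H, denoiser, lam, sigmas):
    """Sketch of the HQS scheme: alternate the fidelity step (7)
    and the denoising step (9). `sigmas` is the per-iteration noise
    level sqrt(lambda/mu_k), set from large to small."""
    x = H.T @ y                   # crude back-projected initialization
    z = x.copy()
    n = H.shape[1]
    for sigma in sigmas:
        mu = lam / sigma ** 2     # Eqn. (8): sigma = sqrt(lambda/mu)
        # Eqn. (7): closed-form solution of the fidelity subproblem (6a)
        x = np.linalg.solve(H.T @ H + mu * np.eye(n), H.T @ y + mu * z)
        # Eqn. (9): the prior subproblem (6b) is plain Gaussian denoising;
        # in the paper, the CNN model trained nearest to `sigma` is used
        z = denoiser(x, sigma)
    return z
```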
3. Learning Deep CNN Denoiser Prior

3.1. Why Choose CNN Denoiser?

As the regularization term of Eqn. (2) plays a vital role in restoration performance, the choice of denoiser prior in Eqn. (9) is thus very important. Existing denoiser priors that have been adopted in model-based optimization methods to solve other inverse problems include total variation (TV) [10, 43], Gaussian mixture models (GMM) [66], K-SVD [25], non-local means [7] and BM3D [17]. Such denoiser priors have their respective drawbacks. For example, TV can create watercolor-like artifacts; the K-SVD denoiser prior suffers from a high computational burden; the non-local means and BM3D denoiser priors may over-smooth irregular structures if the image does not exhibit the self-similarity property. Thus, a strong denoiser prior which can be implemented efficiently is highly desirable.

Speed and performance aside, a color image prior, or denoiser, is also a key factor that needs to be taken into account. This is because most of the images acquired by modern cameras or transmitted on the internet are in RGB format. Due to the correlation between the color channels, it has been acknowledged that jointly handling the color channels tends to produce better performance than dealing with each color channel independently [26]. However, existing methods mainly focus on modeling the gray image prior, and there are only a few works concentrating on modeling the color image prior (see, e.g., [16, 41, 46]). Perhaps the most successful color image prior modeling method is CBM3D [16]. It first decorrelates the image into a luminance-chrominance color space by a hand-designed linear transform and then applies the gray BM3D method in each transformed color channel. While CBM3D is promising for color image denoising, it has been pointed out that the resulting transformed luminance-chrominance color channels still retain some correlation [42], and it is preferable to handle the RGB channels jointly. Consequently, instead of utilizing the hand-designed pipeline, using discriminative learning methods to automatically reveal the underlying color image prior would be a good alternative.

By considering speed, performance and discriminative color image prior modeling, we choose a deep CNN to learn the discriminative denoisers. The reasons for using a CNN are four-fold. First, the inference of a CNN is very efficient due to the parallel computation ability of GPUs. Second, a CNN exhibits powerful prior modeling capacity with a deep architecture. Third, a CNN exploits the external prior, which is complementary to the internal prior of many existing denoisers such as BM3D; in other words, a combination with BM3D is expected to improve the performance. Fourth, great progress in training and designing CNNs has been made during the past few years, and we can take advantage of that progress to facilitate discriminative learning.
[Figure 1: 1-DConv+ReLU → 2-DConv+BNorm+ReLU → 3-DConv+BNorm+ReLU → 4-DConv+BNorm+ReLU → 3-DConv+BNorm+ReLU → 2-DConv+BNorm+ReLU → 1-DConv]

Figure 1. The architecture of the proposed denoiser network. Note that "s-DConv" denotes s-dilated convolution [63], here s = 1, 2, 3 and 4; "BNorm" represents batch normalization [32]; "ReLU" is the rectified linear unit (max(·, 0)).
3.2. The Proposed CNN Denoiser

The architecture of the proposed CNN denoiser is illustrated in Figure 1. It consists of seven layers with three different blocks, i.e., a "Dilated Convolution+ReLU" block in the first layer, five "Dilated Convolution+Batch Normalization+ReLU" blocks in the middle layers, and a "Dilated Convolution" block in the last layer. The dilation factors of the (3×3) dilated convolutions from the first layer to the last layer are set to 1, 2, 3, 4, 3, 2 and 1, respectively. The number of feature maps in each middle layer is set to 64. In the following, we give some important details of our network design and training.
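For concreteness, this seven-layer architecture can be written down in a few lines; the following PyTorch sketch is ours (the paper trains its models with MatConvNet), with the class name and the "padding equals dilation" convention as assumptions. The forward pass already reflects the residual learning strategy discussed below, i.e., the network predicts the noise and subtracts it from the input.

```python
import torch.nn as nn

class DilatedCNNDenoiser(nn.Module):
    """Sketch of the denoiser of Figure 1: 3x3 dilated convolutions with
    dilation factors 1, 2, 3, 4, 3, 2, 1 and 64 middle feature maps."""
    def __init__(self, channels=1, features=64):
        super().__init__()
        dilations = [1, 2, 3, 4, 3, 2, 1]
        layers = [nn.Conv2d(channels, features, 3, padding=1, dilation=1),
                  nn.ReLU(inplace=True)]               # first block
        for d in dilations[1:-1]:                      # five middle blocks
            layers += [nn.Conv2d(features, features, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1, dilation=1)]
        self.body = nn.Sequential(*layers)             # last block

    def forward(self, y):
        return y - self.body(y)   # residual learning: subtract predicted noise
```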
Using Dilated Filters to Enlarge the Receptive Field. It has been widely acknowledged that context information facilitates the reconstruction of a corrupted pixel in image denoising. A CNN captures context information by successively enlarging the receptive field through its forward convolution operations. Generally, there are two basic ways to enlarge the receptive field of a CNN, i.e., increasing the filter size and increasing the depth. However, increasing the filter size would not only introduce more parameters but also increase the computational burden [53]. Thus, using 3×3 filters with a large depth is popular in existing CNN designs [30, 35, 56]. In this paper, we instead use the recently proposed dilated convolution to make a tradeoff between the size of the receptive field and the network depth. Dilated convolution is known for its capacity to expand the receptive field while keeping the merits of traditional 3×3 convolution. A dilated filter with dilation factor s can simply be interpreted as a sparse filter of size (2s+1)×(2s+1) where only 9 entries at fixed positions can be non-zero. Hence, the equivalent receptive field of each layer is 3, 5, 7, 9, 7, 5 and 3, and the receptive field of the proposed network is 33×33 (each layer with dilation factor s adds 2s to the receptive field, so 1 + 2×(1+2+3+4+3+2+1) = 33). If the traditional 3×3 convolution filter were used instead, the network would either have a receptive field of size 15×15 with the same depth (i.e., 7) or a depth of 16 with the same receptive field (i.e., 33×33). To show the advantage of our design over the above two cases, we trained three different models on noise level 25 with the same training settings. It turns out that our designed model achieves an average PSNR of 29.15dB on the BSD68 dataset [50], which is much better than the 28.94dB of the 7-layer network with traditional 3×3 convolution filters and very close to the 29.20dB of the 16-layer network.

Using Batch Normalization and Residual Learning to Accelerate Training. While advanced gradient optimization algorithms can accelerate training and improve performance, architecture design is also an important factor. Batch normalization and residual learning, two of the most influential architecture design techniques, have been widely adopted in recent CNN designs. In particular, it has been pointed out that the combination of batch normalization and residual learning is particularly helpful for Gaussian denoising since they are beneficial to each other. To be specific, it not only enables fast and stable training but also tends to result in better denoising performance [65]. In this paper, such a strategy is adopted, and we empirically find that it also enables fast transfer from one model to another with a different noise level.

Using Training Samples of Small Size to Help Avoid Boundary Artifacts. Due to the characteristics of convolution, the denoised image of a CNN may exhibit annoying boundary artifacts without proper handling. There are two common ways to tackle this, i.e., symmetrical padding and zero padding. We adopt the zero padding strategy and wish the designed CNN to have the capacity to model the image boundary. Note that the dilated convolution with dilation factor 4 in the fourth layer pads 4 zeros at the boundaries of each feature map. We empirically find that using training samples of small size can help avoid boundary artifacts. The main reason lies in the fact that, rather than using training patches of large size, cropping them into small patches enables the CNN to see more boundary information. For example, by cropping an image patch of size 70×70 into four small non-overlapping patches of size 35×35,
the boundary information would be largely augmented. We also tested the performance of using patches of large size; we empirically find this does not improve the performance. However, if the size of the training patch is smaller than the receptive field, the performance would decrease.

Learning Specific Denoiser Models with Small-Interval Noise Levels. Since the iterative optimization framework requires various denoiser models with different noise levels, a practical issue is how the discriminative models should be trained. Various studies have shown that if the exact solutions of the subproblems (i.e., Eqn. (6a) and Eqn. (6b)) are difficult or time-consuming to optimize, then using an inexact but fast subproblem solution may accelerate the convergence [39, 66]. In this respect, there is no need to learn a discriminative denoiser model for every noise level. On the other hand, although Eqn. (9) is a denoiser, it has a different goal from traditional Gaussian denoising. The goal of traditional Gaussian denoising is to recover the latent clean image; the denoiser here, however, just plays its own role regardless of the noise type and noise level of the image to be denoised. Therefore, the ideal discriminative denoiser in Eqn. (9) should be trained at the current noise level. As a result, there is a tradeoff in setting the number of denoisers. In this paper, we trained a set of denoisers on the noise level range [0, 50], divided with a step size of 2 for each model, resulting in a set of 25 denoisers for each of the gray and color image prior models. Due to the iterative scheme, it turns out that the noise level range of [0, 50] is enough to handle various image restoration problems. Especially noteworthy is that this number of denoisers is much less than that of learning different models for different degradations.

4. Experiments

4.1. Image Denoising

It is widely acknowledged that convolutional neural networks generally benefit from the availability of large training data. Hence, instead of training on a small dataset consisting of 400 Berkeley segmentation dataset (BSD) images of size 180×180 [13], we collect a large dataset which includes 400 BSD images, 400 images selected from the validation set of the ImageNet database [20] and 4,744 images of the Waterloo Exploration Database [40]. We empirically find that using the large dataset does not improve the PSNR results on the BSD68 dataset [50] but can slightly improve the performance on other testing images. We crop the images into small patches of size 35×35 and select N = 256×4,000 patches for training. As for the generation of the corresponding noisy patches, we achieve this by adding additive Gaussian noise to the clean patches during training. Since the residual learning strategy is adopted, we use the following loss function,

$$\ell(\Theta) = \frac{1}{2N}\sum_{i=1}^{N}\|f(y_i;\Theta) - (y_i - x_i)\|_F^2 \qquad (10)$$

where {(y_i, x_i)}_{i=1}^N represents the N noisy-clean patch pairs. To optimize the network parameters Θ, the Adam solver [36] is adopted. The step size starts from 1e−3 and is then fixed to 1e−4 when the training error stops decreasing. Training is terminated if the training error does not change in five sequential epochs. For the other hyper-parameters of Adam, we use their default settings. The mini-batch size is set to 256. Rotation and/or flip based data augmentation is used during mini-batch learning. The denoiser models are trained in the Matlab (R2015b) environment with the MatConvNet package [60] and an Nvidia Titan X GPU. To reduce the whole training time, once a model is obtained, we initialize the adjacent denoiser with this model. It takes about three days to train the set of denoiser models.

Table 1. The average PSNR(dB) results of different methods on the (gray) BSD68 dataset.

         BM3D   WNNM   TNRD   MLP    Proposed
σ = 15   31.07  31.37  31.42  -      31.63
σ = 25   28.57  28.83  28.92  28.96  29.15
σ = 50   25.62  25.87  25.97  26.03  26.19

Table 2. The average PSNR(dB) results of CBM3D and the proposed CNN denoiser on the (color) BSD68 dataset.

Noise Level   5      15     25     35     50
CBM3D         40.24  33.52  30.71  28.89  27.38
Proposed      40.36  33.86  31.16  29.50  27.86

We compared the proposed denoiser with several state-of-the-art denoising methods, including two model-based optimization methods (i.e., BM3D [17] and WNNM [29]) and two discriminative learning methods (i.e., MLP [8] and TNRD [13]). The gray image denoising results of the different methods on the BSD68 dataset are shown in Table 1. It can be seen that WNNM, MLP and TNRD outperform BM3D by about 0.3dB in PSNR, while the proposed CNN denoiser has a PSNR gain of about 0.2dB over those three methods. Table 2 shows the color image denoising results of the benchmark CBM3D and our proposed CNN denoiser; it can be seen that the proposed denoiser consistently outperforms CBM3D by a large margin. Such a promising result can be attributed to the powerful color image prior modeling capacity of CNN.
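As a companion to the training objective in Eqn. (10), the loss reduces to a few tensor operations; a minimal PyTorch sketch follows (the function name is ours, and the network output is assumed to be the raw residual prediction f(y; Θ)):

```python
import torch

def loss_eqn10(residual_pred, y, x):
    """Eqn. (10): half the mean squared Frobenius norm between the
    predicted residual f(y_i; Theta) and the true noise y_i - x_i."""
    diff = residual_pred - (y - x)
    return 0.5 * diff.flatten(1).pow(2).sum(dim=1).mean()

# Hypothetical optimizer setup matching the reported schedule:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # later 1e-4
```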
For the run time, we compared with BM3D and TNRD due to their potential value in practical applications. Since the proposed denoiser and TNRD support parallel computation on GPU, we also give the GPU run time. To make a further comparison with TNRD under similar PSNR performance, we additionally provide the run time of the proposed denoiser where each middle layer has 24 feature maps. We use the Nvidia cuDNN-v5 deep learning library to accelerate the GPU computation, and the memory transfer time between CPU and GPU is not considered. Table 3 shows the run times of the different methods for denoising images of size 256×256, 512×512 and 1024×1024 with noise level 25. We can see that the proposed denoiser is very competitive in both CPU and GPU implementations. It is worth emphasizing that the proposed denoiser with 24 feature maps per layer has a PSNR of 28.94dB, comparable to TNRD, but delivers a faster speed. Such a good compromise between speed and performance over TNRD can properly be attributed to the following three reasons. First, the adopted 3×3 convolution and ReLU nonlinearity are simple yet effective and efficient. Second, in contrast to the stage-wise architecture of TNRD, which essentially has a bottleneck at each intermediate output layer, ours encourages a fluent information flow among different layers, thus having larger model capacity. Third, batch normalization, which is beneficial to Gaussian denoising, is adopted. According to the above discussion, we can conclude that the proposed denoiser is a strong competitor against BM3D and TNRD.

Table 3. Run time (in seconds) of different methods on images of size 256×256, 512×512 and 1024×1024 with noise level 25.

Size        Device  BM3D   TNRD   Proposed24  Proposed64
256×256     CPU     0.66   0.47   0.10        0.310
            GPU     -      0.010  0.006       0.012
512×512     CPU     2.91   1.33   0.39        1.24
            GPU     -      0.032  0.016       0.038
1024×1024   CPU     11.89  4.61   1.60        4.65
            GPU     -      0.116  0.059       0.146

4.2. Image Deblurring

As a common setting, the blurry images are synthesized by first applying a blur kernel and then adding additive Gaussian noise with noise level σ. In addition, we assume the convolution is carried out with circular boundary conditions, so an efficient implementation of Eqn. (7) using the Fast Fourier Transform (FFT) can be employed (a code sketch of this FFT step is given after Table 4). To make a thorough evaluation, we consider three blur kernels, including a commonly-used Gaussian kernel with standard deviation 1.6 and the first two of the eight real blur kernels from [38]. As shown in Table 4, we also consider Gaussian noise with different noise levels. For the compared methods, we choose one discriminative method, namely MLP [52], and three model-based optimization methods, including IDDBM3D [19], NCSR [22] and EPLL. Among the testing images, apart from three classical gray images as shown in Figure 2, three color images are also included so that we can test the performance of the learned color denoiser prior. In the meanwhile, we note that the above methods are designed for gray image deblurring. Specially, NCSR tackles a color input by first transforming it into YCbCr space and then conducting the main algorithm on the luminance component. In the following experiments, we simply plug the color denoisers into the HQS framework, whereas we handle each color channel separately for IDDBM3D and MLP. Note that MLP trained a specific model for the Gaussian blur kernel with noise level 2.

Figure 2. Six testing images for image deblurring. (a) Cameraman; (b) House; (c) Lena; (d) Monarch; (e) Leaves; (f) Parrots.

Once the denoisers are provided, the subsequent crucial issue is parameter setting. From Eqns. (6), we can note that two parameters, λ and µ, need to be tuned. Generally, for a certain degradation, λ is correlated with σ² and kept fixed during iterations, while µ controls the noise level of the denoiser. Since the HQS framework is denoiser-based, we instead set the noise level of the denoiser in each iteration to implicitly determine µ. Note that the noise level of the denoiser, √(λ/µ), should be set from large to small. In our experimental settings, it is decayed exponentially from 49 to a value in [1, 15] depending on the noise level. The number of iterations is set to 30 as we find it large enough to obtain a satisfying performance.

The PSNR results of the different methods are shown in Table 4. As one can see, the proposed CNN denoiser prior based optimization method achieves very promising PSNR results. Figure 3 illustrates the deblurred Leaves image of the different methods. We can see that IDDBM3D, NCSR and MLP tend to smooth the edges and generate color artifacts. In contrast, the proposed method recovers image sharpness and naturalness.

Table 4. Deblurring results (PSNR, dB) of different methods.

Gaussian blur with standard deviation 1.6, σ = 2
Methods    C.man  House  Lena   Monar.  Leaves  Parrots
IDDBM3D    27.08  32.41  30.28  27.02   26.95   30.15
NCSR       27.99  33.38  30.99  28.32   27.50   30.42
MLP        27.84  33.43  31.10  28.87   28.91   31.24
Proposed   28.12  33.80  31.17  30.00   29.78   32.07

Kernel 1 (19×19) [38]
σ = 2.55
EPLL       29.43  31.48  31.68  28.75   27.34   30.89
Proposed   32.07  35.17  33.88  33.62   33.92   35.49
σ = 7.65
EPLL       25.33  28.19  27.37  22.67   21.67   26.08
Proposed   28.11  32.03  29.51  29.20   29.07   31.63

Kernel 2 (17×17) [38]
σ = 2.55
EPLL       29.67  32.26  31.00  27.53   26.75   30.44
Proposed   31.69  35.04  33.53  33.13   33.51   35.17
σ = 7.65
EPLL       24.85  28.08  27.03  21.60   21.09   25.77
Proposed   27.70  31.94  29.27  28.73   28.63   31.35
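As mentioned at the start of this subsection, under circular boundary conditions H^T H + µI is diagonalized by the FFT, so Eqn. (7) has a closed form in the Fourier domain. A minimal NumPy sketch for a gray image, assuming the blur kernel has already been zero-padded (and circularly centered) to the image size:

```python
import numpy as np

def fft_fidelity_step(y, kernel, z, mu):
    """Eqn. (7) for H = circular convolution with `kernel`:
    x = IFFT( (conj(K) * FFT(y) + mu * FFT(z)) / (|K|^2 + mu) )."""
    K = np.fft.fft2(kernel)
    numerator = np.conj(K) * np.fft.fft2(y) + mu * np.fft.fft2(z)
    return np.real(np.fft.ifft2(numerator / (np.abs(K) ** 2 + mu)))
```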
(a) Blurry and noisy image (b) IDDBM3D (26.95dB) (c) NCSR (27.50dB) (d) MLP (28.91dB) (e) Proposed (29.78dB)
Figure 3. Image deblurring performance comparison for the Leaves image (the blur kernel is a Gaussian kernel with standard deviation 1.6; the noise level σ is 2).
4.3. Single Image Super-Resolution

In general, the low-resolution (LR) image can be modeled by a blurring and subsequent down-sampling operation on a high-resolution one. Existing super-resolution models, however, mainly focus on modeling the image prior and are trained for a specific degradation process. This makes the learned model deteriorate seriously when the blur kernel adopted in training deviates from the real one [23, 64]. Instead, our model can handle any blur kernel without retraining. Thus, in order to thoroughly evaluate the flexibility of the CNN denoiser prior based optimization method as well as the effectiveness of the CNN denoisers, following [45], this paper considers three typical image degradation settings for SISR, i.e., bicubic downsampling (the default setting of the Matlab function imresize) with the two scale factors 2 and 3 [15, 21], and blurring by a Gaussian kernel of size 7×7 with standard deviation 1.6 followed by downsampling with scale factor 3 [22, 45].

Inspired by the method proposed in [24], which iteratively updates a back-projection [33] step and a denoising step for SISR, we use the following back-projection iteration to solve Eqn. (6a),

$$x_{k+1} = x_k - \alpha\,(y - x_k\!\downarrow_{sf})\uparrow_{sf}^{bicubic} \qquad (11)$$

where ↓sf denotes the degradation operator with downscaling factor sf, ↑sf^bicubic represents the bicubic interpolation operator with upscaling factor sf, and α is the step size. It is worth noting that the iterative regularization step of methods such as NCSR and WNNM actually corresponds to solving Eqn. (6a); from this viewpoint, those methods are optimized under the HQS framework. Here, note that only bicubic downsampling is considered in [24], whereas Eqn. (11) is extended to deal with different blur kernels. To obtain fast convergence, we repeat Eqn. (11) five times before applying the denoising step (a sketch of this update is given at the end of this subsection). The number of main iterations is set to 30, the step size α is fixed to 1.75, and the noise levels of the denoiser are decayed exponentially from 12×sf to sf.

The proposed deep CNN denoiser prior based SISR method is compared with five state-of-the-art methods, including two CNN-based discriminative learning methods (i.e., SRCNN [21] and VDSR [35]), one statistical prediction model based discriminative learning method [45], which we refer to as SPMSR, one model-based optimization method (i.e., NCSR [22]) and one denoiser prior based method (i.e., SRBM3D [24]). Except for SRBM3D, all the existing methods conducted their main algorithms on the Y channel (i.e., luminance) of the transformed YCbCr space. In order to evaluate the proposed color denoiser prior, we also conduct experiments on the original RGB channels, and thus the PSNR results of the super-resolved RGB images of the different methods are also given. Since the source code of SRBM3D is not available, we also compare two methods which replace the proposed CNN denoiser with the BM3D/CBM3D denoiser. Those two methods are denoted by SRBM3D_G and SRBM3D_C, respectively.

Table 5 shows the average PSNR(dB) results of the different methods for SISR on Set5 and Set14 [59]. Note that SRCNN and VDSR are trained with the bicubic blur kernel; thus it is unfair to use their models to super-resolve a low-resolution image generated with a Gaussian kernel. As a matter of fact, we give their performance to demonstrate the limitations of such discriminative learning methods. From Table 5, we can make several observations. First, although SRCNN and VDSR achieve promising results in the case of the bicubic kernel, their performance deteriorates seriously when the low-resolution image is not generated by the bicubic kernel (see Figure 4). On the other hand, with the accurate blur kernel, even NCSR and SPMSR outperform SRCNN and VDSR for the Gaussian blur kernel. In contrast, the proposed methods (denoted by Proposed_G and Proposed_C) can handle all the cases well. Second, the proposed methods have better PSNR results than SRBM3D_C and SRBM3D_G, which indicates that a good denoiser prior facilitates solving the super-resolution problem. Third, both the gray and the color CNN denoiser prior based optimization methods produce promising results. As an example of testing speed, our method can super-resolve the Butterfly image in 0.5 second on GPU and 12 seconds on CPU, whereas NCSR spends 198 seconds on CPU.
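The back-projection update of Eqn. (11) is equally compact. A minimal sketch for a gray image, using scipy.ndimage.zoom with order=3 as a stand-in for the paper's ↓sf and bicubic ↑sf operators (image sides are assumed to be exact multiples of sf, and the sign follows Eqn. (11) as printed):

```python
from scipy.ndimage import zoom

def back_projection_step(x, y, sf, alpha=1.75):
    """One back-projection update of Eqn. (11): downscale the current
    estimate x, compare with the LR observation y, and upscale the
    residual bicubically before applying it with step size alpha."""
    residual = y - zoom(x, 1.0 / sf, order=3)        # y - (x_k downscaled)
    return x - alpha * zoom(residual, float(sf), order=3)   # Eqn. (11)
```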
Table 5. Average PSNR(dB) results of different methods for single image super-resolution on Set5 and Set14.

Dataset  Scale  Kernel    Channel  SRCNN  VDSR   NCSR   SPMSR  SRBM3D  SRBM3D_G  SRBM3D_C  Proposed_G  Proposed_C
Set5     2      Bicubic   Y        36.65  37.56  -      36.11  37.10   36.34     36.25     37.43       37.22
                          RGB      34.45  35.16  -      33.94  -       34.11     34.22     35.05       35.07
         3      Bicubic   Y        32.75  33.67  -      32.31  33.30   32.62     32.54     33.39       33.18
                          RGB      30.72  31.50  -      30.32  -       30.57     30.69     31.26       31.25
         3      Gaussian  Y        30.42  30.54  33.02  32.27  -       32.66     32.59     33.38       33.17
                          RGB      28.50  28.62  30.00  30.02  -       30.31     30.74     30.92       31.21
Set14    2      Bicubic   Y        32.43  33.02  -      31.96  32.80   32.09     32.25     32.88       32.79
                          RGB      30.43  30.90  -      30.05  -       30.15     30.32     30.79       30.78
         3      Bicubic   Y        29.27  29.77  -      28.93  29.60   29.11     29.27     29.61       29.50
                          RGB      27.44  27.85  -      27.17  -       27.32     27.47     27.72       27.67
         3      Gaussian  Y        27.71  27.80  29.26  28.89  -       29.18     29.39     29.63       29.55
                          RGB      26.02  26.11  26.98  27.01  -       27.24     27.60     27.59       27.70
(a) Ground-truth (b) Zoomed LR image (c) SRCNN (24.46dB) (d) VDSR (24.73dB) (e) Proposed_G (29.32dB)

Figure 4. Single image super-resolution performance comparison for the Butterfly image from Set5 (the blur kernel is a 7×7 Gaussian kernel with standard deviation 1.6; the scale factor is 3). Note that the comparison with SRCNN and VDSR is unfair. The proposed deep CNN denoiser prior based optimization method can super-resolve the LR image by setting the blur kernel and scale factor accordingly, without retraining, whereas SRCNN and VDSR need additional training to deal with such cases. As a result, this figure is mainly used to show the flexibility advantage of the proposed deep CNN denoiser prior based optimization method over discriminative learning methods.
References

[1] H. C. Andrews and B. R. Hunt. Digital image restoration. Prentice-Hall Signal Processing Series, Englewood Cliffs: Prentice-Hall, 1977.
[2] A. Barbu. Training an active random field for real-time image denoising. IEEE Transactions on Image Processing, 18(11):2451–2462, 2009.
[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[4] J. M. Bioucas-Dias and M. A. Figueiredo. A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Transactions on Image Processing, 16(12):2992–3004, 2007.
[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[6] A. Brifman, Y. Romano, and M. Elad. Turning a denoiser into a super-resolver using plug and play priors. In IEEE International Conference on Image Processing, pages 1404–1408, 2016.
[7] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 60–65, 2005.
[8] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In IEEE Conference on Computer Vision and Pattern Recognition, pages 2392–2399, 2012.
[9] P. Campisi and K. Egiazarian. Blind image deconvolution: Theory and applications. CRC Press, 2016.
[10] A. Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1-2):89–97, 2004.
[11] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[12] S. H. Chan, X. Wang, and O. A. Elgendy. Plug-and-play ADMM for image restoration: Fixed-point convergence and applications. IEEE Transactions on Computational Imaging, 3(1):84–98, 2017.
[13] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[14] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005.
[15] Z. Cui, H. Chang, S. Shan, B. Zhong, and X. Chen. Deep network cascade for image super-resolution. In European Conference on Computer Vision, pages 49–64, 2014.
[16] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space. In IEEE International Conference on Image Processing, volume 1, pages I–313, 2007.
[17] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
[18] A. Danielyan, V. Katkovnik, and K. Egiazarian. Image deblurring by augmented Lagrangian with BM3D frame prior. In Workshop on Information Theoretic Methods in Science and Engineering, pages 16–18, 2010.
[19] A. Danielyan, V. Katkovnik, and K. Egiazarian. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4):1715–1728, 2012.
[20] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[21] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016.
[22] W. Dong, L. Zhang, G. Shi, and X. Li. Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2013.
[23] N. Efrat, D. Glasner, A. Apartsin, B. Nadler, and A. Levin. Accurate blur models vs. image priors in single image super-resolution. In IEEE International Conference on Computer Vision, pages 2832–2839, 2013.
[24] K. Egiazarian and V. Katkovnik. Single image super-resolution via BM3D sparse coding. In European Signal Processing Conference, pages 2849–2853, 2015.
[25] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
[26] A. Foi, V. Katkovnik, and K. Egiazarian. Pointwise shape adaptive DCT denoising with structure preservation in luminance-chrominance space. In International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 2006.
[27] Q. Gao and S. Roth. How well do filter-based MRFs model natural images? In Joint DAGM (German Association for Pattern Recognition) and OAGM Symposium, pages 62–72, 2012.
[28] D. Geman and C. Yang. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing, 4(7):932–946, 1995.
[29] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
[30] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[31] F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, et al. FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics, 33(6):231, 2014.
[32] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015.
[33] M. Irani and S. Peleg. Motion analysis for image enhancement: Resolution, occlusion, and transparency. Journal of Visual Communication and Image Representation, 4(4):324–335, 1993.
[34] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems, pages 769–776, 2009.
[35] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.
[36] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference for Learning Representations, 2015.
[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[38] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating blind deconvolution algorithms. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1964–1971, 2009.
[39] Z. Lin, M. Chen, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.
[40] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang. Waterloo Exploration Database: New challenges for image quality assessment models. IEEE Transactions on Image Processing, 26(2):1004–1016, 2017.
[41] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, 2008.
[42] T. Miyata. Inter-channel relation based vectorial total variation for color image recovery. In IEEE International Conference on Image Processing, pages 2251–2255, 2015.
[43] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation, 4(2):460–489, 2005.
[44] N. Parikh, S. P. Boyd, et al. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.
[45] T. Peleg and M. Elad. A statistical prediction model based on sparse representations for single image super-resolution. IEEE Transactions on Image Processing, 23(6):2569–2582, 2014.
[46] A. Rajwade, A. Rangarajan, and A. Banerjee. Image denoising using the higher order singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4):849–862, 2013.
[47] W. H. Richardson. Bayesian-based iterative method of image restoration. JOSA, 62(1):55–59, 1972.
[48] Y. Romano, M. Elad, and P. Milanfar. The little engine that could: Regularization by denoising (RED). arXiv preprint arXiv:1611.02862, 2016.
[49] A. Rond, R. Giryes, and M. Elad. Poisson inverse problems by the plug-and-play scheme. Journal of Visual Communication and Image Representation, 41:96–108, 2016.
[50] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
[51] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2774–2781, 2014.
[52] C. J. Schuler, H. Christopher Burger, S. Harmeling, and B. Scholkopf. A machine learning approach for non-blind image deconvolution. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1067–1074, 2013.
[53] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference for Learning Representations, 2015.
[54] S. Sreehari, S. Venkatakrishnan, B. Wohlberg, L. F. Drummy, J. P. Simmons, and C. A. Bouman. Plug-and-play priors for bright field electron tomography and sparse interpolation. arXiv preprint arXiv:1512.07331, 2015.
[55] J. Sun and M. F. Tappen. Separable Markov random field model and its applications in low level vision. IEEE Transactions on Image Processing, 22(1):402–407, 2013.
[56] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, June 2015.
[57] M. F. Tappen. Utilizing variational optimization to learn Markov random fields. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
[58] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. Figueiredo. Image restoration and reconstruction using variable splitting and class-adapted image priors. In IEEE International Conference on Image Processing, pages 3518–3522, 2016.
[59] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pages 111–126, 2014.
[60] A. Vedaldi and K. Lenc. MatConvNet: Convolutional neural networks for Matlab. In ACM Conference on Multimedia Conference, pages 689–692, 2015.
[61] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg. Plug-and-play priors for model based reconstruction. In IEEE Global Conference on Signal and Information Processing, pages 945–948, 2013.
[62] L. Xu, J. S. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems, pages 1790–1798, 2014.
[63] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
[64] K. Zhang, X. Zhou, H. Zhang, and W. Zuo. Revisiting single image super-resolution under internet environment: Blur kernels and reconstruction algorithms. In Pacific Rim Conference on Multimedia, pages 677–687, 2015.
[65] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 2017.
[66] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In IEEE International Conference on Computer Vision, pages 479–486, 2011.