School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
qikang1994@gmail.com, fuying@bit.edu.cn, huahuang@bit.edu.cn
Inspired by [24], we introduce a lightweight channel attention module for color image demosaicking, which can obtain discriminative features among channels for different scenes. Given an intermediate feature map T ∈ R^(h×w×c) as input, the channel attention module infers a one-dimensional vector α = [α_1, α_2, ..., α_c], which is used to rescale the input feature T.

As illustrated in Fig. 1, a global average pooling operation over the spatial dimensions h × w is first adopted to obtain global spatial context information. It is followed by a dimension-reduction layer with reduction ratio r and a rectified linear unit (ReLU) activation function [25]. Then, the output is fed into a dimension-increasing layer and a sigmoid activation function to generate the factor α, which is used to rescale the input feature map T.

Fig. 2: Architecture of the proposed network (shallow feature extraction block, deep down-sampling blocks, up-scaling modules, feature fusion modules, and reconstruction block), mapping the mosaicked input I through the features F_k, P_k, and Q_k to the demosaicked output R.

[...] down-sampling block, and F_k ∈ R^(H/2^k × W/2^k × 2^k·C). As shown in Fig. 3(a), the deep down-sampling block consists of a down-sampling convolutional layer with a ReLU activation function, "Conv + ReLU" repeated D times, and a CAM. For the down-sampling convolutional layer, we use a 2 × 2 convolution with stride 2 to down-sample the feature maps and double the number of feature channels.

As a result, the feature map F_3 is used as the input of the next part, which is composed of multiple up-sampling and feature fusion blocks. This process can be represented by

    P_{k−1} = H_{UP,k}(F_k) for k = 3, and P_{k−1} = H_{UP,k}(Q_k) for k = 2, 1;
    Q_{k−1} = H_{FF,k}(Concat(F_{k−1}, P_{k−1})),                                   (4)
where {P_{k−1}, Q_{k−1}} ∈ R^(H/2^(k−1) × W/2^(k−1) × 2^(k−1)·C), Concat(·, ·) refers to the concatenation operation, and H_{UP,k}(·) and H_{FF,k}(·) denote the functions of the k-th up-scaling module and feature fusion module, respectively. For the feature fusion module, as shown in Fig. 3(b), we first use a 1 × 1 convolution layer to fuse the shallow-level feature with the deep-level feature from the up-scaling module and to halve the number of feature channels. Then, "Conv + ReLU" is applied D − 1 times for further deep feature extraction.

Finally, Q_0 is the input of the reconstruction block, and the demosaicked image R can be obtained by

    R = H_{RB}(Q_0),                                                                (5)

where R ∈ R^(2H×2W×3) and H_{RB}(·) denotes the reconstruction block. It consists of a 1 × 1 convolution layer with a ReLU activation function and a sub-pixel convolution layer [26] without an activation function.

Fig. 3: Architecture of the deep down-sampling block and the feature fusion module. (a) Deep down-sampling block: a 2×2 convolution with stride 2 ("/2") and ReLU, "3×3 Conv + ReLU" repeated D times, and a CAM. (b) Feature fusion module: a 1×1 convolution and ReLU, followed by "3×3 Conv + ReLU" repeated D−1 times. "/2" denotes a convolutional layer with a stride of 2 to down-sample the feature maps; "CAM" denotes the channel attention module.

2.3. Loss Function

Let Θ be the set of network parameters. Given the training dataset {(I_i, Y_i)}_{i=1}^{N}, where I_i is the i-th mosaicked image, Y_i is the corresponding ground-truth image, and N is the number of images in the training data, our goal is to learn a mapping function f by optimizing the parameters of the network. To make the demosaicked image R_i = f(I_i; Θ) close to the ground truth Y_i, the mean absolute error (MAE) is used as the loss function, which can be formulated as

    L(Θ) = (1/N) Σ_{i=1}^{N} ‖Y_i − f(I_i; Θ)‖_1.                                   (6)

3.1. Datasets and implementation details

Training dataset. We use the WED dataset [27] to form the training set, as in [18]. It contains 4744 high-quality natural images, of which 4644 are used for training. We further divide them into 128 × 128 patches with an overlap of 32 pixels as the ground truth. To generate the mosaicked image patches, we down-sample the ground truth according to the Bayer pattern. Finally, approximately 300,000 patch pairs are generated for training. Moreover, to further augment the data, we apply three operations: randomly rotating the images by 90°, 180°, or 270°, randomly flipping images horizontally, and randomly flipping images vertically.

Testing datasets. The proposed method is evaluated on two widely used benchmark datasets, i.e., the Kodak and McMaster datasets. The Kodak dataset consists of 24 images (768 × 512), and the McMaster dataset contains 18 images (500 × 500) cropped from 8 high-resolution natural images. We also evaluate our method on the WED-CDM dataset [18], which contains 100 high-quality full-color images from the WED dataset.

Implementation details. We set the parameters C and D in Section 2.2 to 64 and 4, respectively. Meanwhile, the reduction
ratio r is set to 16 in all channel attention modules. The weights are initialized with the method proposed in [28], and the loss is minimized using Adam [29]. The mini-batch size is set to 64. The learning rate decays from 10^−4 to 10^−6, and the weight decay is 10^−8. The proposed work is implemented in PyTorch [30], and all experiments are executed on a workstation with an Intel(R) Core i7-6700 CPU at 3.5 GHz and an Nvidia GeForce GTX 1080Ti GPU.

Fig. 4: Comparison of demosaicking results on Kodak image 19: (a) Ground truth, (b) DLMMSE [8], (c) GBTF [4], (d) LDI-NAT [11], (e) RI [13], (f) MLRI [14], (g) ARI [15], (h) RI new [16], (i) ARI new [17], (j) DJDD [21], (k) CDM-CNN [18], (l) 3-stage CNN [19], (m) Ours (w/o CAM), (n) Ours. PSNR: (b) 37.16 dB, (c) 38.17 dB, (d) 28.42 dB, (e) 38.33 dB, (f) 33.27 dB, (g) 35.72 dB, (h) 34.75 dB, (i) 32.60 dB, (j) 41.08 dB, (k) 40.98 dB, (l) 38.69 dB, (m) 43.28 dB, (n) 44.07 dB.

3.2. Experimental Results

Compared Methods. We compare our method with eleven state-of-the-art demosaicking methods, including eight traditional methods (i.e., DLMMSE [8], GBTF [4], LDI-NAT [11], RI [13], MLRI [14], ARI [15], RI new [16], and ARI new [17]) and three CNN-based methods (i.e., DJDD [21], CDM-CNN [18], and 3-stage CNN [19]). It is worth mentioning that we retrain the CNN-based methods on the same dataset. Besides, we provide experimental results for our method both with and without CAM to show the effect of CAM.

Quantitative Results. To quantitatively evaluate the objective performance of color image demosaicking methods, we adopt the PSNR and the composite PSNR (CPSNR) to measure the quality of each color channel and of all three channels, respectively. Besides, we use the structural similarity (SSIM) [31] as an evaluation metric to take structural similarity and detail information into account.

The PSNR/CPSNR results of the different methods on the Kodak, McMaster and WED-CDM datasets are shown in Table 1. It can be seen that the proposed method achieves the best PSNR/CPSNR results among the compared methods on all three datasets. Specifically, our method outperforms the second best method by 0.72 dB, 0.11 dB and 0.34 dB on the Kodak, McMaster and WED-CDM datasets, respectively. Table 3 shows the average SSIM of the different approaches on the three datasets. We can see that the proposed method, as in the PSNR evaluation, provides the best results on all datasets. In addition, the results of our method without CAM are also shown in Tables 1 and 3. Our method without CAM also outperforms the state-of-the-art methods in most cases, and CAM effectively improves the performance of our method.

Qualitative Results. Aside from using PSNR and SSIM for evaluation, our method also shows an advantage in terms of visual quality. It is well known that strong edges and textural areas are always challenging for color image demosaicking and easily lead to conspicuous artifacts such as zippering and color moiré. As shown in Fig. 4, we take the fence in the 19th image of the Kodak dataset, which has strong vertical texture, as an example to compare the visual quality of our method and the compared methods. It can be observed that almost all compared results show chromatic aliasing and zippering, while our proposed method reconstructs the image more faithfully. These results are consistent with the quantitative evaluation.

Running Time. To evaluate the time complexity of these methods, we use the source codes of all compared methods [...]
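Table 1 below reports the per-channel PSNR and the composite PSNR (CPSNR) described under Quantitative Results. For reference, the following is a short sketch of how these metrics are conventionally computed for 8-bit images; it is the standard definition rather than code released with the paper.

```python
# Conventional PSNR/CPSNR computation for 8-bit images (standard definitions,
# included only to clarify the columns of Table 1).
import numpy as np

def psnr(rec: np.ndarray, gt: np.ndarray, peak: float = 255.0) -> float:
    """PSNR over all supplied values: a single channel gives per-channel PSNR,
    the full H x W x 3 arrays give the composite PSNR (CPSNR)."""
    mse = np.mean((rec.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example usage for one image pair:
#   r, g, b = (psnr(rec[..., c], gt[..., c]) for c in range(3))  # R, G, B columns
#   cpsnr = psnr(rec, gt)                                        # RGB column
```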
Table 1: Average PSNR and CPSNR results (in dB) on three datasets. The best results are highlighted in bold type.
Methods    Kodak (R / G / B / RGB)    McMaster (R / G / B / RGB)    WED-CDM (R / G / B / RGB)    Average (RGB)
DLMMSE [8] 39.18 42.63 39.58 40.11 34.03 37.99 33.04 34.47 35.52 39.74 35.18 36.27 36.69
GBTF [4] 39.68 43.34 40.01 40.62 33.98 37.34 33.07 34.38 35.64 39.45 35.30 36.34 36.81
LDI-NAT [11] 36.99 39.44 37.12 37.69 36.28 39.76 34.39 36.20 36.18 40.03 35.64 36.78 36.86
RI [13] 37.82 41.00 37.80 38.56 36.07 39.99 35.35 36.47 36.50 40.54 36.11 37.17 37.32
MLRI [14] 38.87 41.83 38.86 39.58 36.35 39.90 35.36 36.62 36.94 40.70 36.43 37.53 37.76
ARI [15] 39.10 42.31 38.90 39.79 37.41 40.72 36.05 37.52 37.18 40.90 36.79 37.83 38.12
RI new [16] 38.62 41.18 38.48 39.21 36.72 40.23 35.59 36.91 37.05 40.69 36.57 37.63 37.81
ARI new [17] 39.27 42.43 39.10 39.95 37.45 40.68 36.21 37.60 37.22 40.92 36.76 37.82 38.15
DJDD [21] 41.67 45.66 41.31 42.38 39.34 42.19 37.53 39.17 39.34 43.32 38.86 39.99 40.29
CDM-CNN [18] 41.38 44.85 41.04 42.04 39.14 42.10 37.31 38.98 39.01 43.04 38.54 39.67 39.98
3-stage CNN [19] 41.97 45.10 41.02 42.31 39.52 42.53 37.65 39.34 39.43 43.40 38.97 40.09 40.37
Ours(w/o CAM) 42.24 46.09 41.99 42.98 39.55 42.47 37.54 39.29 39.68 43.65 39.09 40.27 40.60
Ours 42.35 46.10 42.00 43.03 39.74 42.59 37.72 39.45 39.81 43.75 39.28 40.43 40.75
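Tables 1 and 3 include an ablation ("Ours" vs. "Ours (w/o CAM)") that isolates the channel attention module described in Section 2. For concreteness, a minimal PyTorch-style sketch of such a module is given below; the use of 1 × 1 convolutions for the dimension reduction and increasing layers is an assumption, since only the overall pipeline (global average pooling, reduction by ratio r, ReLU, dimension increase, sigmoid, channel-wise rescaling) is specified in the text.

```python
# A hedged sketch of the channel attention module (CAM): global average pooling,
# dimension reduction by ratio r, ReLU, dimension increase, sigmoid, and
# channel-wise rescaling of the input feature map T.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling over h x w
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # dimension reduction
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # dimension increase
            nn.Sigmoid(),                                               # factors alpha in [0, 1]
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        alpha = self.fc(self.pool(t))  # shape (n, c, 1, 1)
        return t * alpha               # rescale each channel of T by its factor
```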
Table 2: Average execution time (in seconds) per image on CPU or GPU.
Methods       DLMMSE  GBTF  LDI-NAT  RI    MLRI  ARI    RI new  ARI new  DJDD   CDM-CNN  3-stage CNN  Ours (w/o CAM)  Ours
CPU time (s)  3.37    4.90  358.40   0.62  0.86  20.94  1.32    24.61    3.27   3.67     13.86        4.02            4.16
GPU time (s)  -       -     -        -     -     -      -       -        0.003  0.005    0.016        0.004           0.004
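The per-image GPU times in Table 2 are averages; the exact measurement protocol is not described in the text, but a typical PyTorch timing sketch with explicit CUDA synchronization (needed because GPU kernels launch asynchronously) could look as follows. The warm-up count and the number of repeats are illustrative assumptions.

```python
# A hedged sketch of per-image GPU timing in PyTorch (illustrative protocol only).
import time
import torch

@torch.no_grad()
def average_gpu_time(model: torch.nn.Module, mosaic: torch.Tensor, repeats: int = 100) -> float:
    model = model.cuda().eval()
    mosaic = mosaic.cuda()
    for _ in range(10):              # warm-up to exclude one-time initialization costs
        model(mosaic)
    torch.cuda.synchronize()         # wait for all queued kernels before starting the clock
    start = time.time()
    for _ in range(repeats):
        model(mosaic)
    torch.cuda.synchronize()
    return (time.time() - start) / repeats  # average seconds per image
```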
Table 3: SSIM results on the three datasets. The best results are highlighted in bold type.

Methods            Kodak   McM     WED-CDM  Average
DLMMSE [8]         0.9866  0.9645  0.9777   0.9775
GBTF [4]           0.9873  0.9637  0.9779   0.9777
LDI-NAT [11]       0.9727  0.9690  0.9707   0.9708
RI [13]            0.9824  0.9734  0.9798   0.9794
MLRI [14]          0.9846  0.9729  0.9812   0.9807
ARI [15]           0.9833  0.9760  0.9816   0.9812
RI new [16]        0.9835  0.9744  0.9816   0.9810
ARI new [17]       0.9840  0.9771  0.9821   0.9818
DJDD [21]          0.9862  0.9793  0.9753   0.9776
CDM-CNN [18]       0.9876  0.9795  0.9735   0.9766
3-stage CNN [19]   0.9888  0.9800  0.9861   0.9858
Ours (w/o CAM)     0.9894  0.9808  0.9870   0.9866
Ours               0.9895  0.9814  0.9875   0.9870

4. CONCLUSION

[...] demosaicking. Specifically, the feature pyramid structure allows the network to reach a large depth and obtain a large receptive field, which takes full advantage of image self-similarity and redundancy at a low computational cost. Meanwhile, an attention mechanism is incorporated into our network; it can reflect the channel-wise quality of the features for each scene and concentrate on the more useful channels. Extensive benchmark evaluations on the Kodak, McMaster and WED-CDM datasets have demonstrated that our proposed network is highly effective compared with state-of-the-art demosaicking methods in terms of both objective and subjective assessments, while maintaining a competitive running time.

5. ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (61425013 and 61672096) and the Beijing Natural Science Foundation (L172012).
[5] Julien Mairal, Michael Elad, and Guillermo Sapiro, "Sparse representation for color image restoration," IEEE Transactions on Image Processing, vol. 17, no. 1, pp. 53–69, Jan 2008.
[6] Xiaolin Wu, Dahua Gao, Guangming Shi, and Danhua Liu, "Color demosaicking with sparse representations," in IEEE International Conference on Image Processing (ICIP), Sept 2010, pp. 1645–1648.
[7] Guoshen Yu, Guillermo Sapiro, and Stéphane Mallat, "Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity," IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2481–2499, May 2012.
[8] Lei Zhang and Xiaolin Wu, "Color demosaicking via directional linear minimum mean square-error estimation," IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2167–2178, Dec 2005.
[9] Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman, "Non-local sparse models for image restoration," in IEEE International Conference on Computer Vision (ICCV), Sept 2009, pp. 2272–2279.
[10] Antoni Buades, Bartomeu Coll, Jean-Michel Morel, and Catalina Sbert, "Self-similarity driven color demosaicking," IEEE Transactions on Image Processing, vol. 18, no. 6, pp. 1192–1202, June 2009.
[11] Lei Zhang, Xiaolin Wu, Antoni Buades, and Xin Li, "Color demosaicking by local directional interpolation and nonlocal adaptive thresholding," Journal of Electronic Imaging, vol. 20, no. 2, p. 023016, 2011.
[12] Joan Duran and Antoni Buades, "Self-similarity and spectral correlation adaptive algorithm for color demosaicking," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4031–4040, Sept 2014.
[13] Daisuke Kiku, Yusuke Monno, Masayuki Tanaka, and Masatoshi Okutomi, "Residual interpolation for color image demosaicking," in IEEE International Conference on Image Processing (ICIP), Sept 2013, pp. 2304–2308.
[14] Daisuke Kiku, Yusuke Monno, Masayuki Tanaka, and Masatoshi Okutomi, "Minimized-laplacian residual interpolation for color image demosaicking," in Digital Photography X, 2014, vol. 9023, p. 90230L.
[15] Yusuke Monno, Daisuke Kiku, Masayuki Tanaka, and Masatoshi Okutomi, "Adaptive residual interpolation for color image demosaicking," in IEEE International Conference on Image Processing (ICIP), Sept 2015, pp. 3861–3865.
[16] Daisuke Kiku, Yusuke Monno, Masayuki Tanaka, and Masatoshi Okutomi, "Beyond color difference: Residual interpolation for color image demosaicking," IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1288–1300, March 2016.
[17] Yusuke Monno, Daisuke Kiku, Masayuki Tanaka, and Masatoshi Okutomi, "Adaptive residual interpolation for color and multispectral image demosaicking," Sensors, vol. 17, no. 12, p. 2787, 2017.
[18] Runjie Tan, Kai Zhang, Wangmeng Zuo, and Lei Zhang, "Color image demosaicking via deep residual learning," in IEEE International Conference on Multimedia and Expo (ICME), 2017, pp. 793–798.
[19] Kai Cui, Zhi Jin, and Eckehard Steinbach, "Color image demosaicking using a 3-stage convolutional neural network structure," in IEEE International Conference on Image Processing (ICIP), Oct 2018, pp. 2177–2181.
[20] Daniel Stanley Tan, Wei-Yang Chen, and Kai-Lung Hua, "Deepdemosaicking: Adaptive image demosaicking via multiple deep fully convolutional networks," IEEE Transactions on Image Processing, vol. 27, no. 5, pp. 2408–2419, May 2018.
[21] Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand, "Deep joint demosaicking and denoising," ACM Transactions on Graphics, vol. 35, no. 6, p. 191, 2016.
[22] Filippos Kokkinos and Stamatios Lefkimmiatis, "Deep image demosaicking using a cascade of convolutional residual denoising networks," in European Conference on Computer Vision (ECCV), September 2018.
[23] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
[24] Jie Hu, Li Shen, and Gang Sun, "Squeeze-and-excitation networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[25] Vinod Nair and Geoffrey E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in International Conference on Machine Learning (ICML), 2010, pp. 807–814.
[26] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1874–1883.
[27] Kede Ma, Zhengfang Duanmu, Qingbo Wu, Zhou Wang, Hongwei Yong, Hongliang Li, and Lei Zhang, "Waterloo exploration database: New challenges for image quality assessment models," IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 1004–1016, Feb 2017.
[28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
[29] Diederik P. Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations (ICLR), 2015.
[30] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer, "Automatic differentiation in PyTorch," in Advances in Neural Information Processing Systems (NIPS) Workshop, 2017.
[31] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.