
COLOR IMAGE DEMOSAICKING USING A 3-STAGE CONVOLUTIONAL NEURAL NETWORK STRUCTURE

Kai Cui⋆, Zhi Jin†⋆, Eckehard Steinbach⋆

⋆Technical University of Munich, Chair of Media Technology, Munich, Germany
†Shenzhen University, College of Information Engineering, Shenzhen, P.R. China

ABSTRACT

Color demosaicking (CDM) is a critical first step for the acquisition of high-quality RGB images with single chip cameras. Conventional CDM approaches are mostly based on interpolation schemes and hand-crafted image priors, which result in unpleasant visual artifacts in some cases. Motivated by the special characteristics of inter-channel correlations (higher correlations for R/G and G/B channels than for R/B), in this paper a 3-stage convolutional neural network (CNN) structure for CDM is proposed. In the first stage, the G channel is reconstructed independently. Then, by using the reconstructed G channel as guidance, the R and B channels are recovered in the second stage. Finally, high-quality RGB color images are reconstructed in the third stage. The objective and visual quality evaluation results show that the proposed structure achieves noticeable quality improvements in comparison to the state-of-the-art approaches.

Index Terms— Bayer color filter array (CFA), demosaicking, convolutional neural networks, residual learning

1. INTRODUCTION

In the majority of modern digital cameras, for simplicity and low cost, a single image sensor with a color filter array (CFA) is used to acquire color images. With this kind of camera, only one intensity value (R, G, or B) can be recorded for a pixel at a time. In order to obtain full-color images, an interpolation process is necessary to reconstruct the missing color components, which is usually called color demosaicking (CDM). The most commonly used CFA in modern cameras is based on the Bayer pattern [1]. Bayer pattern CFA based CDM has been extensively studied [2]. The simplest CDM method is to use bilinear or spline interpolation for each color channel independently. In these approaches, however, the correlation between different color channels is not exploited. Therefore, a key problem for CDM is how to exploit and make the most of the inter-channel correlations.

Many image priors have been introduced to model the inter-channel correlations (e.g., homogeneity [3], sparsity [4], local and non-local similarity [5, 6], integrated gradients [7]). However, these priors are mostly hand-crafted and lead to unpleasant visual artifacts in specific cases. Recently, residual interpolation (RI) [8, 9] based approaches were proposed to perform the CDM in the residual domain. The residual is defined as the difference between the original samples and a prediction of these samples using data from other color channels. The guided filter from [10] is adopted to minimize the residual in order to achieve a better estimation of the missing color components. Later, the RI method was extended to minimized-Laplacian residual interpolation (MLRI) [11], which minimizes the Laplacian energy of the residual, iterative residual interpolation (IRI) [12], which performs the RI in an iterative manner, and adaptive residual interpolation (ARI) [13, 14], which adaptively chooses the number of iterations for each pixel. These RI based methods improve the results of CDM significantly, but in some cases the performance is still not satisfactory.

With the success of convolutional neural networks (CNN) in image processing, CNN based CDM algorithms have also been proposed. In [15], a deep joint demosaicking and denoising structure was proposed, which can solve the two problems with a single network. In [16], a 2-stage network was proposed, which reconstructs the G channel in the first stage and then reconstructs the R/G/B channels jointly in the second stage. This approach achieves a significant PSNR improvement in comparison to previous approaches and represents the state-of-the-art performance.

$$ r = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N}(A_{ij}-\bar{A})(B_{ij}-\bar{B})}{\sqrt{\left(\sum_{i=1}^{M}\sum_{j=1}^{N}(A_{ij}-\bar{A})^{2}\right)\left(\sum_{i=1}^{M}\sum_{j=1}^{N}(B_{ij}-\bar{B})^{2}\right)}} \quad (1) $$

$$ \bar{A} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}A_{ij}, \qquad \bar{B} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}B_{ij} $$

Since exploiting the inter-channel correlation is the key point for CDM, we first focus on this part and measure the inter-channel correlation for a large set of test images.

This work has been supported by a PhD grant from the China Scholarship Council for Kai Cui, a TUM University Foundation Fellowship, and the National Natural Science Foundation of China (61701313) for Zhi Jin.

978-1-4799-7061-2/18/$31.00 ©2018 IEEE 2177 ICIP 2018
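Equation (1) is the standard Pearson correlation coefficient between two color planes. As an illustrative sketch (NumPy; the function name is ours, not from the paper):

```python
import numpy as np

def channel_correlation(A, B):
    """Pearson correlation coefficient r between two color planes, Eq. (1)."""
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    dA = A - A.mean()  # A_ij - A_bar
    dB = B - B.mean()  # B_ij - B_bar
    num = np.sum(dA * dB)
    den = np.sqrt(np.sum(dA ** 2) * np.sum(dB ** 2))
    return num / den
```

For the measurements reported in Table 1, r would be computed per image between the R/G, G/B, and R/B planes and then averaged over the test set.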


4744 images from the Waterloo Exploration Database (WED) [17] are used to calculate the correlation coefficients between R/G, G/B and R/B, respectively. The correlation coefficient is defined in Equation 1, where A and B are the considered color components of an image, Ā and B̄ are the corresponding mean values, and M and N are the width and height of the image. As shown in Table 1, the correlation coefficients for R/G and G/B are clearly higher than those for R/B. Also, for R/G and G/B, even though the mean values are close, the variances are quite different. These tests show that the correlation between different channel pairs is quite different.

Table 1. Inter-channel Correlations
Channels  Mean    Variance
RG        0.9010  0.0149
GB        0.9149  0.0105
RB        0.7892  0.0407

Inspired by these observations, in this paper we propose a 3-stage CNN structure for color demosaicking. Since the G channel has twice as many samples as the R and B channels, in the first stage the G channel samples are reconstructed independently. Because of the high inter-channel correlations between R and G, as well as between G and B, in the second stage we use two separate networks, guided by the reconstructed G channel from the first stage, to reconstruct the R and B channels, respectively. In the third stage, the intermediate R, G, B results obtained from the previous stages are concatenated as input, to exploit the correlations further. Finally, the high-quality demosaicked images are obtained.

The main contributions of the proposed approach are: (i) a 3-stage CNN, in which the performance of CDM is enhanced stage by stage; (ii) two separate networks to reconstruct R and B in the second stage, which make the most of the inter-channel correlations between R/G and G/B; (iii) an extensive evaluation and comparison with existing approaches on different datasets, which proves the superiority and effectiveness of the proposed scheme.

2. PROPOSED 3-STAGE DEMOSAICKING SCHEME

In CDM with a Bayer CFA, half of the pixels belong to the G channel, and the other half is equally split between R and B. The three channels exhibit strong inter-channel correlation, both structurally and spectrally, which means that the samples from other channels can be used to enhance the quality of the current channel. Based on these characteristics, we propose the 3-stage CNN structure shown in Fig. 1 for CDM.

First, bilinear interpolation is used for initialization. Appropriate initialization makes the networks stable and easier to train. The first stage is designed to reconstruct the G channel. The Input is split into InputR, InputG and InputB, and InputG is fed into the first stage. Then, the output of the first stage, IntermediateG, is concatenated with InputR and InputB, respectively, and fed into the second stage. The second stage is designed to explore the correlations between R/G and G/B, using the high-quality IntermediateG to guide the reconstruction of R and B. Using two separate networks in the second stage to reconstruct R/G and G/B is motivated by the aforementioned differences in the inter-channel correlations of R/G and G/B; two separate networks can better model and make the most of the inter-channel correlations. In the third stage, we concatenate the obtained intermediate R, G, B data as the input, where the inter-channel correlations are further exploited. Finally, the demosaicking results are obtained from the third stage. The residual learning structure from [18] is used in each stage to boost the learning process.

Fig. 2 shows the detailed structure of the network unit for each stage. In the first layer, 128 filters of size 3 × 3 × d are used to generate feature maps; the last convolutional layer adopts d filters of size 3 × 3 × 128 to generate the corresponding output. For the hidden layers, 128 filters of size 3 × 3 × 128 are adopted. The number of layers in each unit, K, is set to 5, and d is set to 1, 2, 3 in the three stages, respectively. The stride is set to 1, and zero-padding of size 1 is used to ensure that each feature map has the same size as the input.

Consider the training dataset (X_i, Y_i), i = 1, …, N, where X_i is the i-th Bayer pattern image, Y_i is the corresponding ground-truth RGB image, and N is the number of images in the training data. During training, a loss function is defined to optimize the parameters of the networks. As shown in Fig. 1, four losses are defined for the proposed scheme. In the first stage, L_G is defined for the G channel. In the second stage, two loss functions, L_RG and L_GB, are defined, since R/G and G/B are processed separately. In the third stage, L_RGB is defined as the loss for all three channels. The mean squared error (MSE) function is used as the loss function, and the overall loss used during training is defined as follows:

$$ L(\omega_{1},\omega_{21},\omega_{22},\omega_{3}) = \frac{1}{4}\big(L_{G}(\omega_{1}) + L_{RG}(\omega_{1},\omega_{21}) + L_{GB}(\omega_{1},\omega_{22}) + L_{RGB}(\omega_{1},\omega_{21},\omega_{22},\omega_{3})\big) \quad (2) $$

$$ L(\omega) = \frac{1}{N}\sum_{i=1}^{N}\|\mathcal{F}(I_{i};\omega) - O_{i}\|^{2} $$

where ω_j are the corresponding network parameters of the j-th stage, I_i and F(I_i; ω) are the i-th input and output of each stage, and O_i is the corresponding ground-truth.

3. EXPERIMENTS AND RESULTS

The WED database introduced in [17] is adopted in our experiments as training data. This dataset contains 4744 full-color high-quality natural images of various scenes.
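The initialization step (splitting the Bayer Input into InputR, InputG, InputB and filling the missing samples by bilinear interpolation) can be sketched as follows. This is our own NumPy illustration; it assumes an RGGB pattern phase and zero-padded borders, neither of which the paper specifies:

```python
import numpy as np

def bayer_split(rgb):
    # Sample an RGB image with an RGGB Bayer pattern; each returned plane
    # is zero where that channel was not sampled (InputR/InputG/InputB).
    H, W, _ = rgb.shape
    R, G, B = (np.zeros((H, W)) for _ in range(3))
    R[0::2, 0::2] = rgb[0::2, 0::2, 0]
    G[0::2, 1::2] = rgb[0::2, 1::2, 1]
    G[1::2, 0::2] = rgb[1::2, 0::2, 1]
    B[1::2, 1::2] = rgb[1::2, 1::2, 2]
    return R, G, B

def conv_same(x, k):
    # 'same'-size 2-D filtering with zero padding (the kernels are symmetric,
    # so convolution and correlation coincide)
    H, W = x.shape
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * xp[i:i + H, j:j + W]
    return out

# classic bilinear demosaicking kernels
K_G = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
K_RB = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

def bilinear_init(rgb):
    # bilinear initialization: interpolate each sparse plane independently
    R, G, B = bayer_split(rgb)
    return np.stack([conv_same(R, K_RB), conv_same(G, K_G),
                     conv_same(B, K_RB)], axis=-1)
```

On a constant image, every interior pixel is reconstructed exactly; only the zero-padded border deviates, which is one reason border pixels are commonly cropped in the evaluation.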

Fig. 1. Structure of the proposed 3-stage CNN scheme
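The overall objective in Eq. (2) is simply an equally weighted average of the four stage-wise MSE losses shown in Fig. 1. A minimal sketch (NumPy; function names are ours, not from the paper):

```python
import numpy as np

def stage_loss(pred, gt):
    # per-stage MSE loss ||F(I_i; w) - O_i||^2, averaged over a batch
    return float(np.mean((np.asarray(pred) - np.asarray(gt)) ** 2))

def overall_loss(L_G, L_RG, L_GB, L_RGB):
    # Eq. (2): equal 1/4 weighting of the four stage losses
    return 0.25 * (L_G + L_RG + L_GB + L_RGB)
```

Because L_RGB depends on all parameters (ω1, ω21, ω22, ω3), the earlier stages also receive gradients from the final output, so the three stages are optimized jointly rather than greedily.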

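For reference, the network-unit description (first layer: 128 filters of 3 × 3 × d; K − 2 hidden layers: 128 filters of 3 × 3 × 128; last layer: d filters of 3 × 3 × 128; K = 5) implies the following parameter count per unit. This is our own back-of-the-envelope sketch; it assumes one bias per filter, which the paper does not state:

```python
def unit_parameters(d, K=5, width=128, ksize=3):
    # first layer: `width` filters over d input channels (+1 bias per filter)
    first = width * (ksize * ksize * d + 1)
    # K-2 hidden layers: `width` filters over `width` channels
    hidden = (K - 2) * width * (ksize * ksize * width + 1)
    # last layer: d filters over `width` channels
    last = d * (ksize * ksize * width + 1)
    return first + hidden + last

# d = 1, 2, 3 for the three stages (the second stage uses two units with d = 2)
counts = {d: unit_parameters(d) for d in (1, 2, 3)}
```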
We randomly pick 4644 images for the training dataset, and the remaining 100 images are used as the test dataset. The patch size is set to 50 × 50, and the patches are non-overlapping. There are 361728 patch pairs generated from the training dataset. The mini-batch size is set to 64. The weights of the networks are initialized according to [18], and the Adam solver is used to optimize the parameters. The starting learning rate is 0.001, and it is divided by 5 every 20 epochs. There are 80 epochs in total. The other hyper-parameters use the default settings from [19]. All experiments are performed using Matlab (2017b) with the Matconvnet [20] toolbox.

Fig. 2. Structure of Network Unit

There are two commonly used datasets for CDM evaluation, the Kodak [21] and the McMaster [5] dataset. However, the Kodak images have relatively low resolution and high spectral correlation, which is not optimal for the evaluation of CDM algorithms for modern digital cameras [16]. The McMaster dataset also has limitations concerning scene variety, since it contains only 18 images. In [16], the authors proposed a new dataset, WED-CDM, including 100 images from the WED dataset composed of various scenes and color gradations, but currently this dataset is not publicly available. In order to evaluate the proposed scheme comprehensively, we also use the remaining 100 images from the WED dataset, named the WED-NEW dataset, to evaluate the performance of the different CDM algorithms.

First, an example is presented in Fig. 3 to show the visual quality of the proposed method in comparison to existing algorithms. Usually, texture-rich areas with sharp color transitions are the challenging cases for CDM. We zoom in on the window part of kodim19 to show the details. 2-Stage-R refers to our re-implementation of the 2-Stage algorithm¹. It can be seen that for the previously proposed interpolation approaches, unpleasant visual artifacts can be observed along the edges of the window. With the proposed method, these artifacts are well eliminated and the visual quality is improved.

The average Peak Signal-to-Noise Ratio (PSNR) and composite PSNR (CPSNR) are adopted to evaluate the objective quality of the different approaches. Ten pixels along the border are cropped because some algorithms suffer from border effects. The results are listed in Table 2. 2-Stage and 2-Stage-R refer to the original results reported in [16] and our re-implemented version, respectively. There are PSNR differences between the results reported in [16] and those obtained with our re-implementation, which is possibly caused by the random choice of training data from the WED dataset. From the results, it can be seen that the proposed method leads to a 0.4–0.8 dB PSNR improvement on the different datasets in comparison to the existing algorithms.

We also adopt the structural similarity (SSIM) [22] as an evaluation metric on the different datasets, as it takes structural information into account. Table 3 shows the average SSIM of the different approaches on the three datasets. The results are similar to those of the PSNR evaluation: the proposed method outperforms the state-of-the-art.

In order to demonstrate the effectiveness of the proposed 3-stage scheme, Table 4 shows the intermediate results of each stage for the WED-NEW dataset. It can be seen that the quality is enhanced stage by stage. Especially in the second stage, the quality of all three channels is enhanced, which proves that not only does the G channel guide the reconstruction of the R and B channels, but the reconstruction of G also benefits from the other two channels.

¹The source code and the trained model of [16] are not publicly available; the code and the trained model of our approach are available at https://amnesiack.github.io/ICIP2018CDM/
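The training schedule described above (start at 0.001, divide by 5 every 20 epochs, 80 epochs in total) is a plain step decay; a sketch of the schedule as we read it:

```python
def step_lr(epoch, base=1e-3, factor=5.0, step=20):
    # learning rate for a given 0-indexed epoch: base / factor**(epoch // step)
    return base / (factor ** (epoch // step))
```

That is, epochs 0-19 train at 1e-3, 20-39 at 2e-4, 40-59 at 4e-5, and 60-79 at 8e-6.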

(a) Ground Truth (b) AHD (35.12dB) (c) DLMMSE (38.52dB) (d) GBTF (39.62dB) (e) LDI-NAT (35.29dB) (f) RI (35.57dB)

(g) MLRI (36.80dB) (h) ARI (38.84dB) (i) RI new (36.33dB) (j) ARI new (38.84dB) (k) 2-Stage-R (40.62dB) (l) Proposed (41.93dB)

Fig. 3. Visual Quality Comparison on kodim19 of Kodak dataset (Best seen on a computer monitor).

Table 2. Average PSNR and CPSNR results (in dB) for three datasets, the best performance is marked in bold face
Dataset Kodak McMaster WED-NEW
Method R G B RGB R G B RGB R G B RGB
AHD [3] 37.00 39.64 37.31 37.77 33.00 36.98 32.16 33.49 34.20 37.78 34.56 35.12
DLMMSE [6] 39.18 42.63 39.58 40.11 34.03 37.99 33.04 34.47 35.56 39.57 35.91 36.55
GBTF [7] 39.68 43.34 40.01 40.62 33.98 37.34 33.07 34.38 35.84 39.73 36.12 36.81
LDI-NAT [5] 37.14 39.48 37.01 37.71 36.19 39.52 34.37 36.12 35.62 38.69 35.71 36.37
RI [8] 37.94 41.00 37.82 38.61 36.10 39.99 35.38 36.50 36.00 39.52 36.41 36.93
MLRI [11] 38.87 41.83 38.86 39.58 36.35 39.90 35.36 36.62 36.53 39.93 36.82 37.42
ARI [13] 39.10 42.31 38.90 39.79 37.41 40.72 36.05 37.52 36.71 40.17 36.91 37.59
RI new [9] 38.62 41.18 38.49 39.21 36.72 40.23 35.59 36.91 36.49 39.64 36.76 37.32
ARI new [14] 39.27 42.43 39.10 39.95 37.45 40.68 36.21 37.60 36.73 40.20 36.93 37.58
2-Stage [16] 41.38 44.85 41.04 42.04 39.14 42.10 37.31 38.98 - - - -
2-Stage-R 41.36 44.31 40.31 41.64 38.85 42.04 37.05 38.74 38.52 42.30 38.54 39.39
Ours 42.07 45.18 41.09 42.39 39.60 42.60 37.68 39.39 39.32 43.04 39.37 40.19
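The scores in Table 2 are PSNR per channel and composite PSNR (CPSNR) over all three channels, each with a 10-pixel border crop. A sketch of the metric as we understand it (8-bit peak of 255 assumed):

```python
import numpy as np

def psnr(ref, test, border=10, peak=255.0):
    # PSNR in dB after cropping `border` pixels on each side; pass a single
    # channel for per-channel PSNR, or the full H x W x 3 array for CPSNR
    # (MSE pooled over all three channels).
    ref = np.asarray(ref, dtype=np.float64)[border:-border, border:-border]
    test = np.asarray(test, dtype=np.float64)[border:-border, border:-border]
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging this per channel and on the full RGB arrays over each dataset reproduces the column layout of Table 2.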

Table 3. SSIM results for three datasets
Dataset        Kodak   McMaster WED-NEW
AHD [3]        0.9798  0.9573   0.9705
DLMMSE [6]     0.9866  0.9645   0.9777
GBTF [7]       0.9873  0.9637   0.9785
LDI-NAT [5]    0.9727  0.9690   0.9707
RI [8]         0.9826  0.9735   0.9776
MLRI [11]      0.9846  0.9729   0.9793
ARI [13]       0.9833  0.9760   0.9788
RI new [9]     0.9835  0.9744   0.9789
ARI new [14]   0.9840  0.9771   0.9793
2-Stage-R      0.9876  0.9793   0.9832
Ours           0.9941  0.9802   0.9851

Table 4. Intermediate PSNR and CPSNR (in dB) results for the proposed 3-stage approach on the WED-NEW dataset
PSNR             R      G      B      RGB
Input (Bilinear) 28.82  33.16  29.07  29.90
Stage1           -      36.52  -      -
Stage1+2         34.61  38.61  37.09  36.35
Stage1+2+3       39.32  43.04  39.37  40.19

4. CONCLUSION

This paper presents a 3-stage CNN-based color demosaicking scheme. The first stage is used to reconstruct the G channel. Then, in the second stage, with the guidance of the reconstructed G channel, R and B can be reconstructed and enhanced separately. In the third stage, all intermediate R, G, B results are concatenated as the input, and a high-quality color demosaicked image can be obtained. The experimental results on different datasets show that the proposed scheme leads to better performance than the state-of-the-art CDM algorithms. Also, the intermediate results of each stage show that the quality of the images is enhanced stage by stage, which proves the rationality and effectiveness of the proposed network.

5. REFERENCES

[1] B. E. Bayer, "Color imaging array," July 20, 1976, US Patent 3,971,065.

[2] D. Menon and G. Calvagno, "Color image demosaicking: An overview," Signal Processing: Image Communication, vol. 26, no. 8, pp. 518–533, Oct. 2011.

[3] K. Hirakawa and T. W. Parks, "Adaptive homogeneity-directed demosaicing algorithm," IEEE Transactions on Image Processing, vol. 14, no. 3, pp. 360–369, Mar. 2005.

[4] X. Wu, D. Gao, G. Shi, and D. Liu, "Color demosaicking with sparse representations," in 2010 IEEE International Conference on Image Processing (ICIP), Sept. 2010, pp. 1645–1648.

[5] L. Zhang, X. Wu, A. Buades, and X. Li, "Color demosaicking by local directional interpolation and nonlocal adaptive thresholding," Journal of Electronic Imaging, vol. 20, no. 2, pp. 023016, Apr. 2011.

[6] L. Zhang and X. Wu, "Color demosaicking via directional linear minimum mean square-error estimation," IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2167–2178, Dec. 2005.

[7] I. Pekkucuksen and Y. Altunbasak, "Gradient based threshold free color filter array interpolation," in 2010 IEEE International Conference on Image Processing (ICIP), Sept. 2010, pp. 137–140.

[8] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, "Residual interpolation for color image demosaicking," in 2013 IEEE International Conference on Image Processing (ICIP), Sept. 2013, pp. 2304–2308.

[9] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, "Beyond color difference: Residual interpolation for color image demosaicking," IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1288–1300, Mar. 2016.

[10] K. He, J. Sun, and X. Tang, "Guided image filtering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1397–1409, June 2013.

[11] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, "Minimized-Laplacian residual interpolation for color image demosaicking," in Proceedings of SPIE, Mar. 2014, vol. 9023, p. 90230L.

[12] W. Ye and K. K. Ma, "Color image demosaicing using iterative residual interpolation," IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5879–5891, Dec. 2015.

[13] Y. Monno, D. Kiku, M. Tanaka, and M. Okutomi, "Adaptive residual interpolation for color image demosaicking," in 2015 IEEE International Conference on Image Processing (ICIP), Sept. 2015, pp. 3861–3865.

[14] Y. Monno, D. Kiku, M. Tanaka, and M. Okutomi, "Adaptive residual interpolation for color and multispectral image demosaicking," Sensors, vol. 17, no. 12, pp. 2787, Dec. 2017.

[15] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, "Deep joint demosaicking and denoising," ACM Transactions on Graphics, vol. 35, no. 6, pp. 191:1–191:12, Nov. 2016.

[16] R. Tan, K. Zhang, W. Zuo, and L. Zhang, "Color image demosaicking via deep residual learning," in 2017 IEEE International Conference on Multimedia and Expo (ICME), July 2017, pp. 793–798.

[17] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, "Waterloo Exploration Database: New challenges for image quality assessment models," IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 1004–1016, Feb. 2017.

[18] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 1646–1654.

[19] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations (ICLR), May 2015.

[20] A. Vedaldi and K. Lenc, "Matconvnet: Convolutional neural networks for Matlab," in Proceedings of the 23rd ACM International Conference on Multimedia, ACM, 2015, pp. 689–692.

[21] "Kodak lossless true color image suite," http://r0k.us/graphics/kodak/.

[22] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.

