School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
qikang1994@gmail.com, fuying@bit.edu.cn, huahuang@bit.edu.cn
Inspired by [24], we introduce a lightweight channel attention module for color image demosaicking, which can obtain discriminative features among channels for different scenes. Given an intermediate feature map T ∈ R^(h×w×c) as input, the channel attention module infers a one-dimensional vector α = [α_1, α_2, ..., α_c], which is used to rescale the input feature T.

As illustrated in Fig. 1, a global average pooling operation over the spatial dimensions h × w is first adopted to obtain global spatial context information. It is followed by a dimension-reduction layer with reduction ratio r and a rectified linear unit (ReLU) activation function [25]. Then, the output is fed into a dimension-increasing layer and a sigmoid activation function to generate the factor α, which is used to rescale the input feature map T.

Fig. 2: Architecture of the proposed network (shallow feature extraction block, deep down-sampling blocks, up-scaling modules, feature fusion modules, and reconstruction block), mapping the mosaicked input I through the features F_k, P_k, and Q_k to the demosaicked output R.

[...] down-sampling block, and F_k ∈ R^(H/2^k × W/2^k × 2^k·C). As shown in Fig. 3(a), the deep down-sampling block consists of a down-sampling convolutional layer with a ReLU activation function, "Conv + ReLU" repeated D times, and a CAM. For the down-sampling convolutional layer, we use a 2 × 2 convolution with stride 2 to down-sample the feature maps and double the number of feature channels.

As a result, the feature map F_3 is used as the input of the next part, which is composed of multiple up-sampling and feature fusion blocks. This process can be represented by

    P_{k−1} = H_{UP,k}(F_k) for k = 3, and P_{k−1} = H_{UP,k}(Q_k) for k = 2, 1;
    Q_{k−1} = H_{FF,k}(Concat(F_{k−1}, P_{k−1})),                                   (4)
where {P_{k−1}, Q_{k−1}} ∈ R^(H/2^(k−1) × W/2^(k−1) × 2^(k−1)·C), Concat(·, ·) refers to the concatenation operation, and H_{UP,k}(·) and H_{FF,k}(·) denote the functions of the k-th up-scaling module and feature fusion module, respectively. For the feature fusion module, as shown in Fig. 3(b), we first use a 1 × 1 convolution layer to fuse the shallow-level feature with the deep-level feature from the up-scaling module and to halve the number of feature channels. Then, "Conv + ReLU" is applied D − 1 times for further deep feature extraction.

Finally, Q_0 is the input of the reconstruction block, and the demosaicked image R can be obtained by

    R = H_{RB}(Q_0),                                                                (5)

where R ∈ R^(2H×2W×3) and H_{RB}(·) denotes the reconstruction block. It consists of a 1 × 1 convolution layer with a ReLU activation function and a sub-pixel convolution layer [26] without an activation function.

Fig. 3: Architecture of the deep down-sampling block and the feature fusion module. (a) Deep down-sampling block: a 2×2 convolution with stride 2 ("/2") and ReLU, "3×3 Conv + ReLU" repeated D times, and a CAM. (b) Feature fusion module: a 1×1 convolution and ReLU, followed by "3×3 Conv + ReLU" repeated D−1 times. "/2" denotes a convolutional layer with a stride of 2 to down-sample the feature maps; "CAM" denotes the channel attention module.

2.3. Loss Function

Let Θ be the set of network parameters. Given the training dataset {(I_i, Y_i)}_{i=1}^{N}, where I_i is the i-th mosaicked image, Y_i is the corresponding ground-truth image, and N is the number of images in the training data, our goal is to learn a mapping function f by optimizing the parameters of the network. To make the demosaicked image R_i = f(I_i; Θ) close to the ground truth Y_i, the mean absolute error (MAE) is used as the loss function, which can be formulated as

    L(Θ) = (1/N) Σ_{i=1}^{N} ‖Y_i − f(I_i; Θ)‖_1.                                   (6)

3.1. Datasets and implementation details

Training dataset. We use the WED dataset [27] to form the training set, as in [18]. It contains 4744 high-quality natural images, of which 4644 are used for training. We further divide them into 128 × 128 patches with an overlap of 32 pixels as the ground truth. To generate the mosaicked image patches, we down-sample the ground truth according to the Bayer pattern. Finally, approximately 300,000 patch pairs are generated for training. Moreover, to further augment the data, we apply three operations: randomly rotating the images by 90°, 180°, or 270°, randomly flipping images horizontally, and randomly flipping images vertically.

Testing datasets. The proposed method is evaluated on two widely used benchmark datasets, i.e., the Kodak and McMaster datasets. The Kodak dataset consists of 24 images (768 × 512), and the McMaster dataset contains 18 images (500 × 500) cropped from 8 high-resolution natural images. We also evaluate our method on the WED-CDM dataset [18], which contains 100 high-quality full-color images from the WED dataset.

Implementation details. We set the parameters C and D in Section 2.2 to 64 and 4, respectively. Meanwhile, the reduction
ratio r is set to 16 in all channel attention modules. The weights are initialized with the method proposed in [28], and the loss is minimized using Adam [29]. The mini-batch size is set to 64. The learning rate decays from 10^−4 to 10^−6, and the weight decay is 10^−8. The proposed work is implemented in PyTorch [30], and all experiments are executed on a workstation with an Intel(R) Core i7-6700 CPU at 3.5 GHz and an Nvidia GeForce GTX 1080Ti GPU.

Fig. 4: Comparison of demosaicking results on Kodak image 19: (a) Ground truth, (b) DLMMSE [8], (c) GBTF [4], (d) LDI-NAT [11], (e) RI [13], (f) MLRI [14], (g) ARI [15], (h) RI new [16], (i) ARI new [17], (j) DJDD [21], (k) CDM-CNN [18], (l) 3-stage CNN [19], (m) Ours (w/o CAM), (n) Ours. PSNR: (b) 37.16 dB, (c) 38.17 dB, (d) 28.42 dB, (e) 38.33 dB, (f) 33.27 dB, (g) 35.72 dB, (h) 34.75 dB, (i) 32.60 dB, (j) 41.08 dB, (k) 40.98 dB, (l) 38.69 dB, (m) 43.28 dB, (n) 44.07 dB.

3.2. Experimental Results

Compared Methods. We compare our method with eleven state-of-the-art demosaicking methods, including eight traditional methods (i.e., DLMMSE [8], GBTF [4], LDI-NAT [11], RI [13], MLRI [14], ARI [15], RI new [16], and ARI new [17]) and three CNN-based methods (i.e., DJDD [21], CDM-CNN [18], and 3-stage CNN [19]). It is worth mentioning that we retrain the CNN-based methods on the same dataset. Besides, we provide experimental results for our method both with and without CAM to show the effect of CAM.

Quantitative Results. To quantitatively evaluate the objective performance of color image demosaicking methods, we adopt the PSNR and the composite PSNR (CPSNR) to measure the quality of each color channel and of all three channels, respectively. Besides, we use the structural similarity (SSIM) [31] as an evaluation metric to take structural similarity and detail information into account.

The PSNR/CPSNR results of the different methods on the Kodak, McMaster and WED-CDM datasets are shown in Table 1. It can be seen that the proposed method achieves the best PSNR/CPSNR results among the compared methods on all three datasets. Specifically, our method outperforms the second best method by 0.72 dB, 0.11 dB and 0.34 dB on the Kodak, McMaster and WED-CDM datasets, respectively. Table 3 shows the average SSIM of the different approaches on the three datasets. We can see that the proposed method, as in the PSNR evaluation, provides the best results on all datasets. In addition, the results of our method without CAM are also shown in Tables 1 and 3. Our method without CAM also outperforms the state-of-the-art methods in most cases, and CAM effectively improves the performance of our method.

Qualitative Results. Aside from using PSNR and SSIM for evaluation, our method also shows an advantage in terms of visual quality. It is well known that strong edges and textural areas are always challenging for color image demosaicking and easily lead to conspicuous artifacts such as zippering and color moiré. As shown in Fig. 4, we take the fence in the 19th image of the Kodak dataset, which has strong vertical texture, as an example to compare the visual quality of our method and the compared methods. It can be observed that almost all compared results show chromatic aliasing and zippering, while our proposed method reconstructs the image more faithfully. These results are consistent with the quantitative evaluation.

Running Time. To evaluate the time complexity of these methods, we use the source codes of all compared methods [...]
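Table 1 below reports the per-channel PSNR and the composite PSNR (CPSNR) described under Quantitative Results. For reference, the following is a short sketch of how these metrics are conventionally computed for 8-bit images; it is the standard definition rather than code released with the paper.

```python
# Conventional PSNR/CPSNR computation for 8-bit images (standard definitions,
# included only to clarify the columns of Table 1).
import numpy as np

def psnr(rec: np.ndarray, gt: np.ndarray, peak: float = 255.0) -> float:
    """PSNR over all supplied values: a single channel gives per-channel PSNR,
    the full H x W x 3 arrays give the composite PSNR (CPSNR)."""
    mse = np.mean((rec.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example usage for one image pair:
#   r, g, b = (psnr(rec[..., c], gt[..., c]) for c in range(3))  # R, G, B columns
#   cpsnr = psnr(rec, gt)                                        # RGB column
```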
Table 1: Average PSNR and CPSNR results (in dB) on three datasets. The best results are highlighted in bold type.
Methods    Kodak (R / G / B / RGB)    McMaster (R / G / B / RGB)    WED-CDM (R / G / B / RGB)    Average (RGB)
DLMMSE [8] 39.18 42.63 39.58 40.11 34.03 37.99 33.04 34.47 35.52 39.74 35.18 36.27 36.69
GBTF [4] 39.68 43.34 40.01 40.62 33.98 37.34 33.07 34.38 35.64 39.45 35.30 36.34 36.81
LDI-NAT [11] 36.99 39.44 37.12 37.69 36.28 39.76 34.39 36.20 36.18 40.03 35.64 36.78 36.86
RI [13] 37.82 41.00 37.80 38.56 36.07 39.99 35.35 36.47 36.50 40.54 36.11 37.17 37.32
MLRI [14] 38.87 41.83 38.86 39.58 36.35 39.90 35.36 36.62 36.94 40.70 36.43 37.53 37.76
ARI [15] 39.10 42.31 38.90 39.79 37.41 40.72 36.05 37.52 37.18 40.90 36.79 37.83 38.12
RI new [16] 38.62 41.18 38.48 39.21 36.72 40.23 35.59 36.91 37.05 40.69 36.57 37.63 37.81
ARI new [17] 39.27 42.43 39.10 39.95 37.45 40.68 36.21 37.60 37.22 40.92 36.76 37.82 38.15
DJDD [21] 41.67 45.66 41.31 42.38 39.34 42.19 37.53 39.17 39.34 43.32 38.86 39.99 40.29
CDM-CNN [18] 41.38 44.85 41.04 42.04 39.14 42.10 37.31 38.98 39.01 43.04 38.54 39.67 39.98
3-stage CNN [19] 41.97 45.10 41.02 42.31 39.52 42.53 37.65 39.34 39.43 43.40 38.97 40.09 40.37
Ours(w/o CAM) 42.24 46.09 41.99 42.98 39.55 42.47 37.54 39.29 39.68 43.65 39.09 40.27 40.60
Ours 42.35 46.10 42.00 43.03 39.74 42.59 37.72 39.45 39.81 43.75 39.28 40.43 40.75
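Tables 1 and 3 include an ablation ("Ours" vs. "Ours (w/o CAM)") that isolates the channel attention module described in Section 2. For concreteness, a minimal PyTorch-style sketch of such a module is given below; the use of 1 × 1 convolutions for the dimension reduction and increasing layers is an assumption, since only the overall pipeline (global average pooling, reduction by ratio r, ReLU, dimension increase, sigmoid, channel-wise rescaling) is specified in the text.

```python
# A hedged sketch of the channel attention module (CAM): global average pooling,
# dimension reduction by ratio r, ReLU, dimension increase, sigmoid, and
# channel-wise rescaling of the input feature map T.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling over h x w
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # dimension reduction
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # dimension increase
            nn.Sigmoid(),                                               # factors alpha in [0, 1]
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        alpha = self.fc(self.pool(t))  # shape (n, c, 1, 1)
        return t * alpha               # rescale each channel of T by its factor
```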
Table 2: Average execution time (in seconds) per image on CPU or GPU.
Methods       DLMMSE  GBTF  LDI-NAT  RI    MLRI  ARI    RI new  ARI new  DJDD   CDM-CNN  3-stage CNN  Ours (w/o CAM)  Ours
CPU time (s)  3.37    4.90  358.40   0.62  0.86  20.94  1.32    24.61    3.27   3.67     13.86        4.02            4.16
GPU time (s)  -       -     -        -     -     -      -       -        0.003  0.005    0.016        0.004           0.004
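The per-image GPU times in Table 2 are averages; the exact measurement protocol is not described in the text, but a typical PyTorch timing sketch with explicit CUDA synchronization (needed because GPU kernels launch asynchronously) could look as follows. The warm-up count and the number of repeats are illustrative assumptions.

```python
# A hedged sketch of per-image GPU timing in PyTorch (illustrative protocol only).
import time
import torch

@torch.no_grad()
def average_gpu_time(model: torch.nn.Module, mosaic: torch.Tensor, repeats: int = 100) -> float:
    model = model.cuda().eval()
    mosaic = mosaic.cuda()
    for _ in range(10):              # warm-up to exclude one-time initialization costs
        model(mosaic)
    torch.cuda.synchronize()         # wait for all queued kernels before starting the clock
    start = time.time()
    for _ in range(repeats):
        model(mosaic)
    torch.cuda.synchronize()
    return (time.time() - start) / repeats  # average seconds per image
```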
Table 3: SSIM results on the three datasets. The best results are highlighted in bold type.

Methods            Kodak   McM     WED-CDM  Average
DLMMSE [8]         0.9866  0.9645  0.9777   0.9775
GBTF [4]           0.9873  0.9637  0.9779   0.9777
LDI-NAT [11]       0.9727  0.9690  0.9707   0.9708
RI [13]            0.9824  0.9734  0.9798   0.9794
MLRI [14]          0.9846  0.9729  0.9812   0.9807
ARI [15]           0.9833  0.9760  0.9816   0.9812
RI new [16]        0.9835  0.9744  0.9816   0.9810
ARI new [17]       0.9840  0.9771  0.9821   0.9818
DJDD [21]          0.9862  0.9793  0.9753   0.9776
CDM-CNN [18]       0.9876  0.9795  0.9735   0.9766
3-stage CNN [19]   0.9888  0.9800  0.9861   0.9858
Ours (w/o CAM)     0.9894  0.9808  0.9870   0.9866
Ours               0.9895  0.9814  0.9875   0.9870

4. CONCLUSION

[...] demosaicking. Specifically, the feature pyramid structure allows the network to reach a large depth and obtain a large receptive field, which takes full advantage of image self-similarity and redundancy at a low computational cost. Meanwhile, an attention mechanism is incorporated into our network; it can reflect the channel-wise quality of the features for each scene and concentrate on the more useful channels. Extensive benchmark evaluations on the Kodak, McMaster and WED-CDM datasets have demonstrated that our proposed network is highly effective compared with state-of-the-art demosaicking methods in terms of both objective and subjective assessments, while maintaining a competitive running time.

5. ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (61425013 and 61672096) and the Beijing Natural Science Foundation (L172012).
[5] Julien Mairal, Michael Elad, and Guillermo Sapiro, "Sparse representation for color image restoration," IEEE Transactions on Image Processing, vol. 17, no. 1, pp. 53–69, Jan 2008.
[6] Xiaolin Wu, Dahua Gao, Guangming Shi, and Danhua Liu, "Color demosaicking with sparse representations," in IEEE International Conference on Image Processing (ICIP), Sept 2010, pp. 1645–1648.
[7] Guoshen Yu, Guillermo Sapiro, and Stéphane Mallat, "Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity," IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2481–2499, May 2012.
[8] Lei Zhang and Xiaolin Wu, "Color demosaicking via directional linear minimum mean square-error estimation," IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2167–2178, Dec 2005.
[9] Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman, "Non-local sparse models for image restoration," in IEEE International Conference on Computer Vision (ICCV), Sept 2009, pp. 2272–2279.
[10] Antoni Buades, Bartomeu Coll, Jean-Michel Morel, and Catalina Sbert, "Self-similarity driven color demosaicking," IEEE Transactions on Image Processing, vol. 18, no. 6, pp. 1192–1202, June 2009.
[11] Lei Zhang, Xiaolin Wu, Antoni Buades, and Xin Li, "Color demosaicking by local directional interpolation and nonlocal adaptive thresholding," Journal of Electronic Imaging, vol. 20, no. 2, p. 023016, 2011.
[12] Joan Duran and Antoni Buades, "Self-similarity and spectral correlation adaptive algorithm for color demosaicking," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4031–4040, Sept 2014.
[13] Daisuke Kiku, Yusuke Monno, Masayuki Tanaka, and Masatoshi Okutomi, "Residual interpolation for color image demosaicking," in IEEE International Conference on Image Processing (ICIP), Sept 2013, pp. 2304–2308.
[14] Daisuke Kiku, Yusuke Monno, Masayuki Tanaka, and Masatoshi Okutomi, "Minimized-laplacian residual interpolation for color image demosaicking," in Digital Photography X, 2014, vol. 9023, p. 90230L.
[15] Yusuke Monno, Daisuke Kiku, Masayuki Tanaka, and Masatoshi Okutomi, "Adaptive residual interpolation for color image demosaicking," in IEEE International Conference on Image Processing (ICIP), Sept 2015, pp. 3861–3865.
[16] Daisuke Kiku, Yusuke Monno, Masayuki Tanaka, and Masatoshi Okutomi, "Beyond color difference: Residual interpolation for color image demosaicking," IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1288–1300, March 2016.
[17] Yusuke Monno, Daisuke Kiku, Masayuki Tanaka, and Masatoshi Okutomi, "Adaptive residual interpolation for color and multispectral image demosaicking," Sensors, vol. 17, no. 12, p. 2787, 2017.
[18] Runjie Tan, Kai Zhang, Wangmeng Zuo, and Lei Zhang, "Color image demosaicking via deep residual learning," in IEEE International Conference on Multimedia and Expo (ICME), 2017, pp. 793–798.
[19] Kai Cui, Zhi Jin, and Eckehard Steinbach, "Color image demosaicking using a 3-stage convolutional neural network structure," in IEEE International Conference on Image Processing (ICIP), Oct 2018, pp. 2177–2181.
[20] Daniel Stanley Tan, Wei-Yang Chen, and Kai-Lung Hua, "Deepdemosaicking: Adaptive image demosaicking via multiple deep fully convolutional networks," IEEE Transactions on Image Processing, vol. 27, no. 5, pp. 2408–2419, May 2018.
[21] Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand, "Deep joint demosaicking and denoising," ACM Transactions on Graphics, vol. 35, no. 6, p. 191, 2016.
[22] Filippos Kokkinos and Stamatios Lefkimmiatis, "Deep image demosaicking using a cascade of convolutional residual denoising networks," in European Conference on Computer Vision (ECCV), September 2018.
[23] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
[24] Jie Hu, Li Shen, and Gang Sun, "Squeeze-and-excitation networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[25] Vinod Nair and Geoffrey E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in International Conference on Machine Learning (ICML), 2010, pp. 807–814.
[26] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1874–1883.
[27] Kede Ma, Zhengfang Duanmu, Qingbo Wu, Zhou Wang, Hongwei Yong, Hongliang Li, and Lei Zhang, "Waterloo exploration database: New challenges for image quality assessment models," IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 1004–1016, Feb 2017.
[28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
[29] Diederik P. Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations (ICLR), 2015.
[30] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer, "Automatic differentiation in PyTorch," in Advances in Neural Information Processing Systems (NIPS) Workshop, 2017.
[31] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.