Deep Demosaicking Considering Inter-Channel Correlation and Self-Similarity
Paper
irymt3sc@engs.tamagawa.ac.jp
Received January 4, 2021; Revised March 19, 2021; Published July 1, 2021
Abstract: The effectiveness of utilizing inter-channel correlation and self-similarity for
demosaicking has been widely reported in the literature. Meanwhile, many convolutional neural
network (CNN)-based demosaicking techniques have been proposed and achieve state-of-the-art
accuracy. In CNN-based demosaicking, one of the most important issues is how to account for
these correlations within the neural network. In this paper, we propose a novel CNN-based
demosaicking method that effectively combines inter-channel correlation and self-similarity.
Specifically, we apply the CNN to predict the color differences R-G and B-G; the demosaicked
image is then obtained from the predicted color differences and the input color filter array
(CFA) image. At the same time, our network considers the self-similarity in the color difference
domain by applying non-local attention to high-level feature maps. Experimental results show
that our method provides better accuracy and visual quality than conventional demosaicking
methods. In addition, the versatility of the proposed framework is demonstrated by experiments
with images sampled by various CFA patterns.
Key Words: demosaicking, convolutional neural network, color difference, self-similarity,
non-local attention
1. Introduction
A color image is acquired by converting light into electrical signals through an image sensor equipped
with a color filter array (CFA). The CFA samples only one of the red (R), green (G), and blue (B)
color components at each pixel according to a periodic pattern. The most popular CFA pattern
in consumer digital cameras is the Bayer CFA, a periodic arrangement of one red, two green, and
one blue pixel [1]. The missing information in the CFA image is estimated by an interpolation
process called demosaicking. The simplest demosaicking applies a spatially invariant standard
interpolation, such as bilinear or bicubic interpolation, to each color channel separately. However,
standard interpolations lose high-frequency information and introduce accompanying artifacts such
as false colors. To improve the image quality, methods based on the inter-channel correlation,
evaluated using the color ratio [2, 3] or the color difference [4, 5], have been proposed. Cok [2]
introduced the color ratios R/G and B/G calculated from the G channel estimated by bilinear
interpolation. The color ratio domains are interpolated with less error and are used to reconstruct
the R and B channels. This relies on the assumption that, within an object in a color image, the hue
varies much more smoothly than the individual color components. Hamilton et al. [5] interpolated the
G channel by considering the second-order derivatives of the subsampled R and B channels
instead of using bilinear interpolation. The authors also proposed a method that interpolates the
vertical and horizontal color differences and then selects between them adaptively according to the
edge direction. However, these approaches cause artifacts such as zipper effects, false colors, and
aliasing when the neighborhood contains no information for estimating the target pixel. To reduce
these artifacts, self-similarity-based approaches have been proposed. Buades et al. [6] proposed a
demosaicking method that introduces non-local means filtering to exploit self-similarity, that is,
the tendency of similar local patterns to recur in natural images. Duran et al. [7] proposed a
demosaicking method that considers both the self-similarity and the spectral correlation by applying
non-local means filtering in the color difference domain to refine the color differences.
Nonlinear Theory and Its Applications, IEICE, vol. 12, no. 3, pp. 453–463, © IEICE 2021, DOI: 10.1587/nolta.12.453
In recent years, convolutional neural network (CNN)-based demosaicking methods have been widely
studied and have shown state-of-the-art performance. Tan et al. [8] proposed a two-stage demosaicking
network that predicts the residuals between the initial demosaicked image and the training image.
Cui et al. [9] extended the two-stage demosaicking network to a three-stage demosaicking network by
adding networks to explore the correlation between R and G (and G and B). Yan et al. [10] proposed
a three-stream demosaicking network that models R, G, and B separately in parallel and estimates the
color differences G-R and G-B instead of the R and B channels. Zhang et al. [11] proposed a residual
non-local attention network (RNAN) to capture non-local features by considering the long-range
dependencies between pixels. The RNAN improves the representation ability of the network
by incorporating non-local mask branches that rescale features by weights computed with a pair-wise
function between every pair of locations in the feature maps. Mei et al. [12] proposed a pyramid
attention that captures the long-range dependencies exhaustively by extending a single-scale feature
map to a multi-scale feature pyramid and computing feature correlations with a pair-wise function
over the entire pyramid. However, each of these neural networks focuses on only one of inter-channel
correlation and self-similarity, whereas both need to be considered to fully exploit these correlations.
In this paper, we propose a novel CNN-based demosaicking method that effectively combines the
inter-channel correlation and the self-similarity. Specifically, the proposed
method predicts the color differences R-G and B-G, and generates a demosaicked image from the
predicted color differences and the input CFA image. Our network considers the self-similarity in the
color difference domain by applying non-local attention to high-level feature maps. The proposed method
outperforms the conventional CNN-based demosaicking approaches on several benchmark datasets in terms
of assessment indices such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In
perceptual evaluations, we confirmed that the proposed method reduces artifacts such as the false
colors from which the conventional demosaicking approaches suffer. The contributions of this paper are
summarized as follows: (1) A novel deep demosaicking method considering both the inter-channel
correlation and the self-similarity is proposed. (2) We design a novel framework for CNN-based
demosaicking via the color differences. (3) Experimental results on several benchmark datasets
demonstrate that our proposed method provides better results than state-of-the-art CNN-based
demosaicking algorithms.
$$\hat{G} = \mathrm{LI}(I_{CFA}), \tag{1}$$
where $\hat{G}$ indicates the G channel interpolated from the input CFA image $I_{CFA}$ by any linear
interpolation technique (LI). Then, the color differences are calculated as
$$D_R = R - \hat{G}, \quad D_B = B - \hat{G}, \tag{2}$$
where $D_R$ and $D_B$ indicate the color differences between the R and G channels and between the B
and G channels, respectively. Since high-frequency components are greatly reduced in the color
difference domain, the missing $D_R$ and $D_B$ are expected to be interpolated with less error.
Finally, the R and B channels are reconstructed using $D_R$ and $D_B$:
$$\hat{R} = \hat{G} + D_R, \quad \hat{B} = \hat{G} + D_B, \tag{3}$$
where $\hat{R}$ and $\hat{B}$ indicate the interpolated R and B channels, respectively.
This paper proposes a demosaicking framework that exploits the inter-channel correlation through
the color differences. The proposed framework reduces the complexity of the estimation by replacing
the estimation of the RGB channels with the estimation of the color differences.
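The color-difference interpolation of Eqs. (1) and (3) can be illustrated with a minimal NumPy sketch. It is not the proposed network: it assumes an RGGB Bayer layout, uses bilinear interpolation as LI, and the helper names (`bayer_masks`, `interp_sparse`, `color_difference_demosaick`) are hypothetical.

```python
import numpy as np

def bayer_masks(h, w):
    """Boolean masks of the R, G, B sample sites (assumed RGGB layout)."""
    r = np.zeros((h, w), dtype=bool)
    b = np.zeros((h, w), dtype=bool)
    r[0::2, 0::2] = True
    b[1::2, 1::2] = True
    g = ~(r | b)
    return r, g, b

def weighted_sum3(a):
    """3x3 neighborhood sum with bilinear weights and reflect padding."""
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
    p = np.pad(a, 1, mode="reflect")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def interp_sparse(values, mask):
    """Bilinear interpolation of samples that are known only where mask is True."""
    m = mask.astype(float)
    filled = weighted_sum3(values * m) / weighted_sum3(m)
    return np.where(mask, values, filled)  # keep the measured samples untouched

def color_difference_demosaick(cfa):
    """Interpolate G, then interpolate R-G and B-G, then add G back."""
    r_m, g_m, b_m = bayer_masks(*cfa.shape)
    g_hat = interp_sparse(cfa, g_m)            # Eq. (1)
    d_r = interp_sparse(cfa - g_hat, r_m)      # color difference at R sites, interpolated
    d_b = interp_sparse(cfa - g_hat, b_m)      # color difference at B sites, interpolated
    return np.stack([g_hat + d_r, g_hat, g_hat + d_b], axis=-1)  # Eq. (3)
```

Interpolating the differences rather than R and B directly is the point of the sketch: wherever R-G and B-G are locally smooth, the interpolation error of the differences is small even if each channel has strong edges.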
The non-local attention is formulated as
$$y_i = \frac{1}{C(x)} \sum_{\forall j} \phi(x_i, x_j)\, \theta(x_j), \tag{4}$$
where $x$ and $y$ indicate the input feature map and the weighted feature map, respectively, and $i$
and $j$ index positions in the feature map. $\phi$ is a pair-wise function that computes the
correlation between features in the input feature map, and $\theta$ is a unary function for feature
embedding. The normalization factor $C(x)$ is obtained by summation of the pair-wise correlations:
$$C(x) = \sum_{\forall j} \phi(x_i, x_j). \tag{5}$$
In neural network architectures, non-local attention is wrapped into a non-local block:
$$z_i = W_z y_i + x_i, \tag{6}$$
where $W_z$ indicates the weight parameters that determine the importance of the non-local attention.
The non-local block learns the $y_i$ given in Eq. (4) as a residual with respect to the input $x_i$.
In this paper, we apply the non-local attention to capture the long-range dependencies in the color
difference domain. Our network incorporates non-local blocks in the high-level feature extraction part
of the network.
Fig. 1. Our framework.
and high-level features are extracted by a very deep network composed of several residual groups. Each
residual group consists of several residual blocks with short skip connections and a non-local block.
In the non-local block, the long-range dependencies are efficiently considered by non-local weighting
according to the pair-wise relationship between spatial locations in the high-level feature maps.
The pair-wise function $\phi(\cdot)$ and the unary function $\theta(\cdot)$ in Eq. (4) are defined by
$$\phi(x_i, x_j) = \exp\left((W_u x_i)^{T} W_v x_j\right), \tag{7}$$
$$\theta(x_j) = W_\theta x_j, \tag{8}$$
where $W_u$, $W_v$ and $W_\theta$ indicate the weight parameters. The pair-wise function $\phi(\cdot)$
uses the embedded Gaussian for similarity evaluation. The linear embeddings $W_u x_i$, $W_v x_j$ and
$W_\theta x_j$ are implemented by 1 × 1 convolutions, as illustrated in Fig. 2. Thus,
$\frac{1}{C(x)} \phi(x_i, x_j)$ in Eq. (4), taken over all $j$, is equivalent to applying the softmax
function along the dimension $j$.
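The non-local block of Eqs. (4) and (6) with the embedded Gaussian can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature map is flattened to N positions with C channels, each 1 × 1 convolution is written as a per-position matrix product (row-vector convention, so `x @ w_u` plays the role of $W_u x_i$), and the function and argument names are hypothetical.

```python
import numpy as np

def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_u, w_v, w_theta, w_z):
    """x: (N, C) features; w_u, w_v, w_theta: (C, C2); w_z: (C2, C)."""
    u = x @ w_u                       # embedding of x_i
    v = x @ w_v                       # embedding of x_j
    t = x @ w_theta                   # unary embedding theta(x_j)
    attn = softmax(u @ v.T, axis=1)   # phi(x_i, x_j) / C(x), softmax over j
    y = attn @ t                      # weighted sum over all positions j, Eq. (4)
    return y @ w_z + x                # residual connection, Eq. (6)
```

Because the attention matrix has N × N entries, its cost grows quadratically with the number of positions, which is one reason to apply such a block to feature maps at reduced spatial resolution.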
As shown in Fig. 2, the color difference estimation network consists of four parts: shallow feature
extraction, deep feature extraction, upscaling, and reconstruction. First, shallow features $F_{SF}$ are
extracted from the input packed image $X$ by a 3 × 3 convolutional layer:
$$F_{SF} = H_{SF}(X), \tag{9}$$
where $H_{SF}(\cdot)$ denotes the convolution operation. The shallow features facilitate learning by
carrying the shallow information to the later part of the network through the long skip connection.
Then, deep features $F_{DF}$ are extracted by the deep feature extraction part, which is based on the
residual-in-residual (RIR) structure:
$$F_{DF} = H_{DF}(F_{SF}), \tag{10}$$
where $H_{DF}(\cdot)$ denotes the deep convolution operation. $F_{DF}$ is then upscaled to the same
spatial resolution as the input CFA image by an upscale module:
$$F_{UP} = H_{UP}(F_{DF}), \tag{11}$$
where $H_{UP}(\cdot)$ and $F_{UP}$ denote the upscale operation and the upscaled features, respectively.
The color differences $D_R$ and $D_B$ are then reconstructed from the upscaled features by a 3 × 3
convolutional layer:
$$(D_R, D_B) = H_{REC}(F_{UP}) = H_{CDEN}(X), \tag{12}$$
where $H_{REC}(\cdot)$ and $H_{CDEN}(\cdot)$ denote the reconstruction layer and the color difference
estimation network, respectively.
Then, our network is optimized with a loss function. Given a training set
$\{I_{CFA}^{i}, I_{GT}^{i}\}_{i=1}^{N}$ containing $N$ CFA inputs and their ground truths, each pair is
converted to the packed image $X$ and the color differences $Y$ of the ground truth, respectively. The
loss function is formulated as
$$L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| H_{CDEN}(X^{i}) - Y^{i} \right\|_{1}, \tag{13}$$
where $\Theta$ denotes the parameters of the network.
where $(i, j)$ is the pixel coordinate. R, G and B denote the color channels of the demosaicked image
$I_{DM}$.
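The L1 loss of Eq. (13) can be written in a few lines of NumPy. This is a sketch under assumptions: predictions and targets are arrays of shape (N, 2, H, W) holding the two color-difference channels, and `l1_loss` is a hypothetical name. Deep learning frameworks often also average over the elements of each sample, which changes the value only by a constant factor.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean over the N samples of the L1 norm of the residual, as in Eq. (13)."""
    n = pred.shape[0]
    per_sample = np.abs(pred - target).reshape(n, -1).sum(axis=1)
    return per_sample.mean()
```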
3. Experiments
For training, we use the Waterloo Exploration Database (WED) [16], which consists of 4744 color
images. The WED dataset is split into 95% for training and the remainder for validation. The training
images are randomly cropped into 96 × 96 patches and downsampled into CFA images. The mini-batch size
is set to 16. Our model consists of two residual groups with 20 residual blocks each and is optimized
by the Adam optimizer [17] with β1 = 0.9, β2 = 0.999 and ε = 10^-8. The learning rate is initially
10^-4 and is halved every 200 epochs. For testing, we use three benchmark datasets for demosaicking:
Kodak [18], consisting of 24 color images (768 × 512); McMaster [19], consisting of 18 color images
(500 × 500); and Urban100 [20], consisting of 100 images.
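The data preparation and learning-rate schedule described above can be sketched as follows. The sketch assumes an RGGB Bayer layout for the CFA sampling and uses hypothetical helper names; the optimizer itself is not shown.

```python
import numpy as np

def learning_rate(epoch, base=1e-4, step=200):
    """Initial rate 1e-4, halved every 200 epochs."""
    return base * 0.5 ** (epoch // step)

def random_crop(img, size, rng):
    """Cut a random size x size patch out of an (H, W, 3) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def bayer_mosaic(rgb):
    """Sample an RGB patch into a single-channel CFA image (assumed RGGB)."""
    cfa = rgb[..., 1].copy()                # start from G at every site
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 0]    # R sites
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 2]    # B sites
    return cfa
```

For example, `bayer_mosaic(random_crop(img, 96, np.random.default_rng(0)))` yields one 96 × 96 CFA training input, with the cropped RGB patch serving as its ground truth.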
Table I. Quantitative comparison by CPSNR (dB) on Kodak dataset.
No. GBFT RI MLRI JDD 2-stage 3-stage RNAN PANet Ours
1 39.83 35.56 36.80 41.62 41.35 41.69 43.28 43.74 43.70
2 41.18 39.46 40.78 40.82 41.84 41.85 42.44 42.57 43.07
3 42.89 41.04 42.99 44.30 45.04 45.22 45.82 45.97 46.27
4 40.26 40.18 40.96 42.72 42.86 43.20 43.99 44.28 44.53
5 37.92 36.67 37.67 40.29 40.61 40.95 41.88 42.43 42.13
6 41.01 38.45 39.14 42.30 42.16 42.14 43.35 43.78 43.83
7 42.30 41.99 42.85 44.00 44.88 44.80 45.52 45.91 45.64
8 37.19 33.99 34.90 38.68 38.63 39.16 40.32 40.43 40.73
9 43.41 41.16 42.38 43.79 44.09 44.57 45.12 45.02 45.43
10 42.65 41.53 42.26 43.53 43.90 44.21 44.87 44.95 45.05
11 40.88 38.18 39.32 41.89 42.03 42.44 43.33 43.66 43.60
12 43.84 42.37 43.27 44.50 45.07 45.02 45.95 46.25 46.29
13 35.79 31.98 33.15 37.62 37.24 37.53 38.61 39.04 39.19
14 36.73 36.30 37.63 40.11 40.05 40.34 41.12 41.65 41.43
15 39.28 38.80 39.34 41.23 41.54 41.59 42.34 42.95 43.07
16 44.54 42.27 42.89 45.02 45.23 45.64 46.41 46.68 46.50
17 42.00 40.03 41.08 42.90 42.86 43.01 43.89 43.97 44.10
18 37.54 35.31 36.52 38.11 38.73 38.89 39.89 39.89 40.30
19 41.81 39.10 39.92 42.30 42.42 42.78 43.73 43.86 43.92
20 41.58 39.91 40.68 42.35 42.91 42.91 43.51 43.69 43.52
21 39.93 37.24 38.19 41.57 41.36 41.61 42.40 42.60 42.68
22 38.62 37.55 38.62 40.29 40.26 40.51 41.17 41.55 41.46
23 43.17 42.22 43.82 43.09 44.97 45.02 44.97 45.16 45.24
24 35.43 34.13 34.68 37.79 36.96 37.15 38.51 38.86 38.99
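For reference, CPSNR is commonly computed as the PSNR of the mean squared error taken jointly over all pixels and all three color channels. The sketch below follows that common definition for 8-bit images; whether the values in Table I exclude image borders or use any other convention is not stated here, so the function is an assumption rather than the evaluation code behind the table.

```python
import numpy as np

def cpsnr(reference, estimate, peak=255.0):
    """Color PSNR in dB: MSE averaged over all pixels and all color channels."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```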
Table III. Quantitative comparison by average of assessment indices on benchmark datasets.
Dataset Index GBFT RI MLRI JDD 2-stage 3-stage RNAN PANet Ours
Kodak R 39.44 37.82 38.87 40.95 41.31 42.85 42.65 42.53 43.09
Kodak G 43.15 41.00 41.83 45.00 44.76 45.15 46.35 46.66 46.71
As a subjective evaluation, Figs. 3, 4, and 5 show the demosaicking results of the comparison
methods and the proposed method for each benchmark dataset. For “img19” of the Kodak dataset
in Fig. 3, the false colors generated by the conventional methods are reduced and
the falsely interpolated structures are corrected. For “img1” of the McMaster dataset in Fig. 4, our
method suppresses the zipper effects at the object boundaries. Fig. 5 shows the demosaicking results
for the Urban100 dataset, which contains numerous textured regions that tend to cause artifacts. For
“img26”, demosaicking artifacts such as false colors, which could not be handled by the conventional
methods, are significantly reduced. For “img100”, the high-frequency edges
are restored sharply by our method.
Fig. 4. Visual comparison for demosaicking results on McMaster dataset.
Table IV. Model size comparison.
Methods JDD 2-stage 3-stage RNAN PANet Ours-S1 Ours-S2 Ours
Parameters 559K 229K 2,949K 8,956K 5,953K 3,428K 5,789K 12,873K
CPSNR 38.92 38.91 39.22 40.30 40.91 40.56 41.00 41.47
lightweight models Ours-S1 and Ours-S2, in which the number of residual blocks in each residual group
is reduced to 4 and 8, respectively. Table IV compares the number of parameters and the average CPSNR
over all testing datasets. Our method with 20 residual blocks in each residual group shows the best
performance but also has the largest number of parameters. Among the models with comparable numbers
of parameters, Ours-S1 outperforms the 3-stage network (40.56 dB vs. 39.22 dB), and Ours-S2 also
performs better than PANet and RNAN (41.00 dB vs. 40.91 dB and 40.30 dB).
Table V. Quantitative comparison for other CFA patterns.
Dataset Kodak McMaster
RGB CFAs CPSNR SSIM CPSNR SSIM
Bayer 43.43 0.9911 40.18 0.9757
Modified Bayer 42.83 0.9902 39.57 0.9742
Lukac 43.36 0.9907 40.18 0.9759
Yamanaka 43.18 0.9903 39.73 0.9743
Fuji X-Trans 43.14 0.9955 39.47 0.9877
to their patterns. The Bayer CFA is packed into a four-channel image at quarter resolution.
Similarly, the Lukac, Yamanaka, and modified Bayer patterns are packed into
four channels at quarter resolution. Each channel is constructed so that it holds the same color
information without shifting its position in the image. In the same manner, the Fuji X-Trans pattern
is packed into nine channels with a spatial resolution of one-ninth. As shown in Table V, the
proposed network provides quality similar to that for the Bayer CFA in terms of CPSNR and SSIM
for all CFA patterns. These results suggest that the proposed method applies broadly to various
RGB CFA patterns.
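The packing step can be illustrated by a plain space-to-depth rearrangement. This is a sketch under assumptions: `pack_cfa` is a hypothetical helper, `block=2` gives four channels at quarter resolution and `block=3` gives nine channels at one-ninth resolution, and the channel ordering as well as the exact packing the authors use for each pattern (in particular Fuji X-Trans) are not specified in the text.

```python
import numpy as np

def pack_cfa(cfa, block):
    """Space-to-depth: (H, W) -> (H/block, W/block, block*block).

    Channel k = dy * block + dx holds the samples cfa[dy::block, dx::block].
    """
    h, w = cfa.shape
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    x = cfa.reshape(h // block, block, w // block, block)
    return x.transpose(0, 2, 1, 3).reshape(h // block, w // block, block * block)
```

For a Bayer image with an RGGB layout, `pack_cfa(cfa, 2)` places all R samples in channel 0, the two G phases in channels 1 and 2, and all B samples in channel 3, so every channel carries a single color.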
4. Conclusion
In this paper, we proposed a novel deep demosaicking method that effectively considers both the
inter-channel correlation and the self-similarity. Specifically, a CNN was applied to predict the
color differences R-G and B-G, and demosaicking was performed using these predictions. By rendering
the image from the color differences, demosaicking exploits the correlation between the RGB channels.
Furthermore, our network has built-in non-local blocks that efficiently capture the long-range
dependencies in the color difference domain, which makes it possible to sharply predict the
high-frequency components in texture and edge regions. Experiments on the benchmark datasets Kodak,
McMaster, and Urban100 objectively showed that our method outperformed the conventional methods in
several evaluation metrics. In the subjective evaluation, it was confirmed that our method not only
reduced the false colors but also sharply reconstructed edges and texture regions. Although the
proposed method performs best on the Bayer CFA pattern, the demosaicking framework, the use of color
differences, and the use of long-range dependencies can all be applied to various CFA patterns by
using a different packing method.
References
[1] B.E. Bayer, “Color imaging array,” US Patent 3,971,065, July 1976.
[2] D.R. Cok, “Signal processing method and apparatus for producing interpolated chrominance
values in a sampled color image signal,” US Patent 4,642,678, February 1987.
[3] R. Kimmel, “Demosaicing: Image reconstruction from color CCD samples,” IEEE Transactions
on Image Processing, vol. 8, no. 9, pp. 1221–1228, 1999.
[4] C.A. Laroche and M.A. Prescott, “Apparatus and method for adaptively interpolating a full
color image utilizing chrominance gradients,” US Patent 5,373,322, December 1994.
[5] J.E. Adams, Jr. and J.F. Hamilton, Jr., “Adaptive color plane interpolation in single sensor
color electronic camera,” US Patent 5,652,621, July 1997.
[6] A. Buades, B. Coll, J. Morel, and C. Sbert, “Self-similarity driven color demosaicking,” IEEE
Transactions on Image Processing, vol. 18, no. 6, pp. 1192–1202, 2009.
[7] J. Duran and A. Buades, “Self-similarity and spectral correlation adaptive algorithm for color
demosaicking,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4031–4040, 2014.
[8] R. Tan, K. Zhang, W. Zuo, and L. Zhang, “Color image demosaicking via deep residual learn-
ing,” Proc. of ICME ’17, pp. 793–798, 2017.
[9] K. Cui, Z. Jin, and E. Steinbach, “Color image demosaicking using a 3-stage convolutional
neural network structure,” Proc. of ICIP ’18, pp. 2177–2181, 2018.
[10] N. Yan and J. Ouyang, “Channel-by-channel demosaicking networks with embedded spectral
correlation,” 2020.
[11] Y. Zhang, K. Li, K. Li, B. Zhong, and Y. Fu, “Residual non-local attention networks for image
restoration,” Proc. of ICLR ’19, 2019.
[12] Y. Mei, Y. Fan, Y. Zhang, J. Yu, Y. Zhou, D. Liu, Y. Fu, T.S. Huang, and H. Shi, “Pyramid
attention networks for image restoration,” arXiv preprint arXiv:2004.13824, 2020.
[13] A. Buades, B. Coll, and J. Morel, “A non-local algorithm for image denoising,” Proc. of
CVPR ’05, pp. 60–65, 2005.
[14] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” Proc. of CVPR ’18,
pp. 7794–7803, 2018.
[15] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep
residual channel attention networks,” Proc. of ECCV ’18, pp. 286–301, 2018.
[16] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, “Waterloo Exploration
Database: New challenges for image quality assessment models,” IEEE Transactions on Image
Processing, vol. 26, no. 2, pp. 1004–1016, February 2017.
[17] D.P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint
arXiv:1412.6980, 2014.
[18] http://r0k.us/graphics/kodak/.
[19] L. Zhang, X. Wu, A. Buades, and X. Li, “Color demosaicking by local directional interpolation
and nonlocal adaptive thresholding,” Journal of Electronic Imaging, vol. 20, no. 2, pp. 1–17, 2011.
[20] J.B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-
exemplars,” Proc. of CVPR ’15, pp. 5197–5206, 2015.
[21] I. Pekkucuksen and Y. Altunbasak, “Gradient based threshold free color filter array interpola-
tion,” Proc. of ICIP ’10, IEEE, pp. 137–140, 2010.
[22] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, “Residual interpolation for color image
demosaicking,” Proc. of ICIP ’13, IEEE, pp. 2304–2308, 2013.
[23] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, “Minimized-Laplacian residual interpolation
for color image demosaicking,” Proceedings of SPIE - The International Society for Optical
Engineering, vol. 9023, February 2014.
[24] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, “Deep joint demosaicking and denoising,”
ACM Transactions on Graphics, vol. 35, no. 6, pp. 1–12, 2016.
[25] R. Lukac and K.N. Plataniotis, “Color filter arrays: Design and performance analysis,” IEEE
Transactions on Consumer Electronics, vol. 51, no. 4, pp. 1260–1267, 2005.
[26] S. Yamanaka, “Solid state color camera,” US Patent 4,064,532, December 1977.
[27] https://www.fujifilm.eu/uk/products/digital-cameras/model/x-pro1/features-4483/
aps-c-16m-x-trans-cmos.