
NOLTA, IEICE

Paper

Deep demosaicking considering inter-channel correlation and self-similarity

Taishi Iriyama 1,a), Masatoshi Sato 1, Hisashi Aomori 2, and Tsuyoshi Otake 1

1 Graduate School of Engineering, Tamagawa University,
6–1–1 Tamagawagakuen, Machida, Tokyo 194–8610, Japan
2 Department of Electrical and Electronic Engineering, Chukyo University,
101–2 Yagoto-Honmachi, Showa-ku, Nagoya 466–8666, Japan

a) irymt3sc@engs.tamagawa.ac.jp

Received January 4, 2021; Revised March 19, 2021; Published July 1, 2021

Abstract: The effectiveness of utilizing inter-channel correlation and self-similarity for demosaicking has been reported in many studies. At the same time, many convolutional neural network (CNN)-based demosaicking techniques have been proposed and achieve state-of-the-art accuracy. In CNN-based demosaicking, one of the most important issues is how to exploit these correlations within a neural network. In this paper, we propose a novel CNN-based demosaicking method that considers an effective combination of inter-channel correlation and self-similarity. Specifically, we apply a CNN to predict the color differences R-G and B-G, and the demosaicked image is then obtained from the predicted color differences and the input color filter array (CFA) image. At the same time, our network considers self-similarity in the color difference domain by applying non-local attention to high-level feature maps. Experimental results show that our method provides better accuracy and visual quality than conventional demosaicking methods. In addition, the versatility of the proposed framework is demonstrated by experiments with images sampled by various CFA patterns.
Key Words: demosaicking, convolutional neural network, color difference, self-similarity, non-local attention

1. Introduction
A color image is acquired by converting light into electrical signals through an image sensor equipped with a color filter array (CFA). The CFA samples only one of the red (R), green (G), and blue (B) components at each pixel according to a periodic pattern. The most popular CFA pattern in consumer digital cameras is the Bayer CFA, which is a periodic arrangement of one red, two green, and one blue pixel [1]. The missing information in the CFA image is estimated by an interpolation process called demosaicking. The simplest demosaicking is to apply a spatially invariant standard interpolation, such as bilinear or bicubic interpolation, to each color channel separately.
However, standard interpolations lose high-frequency information and produce accompanying artifacts such as false color. To improve image quality, methods based on the inter-channel correlation evaluated using the color ratio [2, 3] or the color difference [4, 5] have been proposed. Cok et al. [2] introduced the color ratios R/G and B/G calculated from a G channel estimated by bilinear interpolation. The color ratio domains are interpolated with less error and are used to reconstruct the R and B channels. The underlying assumption is that, compared with the color components, the hue is quite smooth locally within an object in a color image. Hamilton et al. [5] interpolated the G channel by considering the second-order derivatives of the subsampled R and B channels instead of bilinear interpolation. The authors also proposed interpolating the color differences in the vertical and horizontal directions and then selecting between them adaptively according to the edge direction. However, these approaches cause artifacts such as zipper effects, false colors, and aliasing when there is no neighborhood information with which to estimate the target pixel. To reduce these artifacts, self-similarity-based approaches have been proposed. Buades et al. [6] proposed a demosaicking method that introduced non-local mean filtering to exploit self-similarity. The method is based on the observation that similar local patterns tend to appear repeatedly in natural images. Duran et al. [7] proposed a demosaicking method that considers both self-similarity and spectral correlation by applying non-local mean filtering in the color difference domain in order to refine the color differences.
In recent years, convolutional neural network (CNN)-based demosaicking methods have been widely studied and have shown state-of-the-art performance. Tan et al. [8] proposed a two-stage demosaicking network that predicts the residuals between the initial demosaicked image and the training image. Cui et al. [9] extended the two-stage demosaicking network to a three-stage network by adding networks that explore the correlation between R and G (and between G and B). Yan et al. [10] proposed a three-stream demosaicking network that models the RGB channels separately in parallel and estimates the color differences G-R and G-B instead of the R and B channels. Zhang et al. [11] proposed a residual non-local attention network (RNAN) to capture non-local features by considering the long-range dependencies between pixels. The RNAN improves the representation ability of the network by incorporating non-local mask branches that rescale features with weights computed by a pair-wise function between every two locations in the feature maps. Mei et al. [12] proposed a pyramid attention that captures long-range dependencies exhaustively by extending the single-scale feature map to a multi-scale feature pyramid and computing feature correlations with a pair-wise function over the entire pyramid. However, these neural networks focus on either inter-channel correlation or self-similarity alone, and both need to be considered in order to fully exploit these correlations.
In this paper, we propose a novel CNN-based demosaicking method that considers both inter-channel correlation and self-similarity in an effective combination. Specifically, the proposed method predicts the color differences R-G and B-G and generates a demosaicked image from the predicted color differences and the input CFA image. Our network considers self-similarity in the color difference domain by applying non-local attention to high-level feature maps. The proposed method outperforms conventional CNN-based demosaicking approaches on several benchmark datasets in terms of assessment indices such as the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In perceptual evaluations, we confirmed that the proposed method reduces artifacts such as the false colors from which conventional demosaicking approaches suffer. The contributions of this paper are summarized as follows: (1) A novel deep demosaicking method considering both inter-channel correlation and self-similarity is proposed. (2) We design a novel framework for CNN-based demosaicking via color differences. (3) Experimental results on several benchmark datasets demonstrate that our proposed method provides better results than state-of-the-art CNN-based demosaicking algorithms.

2. Materials and methods


2.1 Color difference interpolation
The color difference interpolation is based on the observation that the color differences R-G and B-G are quite smooth locally in natural images. In general, demosaicking based on the color difference interpolates the G channel first:

$$\hat{G} = \mathrm{LI}(I_{CFA}), \qquad (1)$$

where $\hat{G}$ denotes the G channel interpolated from the input CFA image $I_{CFA}$ by a linear interpolation technique (LI). The color differences are then calculated as

$$D_R = R - \hat{G}, \quad D_B = B - \hat{G}, \qquad (2)$$

where $D_R$ and $D_B$ denote the color differences between the R and G channels and between the B and G channels, respectively. Since high-frequency components are greatly reduced in the color difference domain, the missing values of $D_R$ and $D_B$ are expected to be interpolated with less error. Finally, the R and B channels are reconstructed using $D_R$ and $D_B$:

$$\hat{R} = \hat{G} + D_R, \quad \hat{B} = \hat{G} + D_B, \qquad (3)$$

where $\hat{R}$ and $\hat{B}$ denote the interpolated R and B channels, respectively.
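As a concrete illustration of Eqs. (1)-(3), the following is a minimal sketch of the classical color-difference pipeline for a Bayer CFA, assuming simple bilinear interpolation kernels and an RGGB phase (R at the top-left pixel); the function and kernel names are illustrative and are not part of the proposed method.

```python
import numpy as np
from scipy.ndimage import convolve

def bayer_masks(h, w):
    # Boolean masks of the R, G, and B observation locations for an RGGB Bayer phase.
    r = np.zeros((h, w), bool); g = np.zeros((h, w), bool); b = np.zeros((h, w), bool)
    r[0::2, 0::2] = True
    g[0::2, 1::2] = True; g[1::2, 0::2] = True
    b[1::2, 1::2] = True
    return r, g, b

K_G  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0   # bilinear kernel for the half-density G grid
K_RB = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0   # bilinear kernel for the quarter-density R/B grids

def demosaic_color_difference(cfa):
    h, w = cfa.shape
    mr, mg, mb = bayer_masks(h, w)
    # Eq. (1): linear interpolation of the G channel from its observed samples
    g_hat = convolve(cfa * mg, K_G, mode='mirror')
    # Eq. (2): color differences at the observed R/B locations, then interpolated
    d_r = convolve((cfa - g_hat) * mr, K_RB, mode='mirror')
    d_b = convolve((cfa - g_hat) * mb, K_RB, mode='mirror')
    # Eq. (3): reconstruct R and B from G and the interpolated color differences
    return np.stack([g_hat + d_r, g_hat, g_hat + d_b], axis=-1)
```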
This paper proposes a demosaicking framework that exploits the inter-channel correlation through color differences. The proposed framework reduces the complexity of estimation by replacing the estimation of the RGB channels with the estimation of the color differences.

2.2 Non-local interpolation


Non-local interpolation is based on the observation that similar local patterns tend to appear repeatedly in natural images. In typical demosaicking techniques, non-local interpolation is implemented by non-local mean filtering [13], which computes a response depending on the similarity between local patches as a weighted average over all pixels. Non-local attention was proposed by Wang et al. [14] as a representation of non-local mean filtering in neural networks. Non-local attention replaces the input feature map with a weighted feature map obtained by a pair-wise weighted average between the features at all positions:
$$y_i = \frac{1}{C(x)} \sum_{\forall j} \phi(x_i, x_j)\, \theta(x_j), \qquad (4)$$

$$C(x) = \sum_{\forall j} \phi(x_i, x_j), \qquad (5)$$

where $x$ and $y$ denote the input feature map and the weighted feature map, respectively. $\phi$ is a pair-wise function for computing the correlation between features in the input feature map, and $\theta$ is a unary function for feature embedding. The normalization factor $C(x)$ is obtained by summing the pair-wise correlations. In neural network architectures, the non-local attention is wrapped into a non-local block:

$$z_i = W_z y_i + x_i, \qquad (6)$$

where $W_z$ denotes the weight parameters that determine the importance of the non-local attention. The non-local block learns $y_i$ in Eq. (4) as a residual with respect to the input $x_i$.
In this paper, we apply the non-local attention to capture the long-range dependencies in the color
difference domain. Our network incorporates non-local blocks in the high-level feature extraction part
of the network.

2.3 Proposed method


We design a novel framework for deep demosaicking. As shown in Fig. 1, our framework consists of three processes: packing, color difference estimation, and rendering. Although our framework can be applied to various CFA patterns, we describe the details of each part based on the Bayer pattern. At the input of our network, the raw CFA data are first packed into a four-channel image with quarter resolution. Packing makes the spatial pattern invariant and improves the spatial resolution of the input. Let $I_{CFA} \in \mathbb{R}^{2H \times 2W \times 1}$, $X \in \mathbb{R}^{H \times W \times 4}$, and $I_{DM} \in \mathbb{R}^{2H \times 2W \times 3}$ be the CFA image, the packed image, and the demosaicked image, respectively.
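The packing step can be sketched as follows, assuming the CFA image is stored as an (N, 1, 2H, 2W) tensor; the channel ordering and the function name are illustrative.

```python
import torch

def pack_bayer(cfa: torch.Tensor) -> torch.Tensor:
    # cfa: (N, 1, 2H, 2W); the four output channels are the four phases of the
    # 2x2 Bayer pattern (e.g. R, G, G, B for the RGGB phase).
    return torch.stack([
        cfa[:, 0, 0::2, 0::2],   # phase (0, 0)
        cfa[:, 0, 0::2, 1::2],   # phase (0, 1)
        cfa[:, 0, 1::2, 0::2],   # phase (1, 0)
        cfa[:, 0, 1::2, 1::2],   # phase (1, 1)
    ], dim=1)                    # -> (N, 4, H, W)

# torch.nn.functional.pixel_unshuffle(cfa, 2) performs an equivalent
# rearrangement (the channel ordering may differ).
```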
Fig. 1. Our framework.

Fig. 2. Network architecture.

For color difference estimation, a deep neural network based on the residual in residual (RIR) structure [15] is applied. In the RIR structure, low-level features are bypassed by a long skip connection, and high-level features are extracted by a very deep network composed of several residual groups. Each residual group consists of several residual blocks with short skip connections and a non-local block. In the non-local block, long-range dependencies are efficiently considered by non-local weighting according to the pair-wise relationships between the spatial locations in the high-level feature maps.
The pair-wise function $\phi(\cdot)$ and the unary function $\theta(\cdot)$ in Eq. (4) are defined by

$$\phi(x_i, x_j) = \exp\!\left(u(x_i)^{T} v(x_j)\right) = \exp\!\left((W_u x_i)^{T} W_v x_j\right), \qquad (7)$$

$$\theta(x_j) = W_\theta x_j, \qquad (8)$$

where $W_u$, $W_v$, and $W_\theta$ denote weight parameters. The pair-wise function $\phi(\cdot)$ uses an embedded Gaussian for similarity evaluation. The linear embeddings $W_u x_i$, $W_v x_j$, and $W_\theta x_j$ are implemented by the 1 × 1 convolutions illustrated in Fig. 2. Thus, $\frac{1}{C(x)} \sum_{\forall j} \phi(x_i, x_j)$ in Eq. (4) is equivalent to applying the softmax function along the dimension $j$.
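A minimal sketch of such a non-local block, following the embedded-Gaussian formulation of Eqs. (4)-(8), is given below; the channel reduction factor and the absence of any sub-sampling are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.u = nn.Conv2d(channels, inter, 1)       # W_u x_i (Eq. (7))
        self.v = nn.Conv2d(channels, inter, 1)       # W_v x_j (Eq. (7))
        self.theta = nn.Conv2d(channels, inter, 1)   # W_theta x_j (Eq. (8))
        self.out = nn.Conv2d(inter, channels, 1)     # W_z (Eq. (6))

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.u(x).flatten(2).transpose(1, 2)       # (n, hw, inter)
        k = self.v(x).flatten(2)                       # (n, inter, hw)
        g = self.theta(x).flatten(2).transpose(1, 2)   # (n, hw, inter)
        # softmax over j realizes phi(x_i, x_j) / C(x) in Eqs. (4)-(5)
        attn = F.softmax(q @ k, dim=-1)                # (n, hw, hw)
        y = (attn @ g).transpose(1, 2).reshape(n, -1, h, w)
        return self.out(y) + x                         # residual non-local block, Eq. (6)
```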
As shown in Fig. 2, the color difference estimation network consists of four parts: shallow feature extraction, deep feature extraction, upscaling, and reconstruction. First, shallow features $F_{SF}$ are extracted from the input packed image by a 3 × 3 convolutional layer:

$$F_{SF} = H_{SF}(X), \qquad (9)$$

where $H_{SF}(\cdot)$ denotes the convolutional operation. The shallow features facilitate learning by transporting shallow information to the following layers through the long skip connection. Then, deep features $F_{DF}$ are extracted by the deep feature extraction part based on the RIR structure:

$$F_{DF} = H_{DF}(F_{SF}), \qquad (10)$$

where $H_{DF}(\cdot)$ denotes the deep convolutional operation. $F_{DF}$ is then upscaled to the same spatial resolution as the input CFA image via an upscale module:

$$F_{UP} = H_{UP}(F_{DF}), \qquad (11)$$

where $H_{UP}(\cdot)$ and $F_{UP}$ denote the upscale operation and the upscaled features, respectively. The color differences $D_R$ and $D_B$ are then reconstructed from the upscaled features by a 3 × 3 convolutional layer:

$$D_R, D_B = H_{REC}(F_{UP}) = H_{CDEN}(X), \qquad (12)$$

where $H_{REC}(\cdot)$ and $H_{CDEN}(\cdot)$ denote the reconstruction layer and the color difference estimation network, respectively.
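The four-part structure of Eqs. (9)-(12) can be sketched as follows. The internals of the residual groups and the upscale module (plain residual blocks and a PixelShuffle upscaler) are assumptions made only for illustration, and NonLocalBlock is the one sketched in Section 2.2; only the overall layout and the long skip connection follow the description above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)          # short skip connection

class CDEstimationNet(nn.Module):
    def __init__(self, ch=64, n_groups=2, n_blocks=20):
        super().__init__()
        self.head = nn.Conv2d(4, ch, 3, padding=1)                    # Eq. (9): shallow features
        groups = []
        for _ in range(n_groups):
            groups += [ResidualBlock(ch) for _ in range(n_blocks)]
            groups.append(NonLocalBlock(ch))                          # non-local block per residual group
        self.body = nn.Sequential(*groups)
        self.up = nn.Sequential(nn.Conv2d(ch, ch * 4, 3, padding=1),  # Eq. (11): x2 upscale
                                nn.PixelShuffle(2))
        self.tail = nn.Conv2d(ch, 2, 3, padding=1)                    # Eq. (12): outputs D_R, D_B

    def forward(self, x):                 # x: packed CFA, (N, 4, H, W)
        f_sf = self.head(x)
        f_df = self.body(f_sf) + f_sf     # Eq. (10) with long skip connection
        return self.tail(self.up(f_df))   # (N, 2, 2H, 2W): [D_R, D_B]
```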
Our network is then optimized with the following loss function. Given a training set $\{I_{CFA}^{i}, I_{GT}^{i}\}_{i=1}^{N}$ containing $N$ CFA inputs and their ground truths, each pair is converted to a packed image $X^{i}$ and the color differences $Y^{i}$ of the ground truth, respectively. The loss function is formulated as

$$L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| H_{CDEN}(X^{i}) - Y^{i} \right\|_{1}, \qquad (13)$$

where Θ denotes the parameter set of our network.
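A minimal sketch of the training target and the loss in Eq. (13) is given below, assuming the ground-truth color differences are taken directly from the full RGB training image; the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def color_difference_target(rgb_gt: torch.Tensor) -> torch.Tensor:
    # rgb_gt: (N, 3, 2H, 2W) ground-truth image; Y = [D_R, D_B] = [R - G, B - G]
    r, g, b = rgb_gt[:, 0:1], rgb_gt[:, 1:2], rgb_gt[:, 2:3]
    return torch.cat([r - g, b - g], dim=1)

def demosaic_loss(pred_cd: torch.Tensor, rgb_gt: torch.Tensor) -> torch.Tensor:
    # L1 loss between the predicted and ground-truth color differences, Eq. (13)
    return F.l1_loss(pred_cd, color_difference_target(rgb_gt))
```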


Finally, the demosaicked image is rendered from the estimated color differences and the input CFA:

$$I_{DM}^{G}(i,j) = \begin{cases} I_{CFA}(i,j) - D_R(i,j) & \text{if } (i,j) \text{ is an R observation location} \\ I_{CFA}(i,j) & \text{if } (i,j) \text{ is a G observation location} \\ I_{CFA}(i,j) - D_B(i,j) & \text{if } (i,j) \text{ is a B observation location} \end{cases} \qquad (14)$$

$$I_{DM}^{R}(i,j) = \begin{cases} I_{CFA}(i,j) & \text{if } (i,j) \text{ is an R observation location} \\ I_{CFA}(i,j) + D_R(i,j) & \text{if } (i,j) \text{ is a G observation location} \\ I_{CFA}(i,j) - D_B(i,j) + D_R(i,j) & \text{if } (i,j) \text{ is a B observation location} \end{cases} \qquad (15)$$

$$I_{DM}^{B}(i,j) = \begin{cases} I_{CFA}(i,j) - D_R(i,j) + D_B(i,j) & \text{if } (i,j) \text{ is an R observation location} \\ I_{CFA}(i,j) + D_B(i,j) & \text{if } (i,j) \text{ is a G observation location} \\ I_{CFA}(i,j) & \text{if } (i,j) \text{ is a B observation location} \end{cases} \qquad (16)$$

where $(i, j)$ denotes the pixel coordinate, and the superscripts R, G, and B denote the color channels of the demosaicked image $I_{DM}$.
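The rendering of Eqs. (14)-(16) can be sketched as follows, assuming an RGGB Bayer phase and boolean masks of the observation locations; the names are illustrative.

```python
import torch

def render(cfa: torch.Tensor, d_r: torch.Tensor, d_b: torch.Tensor) -> torch.Tensor:
    # cfa, d_r, d_b: (N, 1, 2H, 2W) tensors (CFA image and estimated color differences)
    m_r = torch.zeros_like(cfa, dtype=torch.bool); m_r[..., 0::2, 0::2] = True
    m_b = torch.zeros_like(cfa, dtype=torch.bool); m_b[..., 1::2, 1::2] = True
    m_g = ~(m_r | m_b)
    # Eq. (14): G channel
    g = torch.where(m_g, cfa, torch.where(m_r, cfa - d_r, cfa - d_b))
    # Eq. (15): R channel
    r = torch.where(m_r, cfa, torch.where(m_g, cfa + d_r, cfa - d_b + d_r))
    # Eq. (16): B channel
    b = torch.where(m_b, cfa, torch.where(m_g, cfa + d_b, cfa - d_r + d_b))
    return torch.cat([r, g, b], dim=1)   # (N, 3, 2H, 2W) demosaicked image
```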

3. Experiments
For training, we use the Waterloo Exploration Database (WED) dataset [16], which consists of 4,744 color images. The WED dataset is split into 95% for training and the remainder for validation. The training images are randomly cropped into 96 × 96 patches and sampled into CFA images. The mini-batch size is set to 16. Our model consists of two residual groups, each with 20 residual blocks, and is optimized by the Adam optimizer [17] with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The learning rate is initially $10^{-4}$ and is halved every 200 epochs. For testing, we use standard demosaicking benchmark datasets: Kodak [18] consisting of 24 color images (768 × 512), McMaster [19] consisting of 18 color images (500 × 500), and Urban100 [20] consisting of 100 images.
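The optimization setup described above can be sketched as follows. This reuses the network, packing, and loss sketches from Section 2.3; the data loader and the total number of epochs are assumptions, so this is an abbreviated illustration rather than the actual training script.

```python
import torch

model = CDEstimationNet()   # network sketch from Section 2.3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)
# learning rate halved every 200 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

for epoch in range(600):                      # total number of epochs is an assumption
    for cfa_patch, rgb_gt in train_loader:    # 96x96 patches, mini-batch size 16 (loader assumed)
        pred_cd = model(pack_bayer(cfa_patch))
        loss = demosaic_loss(pred_cd, rgb_gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```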

3.1 Comparison with conventional demosaicking methods


The proposed method is compared with eight demosaicking methods: three non-CNN-based methods and five CNN-based methods. The non-CNN-based methods are GBFT [21], RI [22], and MLRI [23]. The CNN-based methods are JDD [24], 2-stage [8], 3-stage [9], RNAN [11], and PANet [12]. For an objective evaluation of demosaicking performance, three assessment indices are adopted: the peak signal-to-noise ratio (PSNR) for each color channel, the composite PSNR (CPSNR) over all color channels, and the structural similarity (SSIM). Tables I and II show the CPSNR of each comparison method on the Kodak and McMaster datasets. As shown in these tables, our method provides better or competitive accuracy compared with PANet and superior accuracy compared with the other demosaicking methods. Table III shows the averages of PSNR, CPSNR, and SSIM on the Kodak, McMaster, and Urban100 benchmark datasets. As shown in Table III, our method achieves the highest accuracy on almost all assessment indices for all datasets. Specifically, the average CPSNR on the Kodak, McMaster, and Urban100 datasets exceeds that of the second-best method by 0.07 [dB], 0.14 [dB], and 0.76 [dB], respectively. In particular, for the Urban100 dataset, our method significantly outperforms the other demosaicking methods.
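For reference, the assessment indices can be computed as in the following minimal sketch, assuming images normalized to [0, 1]; CPSNR is the PSNR obtained from the mean squared error taken jointly over all three color channels.

```python
import torch

def psnr(x: torch.Tensor, y: torch.Tensor, peak: float = 1.0) -> float:
    # PSNR between two tensors; apply per channel for channel-wise PSNR
    mse = torch.mean((x - y) ** 2)
    return float(10.0 * torch.log10(peak ** 2 / mse))

def cpsnr(pred: torch.Tensor, gt: torch.Tensor) -> float:
    # pred, gt: (3, H, W); MSE over all channels and pixels jointly
    return psnr(pred, gt)

# e.g. psnr(pred[0], gt[0]) gives the PSNR of the R channel
```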

Table I. Quantitative comparison by CPSNR on Kodak dataset.
No. GBFT RI MLRI JDD 2-stage 3-stage RNAN PANet Ours
1 39.83 35.56 36.80 41.62 41.35 41.69 43.28 43.74 43.70
2 41.18 39.46 40.78 40.82 41.84 41.85 42.44 42.57 43.07
3 42.89 41.04 42.99 44.30 45.04 45.22 45.82 45.97 46.27
4 40.26 40.18 40.96 42.72 42.86 43.20 43.99 44.28 44.53
5 37.92 36.67 37.67 40.29 40.61 40.95 41.88 42.43 42.13
6 41.01 38.45 39.14 42.30 42.16 42.14 43.35 43.78 43.83
7 42.30 41.99 42.85 44.00 44.88 44.80 45.52 45.91 45.64
8 37.19 33.99 34.90 38.68 38.63 39.16 40.32 40.43 40.73
9 43.41 41.16 42.38 43.79 44.09 44.57 45.12 45.02 45.43
10 42.65 41.53 42.26 43.53 43.90 44.21 44.87 44.95 45.05
11 40.88 38.18 39.32 41.89 42.03 42.44 43.33 43.66 43.60
12 43.84 42.37 43.27 44.50 45.07 45.02 45.95 46.25 46.29
13 35.79 31.98 33.15 37.62 37.24 37.53 38.61 39.04 39.19
14 36.73 36.30 37.63 40.11 40.05 40.34 41.12 41.65 41.43
15 39.28 38.80 39.34 41.23 41.54 41.59 42.34 42.95 43.07
16 44.54 42.27 42.89 45.02 45.23 45.64 46.41 46.68 46.50
17 42.00 40.03 41.08 42.90 42.86 43.01 43.89 43.97 44.10
18 37.54 35.31 36.52 38.11 38.73 38.89 39.89 39.89 40.30
19 41.81 39.10 39.92 42.30 42.42 42.78 43.73 43.86 43.92
20 41.58 39.91 40.68 42.35 42.91 42.91 43.51 43.69 43.52
21 39.93 37.24 38.19 41.57 41.36 41.61 42.40 42.60 42.68
22 38.62 37.55 38.62 40.29 40.26 40.51 41.17 41.55 41.46
23 43.17 42.22 43.82 43.09 44.97 45.02 44.97 45.16 45.24
24 35.43 34.13 34.68 37.79 36.96 37.15 38.51 38.86 38.99

Table II. Quantitative comparison by CPSNR on McMaster dataset.


No. GBFT RI MLRI JDD 2-stage 3-stage RNAN PANet Ours
1 26.51 28.98 28.97 31.59 31.13 31.58 32.04 32.43 32.31
2 33.25 35.00 35.10 36.43 36.14 36.30 36.88 37.09 37.30
3 32.42 33.71 33.88 36.53 36.30 36.67 37.50 37.80 37.99
4 34.34 37.88 37.67 40.25 40.48 41.10 41.83 42.32 42.37
5 30.38 33.92 34.02 36.95 36.76 37.30 37.53 37.76 38.16
6 32.26 38.32 38.29 40.80 40.66 41.07 41.52 41.77 41.90
7 39.15 36.97 37.49 41.77 41.37 41.56 41.87 42.14 42.16
8 37.35 36.98 36.97 41.49 41.03 41.34 41.86 42.28 42.22
9 33.81 35.92 36.46 39.94 40.08 40.30 40.65 40.97 41.10
10 35.77 38.15 38.66 40.95 40.93 41.21 41.53 41.89 42.00
11 36.44 39.44 39.95 41.76 41.57 41.88 42.25 42.53 42.60
12 36.17 39.64 39.68 41.47 41.41 41.73 41.78 42.13 42.30
13 38.17 40.31 40.56 42.13 42.23 42.50 42.22 42.27 42.64
14 36.77 38.95 38.79 40.21 40.19 40.42 40.55 40.77 41.03
15 36.64 38.35 38.94 40.46 40.35 40.67 40.72 40.93 41.23
16 29.39 35.15 35.09 37.27 36.46 37.25 37.92 38.25 38.41
17 28.48 32.39 32.59 36.68 36.30 36.94 37.49 37.86 37.91
18 33.68 36.48 36.12 37.83 37.83 38.05 38.55 38.75 38.89

Table III. Quantitative comparison by average of assessment indices on
benchmark datasets.
Methods      GBFT    RI      MLRI    JDD     2-stage 3-stage RNAN    PANet   Ours
Kodak     R  39.44   37.82   38.87   40.95   41.31   42.85   42.65   42.53   43.09
          G  43.15   41.00   41.83   45.00   44.76   45.15   46.35   46.66   46.71
          B  39.81   37.80   38.86   40.62   40.95   40.86   41.66   42.26   41.92
        RGB  40.41   38.56   39.58   41.70   41.96   42.18   43.02   43.29   43.36
       SSIM  0.9855  0.9787  0.9837  0.9883  0.9881  0.9887  0.9903  0.9905  0.9907
McMaster  R  33.58   36.07   36.35   39.35   39.12   39.51   40.02   40.36   40.48
          G  36.58   39.99   39.90   42.12   42.05   42.51   42.76   43.02   43.18
          B  32.71   35.35   35.36   37.48   37.28   37.64   37.99   38.28   38.42
        RGB  33.94   36.47   36.62   39.14   38.96   39.33   39.71   40.00   40.14
       SSIM  0.9279  0.9604  0.9724  0.9713  0.9702  0.9721  0.9725  0.9737  0.9758
Urban100  R  34.80   33.80   34.12   37.42   37.31   37.63   38.93   39.69   40.46
          G  38.29   36.65   37.08   40.97   41.01   41.32   42.57   43.35   44.31
          B  34.91   33.95   34.28   37.33   37.33   37.66   38.87   39.61   40.31
        RGB  35.66   34.53   34.88   38.22   38.17   38.49   39.76   40.50   41.26
       SSIM  0.9708  0.9686  0.9749  0.9825  0.9740  0.9830  0.9840  0.9856  0.9873

Fig. 3. Visual comparison for demosaicking results on Kodak dataset.

As a subjective evaluation, Figs. 3, 4, and 5 show the demosaicking results of the comparison methods and the proposed method for each benchmark dataset. For "img19" of the Kodak dataset in Fig. 3, the false colors generated by the conventional methods are reduced and the falsely interpolated structure is improved. For "img1" of the McMaster dataset in Fig. 4, our method suppresses the zipper effects at object boundaries. Figure 5 shows the demosaicking results for the Urban100 dataset, which contains numerous textured regions that tend to cause artifacts. For "img26", demosaicking artifacts such as false colors, which could not be handled by the conventional methods, are significantly reduced. For "img100", it can be confirmed that the high-frequency edges are restored sharply by our method.

3.2 Model size comparison


We evaluate the computational complexity of the proposed method by comparing it with the conventional CNN-based demosaicking methods. Since the input and reconstruction layers are similar and trivial for all methods, we use the number of parameters in the hidden layers, excluding the input and reconstruction layers, as a measure of computational complexity. To compare the performance of the proposed method with the conventional methods at comparable model sizes, we also prepared lightweight models Ours-S1 and Ours-S2, in which the number of residual blocks in each residual group is reduced to 4 and 8, respectively. Table IV shows a comparison of the number of parameters and the average CPSNR over all testing datasets. Our method with 20 residual blocks in each residual group shows the best performance with the largest number of parameters. In the comparison between models with a comparable number of parameters, Ours-S1 significantly outperforms the 3-stage method, and Ours-S2 also performs better than PANet and RNAN.

Fig. 4. Visual comparison for demosaicking results on McMaster dataset.

Fig. 5. Visual comparison for demosaicking results on Urban100.

Table IV. Model size comparison.
Methods JDD 2-stage 3-stage RNAN PANet Ours-S1 Ours-S2 Ours
Parameters 559K 229K 2,949K 8,956K 5,953K 3,428K 5,789K 12,873K
CPSNR 38.92 38.91 39.22 40.30 40.91 40.56 41.00 41.47
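The model-size measure used in Table IV can be sketched as follows; the layer names refer to the network sketch in Section 2.3 and are assumptions about how the input and reconstruction layers are named.

```python
def hidden_parameters(model) -> int:
    # Count trainable parameters outside the input ("head") and reconstruction ("tail") layers.
    return sum(p.numel() for name, p in model.named_parameters()
               if not name.startswith(('head', 'tail')) and p.requires_grad)
```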

3.3 Ablation study


In this paper, we proposed a novel framework that considers inter-channel correlation through color differences while considering self-similarity in the color difference domain with non-local attention. To verify the effectiveness of the proposed method, two networks are prepared: one is the proposed method without the color difference framework (Net-CD), and the other is the proposed method without non-local attention (Net-NL). Net-CD predicts the RGB image directly, instead of rendering the demosaicked image from color differences. Net-NL is obtained by removing the non-local attention from the proposed network, so that it does not capture the long-range dependencies in the color difference domain.
Figure 6 shows the evaluation of the demosaicking results by our method and the comparison networks on the Urban100 dataset. In comparison with Net-CD, it can be qualitatively confirmed that false colors occur along continuous edges, and quantitatively confirmed that the CPSNR decreases by 0.56 [dB]. In comparison with Net-NL, it can be qualitatively confirmed that structures containing high-frequency components are not reconstructed, and quantitatively confirmed that the CPSNR decreases by 0.21 [dB]. These results suggest that the proposed color difference framework and the application of non-local attention to the color difference domain are effective in reducing visual artifacts.

Fig. 6. Results of ablation study by CPSNR and SSIM on Urban100 dataset.

3.4 Results for other CFA patterns


To verify the universality of the proposed framework, we demonstrate demosaicking for various RGB CFA patterns: Bayer [1], Lukac [25], Yamanaka [26], modified Bayer [25], and Fuji X-Trans [27], illustrated in Fig. 7. These CFA patterns can be handled by the proposed framework by packing them according to their patterns. The Bayer CFA is packed into a four-channel image at quarter resolution. Similarly, the Lukac, Yamanaka, and modified Bayer patterns are packed into four channels at quarter resolution. Each channel is constructed to contain the same color information without shifting its position in the image. In the same manner, the Fuji X-Trans pattern is packed into nine channels with a spatial resolution of one-ninth. As shown in Table V, the proposed network provides quality similar to that for the Bayer CFA with respect to CPSNR and SSIM for all CFA patterns. These results suggest that the proposed method is highly versatile for various RGB CFA patterns.

Fig. 7. RGB CFA patterns.

Table V. Quantitative comparison for other CFA patterns.
Dataset Kodak McMaster
RGB CFAs CPSNR SSIM CPSNR SSIM
Bayer 43.43 0.9911 40.18 0.9757
Modified Bayer 42.83 0.9902 39.57 0.9742
Lukac 43.36 0.9907 40.18 0.9759
Yamanaka 43.18 0.9903 39.73 0.9743
Fuji X-Trans 43.14 0.9955 39.47 0.9877

4. Conclusion
In this paper, we proposed a novel deep demosaicking method that effectively considers both inter-channel correlation and self-similarity. Specifically, a CNN was applied to predict the color differences R-G and B-G, and demosaicking was performed using these predictions. By rendering the image from color differences, demosaicking exploits the correlation between the RGB channels. Furthermore, our network has built-in non-local blocks to efficiently capture the long-range dependencies in the color difference domain, which makes it possible to sharply predict the high-frequency components in texture and edge regions. Experiments on the Kodak, McMaster, and Urban100 benchmark datasets objectively showed that our method outperformed other conventional methods on several evaluation metrics. In the subjective evaluation, it was confirmed that our method not only reduced false colors but also sharply reconstructed edges and texture regions. Although the proposed method performs best on the Bayer CFA pattern, the demosaicking framework, the use of color differences, and the use of long-range dependencies can all be applied to various CFA patterns by using a different packing method.

References
[1] B.E. Bayer, “Color imaging array,” US Patent 3,971,065, July 1976.
[2] D.R. Cok, “Signal processing method and apparatus for producing interpolated chrominance
values in a sampled color image signal,” US Patent 4,642,678, February 1987.
[3] R. Kimmel, “Demosaicing: Image reconstruction from color CCD samples,” IEEE Transactions
on Image Processing, vol. 8, no. 9, pp. 1221–1228, 1999.
[4] C.A. Laroche and M.A. Prescott, “Apparatus and method for adaptively interpolating a full
color image utilizing chrominance gradients,” US Patent 5,373,322, December 1994.
[5] J.E. Adams, Jr. and J.F. Hamilton, Jr., “Adaptive color plane interpolation in single sensor
color electronic camera,” US Patent 5,652,621, July 1997.
[6] A. Buades, B. Coll, J. Morel, and C. Sbert, “Self-similarity driven color demosaicking,” IEEE
Transactions on Image Processing, vol. 18, no. 6, pp. 1192–1202, 2009.
[7] J. Duran and A. Buades, “Self-similarity and spectral correlation adaptive algorithm for color
demosaicking,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4031–4040, 2014.
[8] R. Tan, K. Zhang, W. Zuo, and L. Zhang, “Color image demosaicking via deep residual learn-
ing,” Proc. of ICME ’17, pp. 793–798, 2017.
[9] K. Cui, Z. Jin, and E. Steinbach, “Color image demosaicking using a 3-stage convolutional
neural network structure,” Proc. of ICIP ’18, pp. 2177–2181, 2018.

[10] N. Yan and J. Ouyang, “Channel-by-channel demosaicking networks with embedded spectral
correlation,” 2020.
[11] Y. Zhang, K. Li, K. Li, B. Zhong, and Y. Fu, “Residual non-local attention networks for image
restoration,” Proc. of ICLR ’19, 2019.
[12] Y. Mei, Y. Fan, Y. Zhang, J. Yu, Y. Zhou, D. Liu, Y. Fu, T.S. Huang, and H. Shi, “Pyramid
attention networks for image restoration,” arXiv preprint arXiv:2004.13824, 2020.
[13] A. Buades, B. Coll, and J. Morel, “A non-local algorithm for image denoising,” Proc. of
CVPR ’05, pp. 60–65, 2005.
[14] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” Proc. of CVPR ’18,
pp. 7794–7803, 2018.
[15] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep
residual channel attention networks,” Proc. of ECCV ’18, pp. 286–301, 2018.
[16] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, “Waterloo Exploration
Database: New challenges for image quality assessment models,” IEEE Transactions on Image
Processing, vol. 26, no. 2, pp. 1004–1016, February 2017.
[17] D.P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[18] http://r0k.us/graphics/kodak/.
[19] L. Zhang, X. Wu, A. Buades, and X. Li, “Color demosaicking by local directional interpolation
and nonlocal adaptive thresholding,” Journal of Electronic Imaging, vol.20, no.2, pp.1–17, 2011.
[20] J.B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-
exemplars,” Proc. of CVPR ’15, pp. 5197–5206, 2015.
[21] I. Pekkucuksen and Y. Altunbasak, “Gradient based threshold free color filter array interpola-
tion,” Proc. of ICIP ’10, IEEE, pp. 137–140, 2010.
[22] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, “Residual interpolation for color image
demosaicking,” Proc. of ICIP ’13, IEEE, pp. 2304–2308, 2013.
[23] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, “Minimized-Laplacian residual interpolation
for color image demosaicking,” Proceedings of SPIE - The International Society for Optical
Engineering, vol. 9023, February 2014.
[24] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, “Deep joint demosaicking and denoising,”
ACM Transactions on Graphics, vol. 35, no. 6, pp. 1–12, 2016.
[25] R. Lukac and K.N. Plataniotis, “Color filter arrays: Design and performance analysis,” IEEE
Transactions on Consumer Electronics, vol. 51, no. 4, pp. 1260–1267, 2005.
[26] S. Yamanaka, “Solid state color camera,” US Patent 4,064,532, December 1977.
[27] https://www.fujifilm.eu/uk/products/digital-cameras/model/x-pro1/features-4483/aps-c-16m-x-trans-cmos.
