Image De-Noising With Machine Learning: A Review
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3092425, IEEE Access
ABSTRACT Images are susceptible to various kinds of noise, which corrupt the pictorial information stored in them. Image de-noising has become an integral part of the image processing workflow. It is used to attenuate the noise and accentuate the specific image information stored within. Machine learning is an important tool in the image de-noising workflow in terms of its robustness, accuracy, and time requirement. This paper explores numerous state-of-the-art machine-learning-based image de-noisers, such as dictionary learning models, convolutional neural networks, and generative adversarial networks, for a range of noise types: Gaussian, impulse, Poisson, mixed, and real-world noise. The motivation, algorithm, and framework of different machine learning de-noisers are analyzed. These de-noisers are compared using PSNR as the quality assessment metric on some benchmark datasets. The best de-noising results for the different noise types are discussed along with future prospects. Among the various Gaussian noise de-noisers, GCBD, BRDNet, and the PReLU network prove to be promising. CNN+LSTM and MC2RNet are the most suitable CNN-based Poisson de-noisers. For impulse noise removal, Blind CNN and CNN+PSO perform well. For mixed noise removal, WDL, EM-CNN, CNN, SDL, and Mixed CNN are prominent. De-noisers like GRDN and DDFN show accurate results in the domain of real-world de-noising.
INDEX TERMS Convolutional Neural Networks, Dictionary Learning, Generative Adversarial Networks,
Image De-noising, Machine Learning
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
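[FIGURE 1: Block diagram of the classification of image de-noising techniques. Recoverable diagram labels: Gradient; Multilayer Perceptron; Non-Data Adaptive; Non-local; Transform (Wavelet Domain, Spatial Domain); Weighted Average; Convolutional Neural Networks; Generative Adversarial Networks; Non-local Based Transform Domain (BM3D, BM4D).]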
Impulse Noise: It is an additive noise which occurs due to faulty sensors and transmission errors. Unlike Gaussian noise, it affects only certain pixels in the image. It is divided into two types, i.e., salt and pepper impulse noise (SPIN) and random valued impulse noise (RVIN). In salt and pepper noise corruption, some image pixels take either the maximum or the minimum value of the image dynamic range. RVIN corruption, in contrast, replaces some image pixels with a random value, so its detection is more difficult than salt and pepper noise detection. The salt and pepper impulse noise is given by [4]

$$p(x) = \begin{cases} P_a & \text{for } x = a \\ P_b & \text{for } x = b \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $a$ and $b$ are the minimum and maximum pixel values of the image dynamic range, and $P_a$ and $P_b$ are the corresponding probabilities, which are equal for salt and pepper noise.

Poisson or Photon Noise: The Poisson distribution is used to model photon noise caused by the random arrival of photons on the image sensor [5]. The applications of Poisson noise removal include astronomy, medical imaging, and low-light photography. The conditional probability of the Poisson distributed image $\mathbf{y}$ for the clean image $\mathbf{x}$ is given by [6]

$$p(\mathbf{y} \mid \mathbf{x}) = \prod_{i,j=1}^{N} \frac{e^{-x_{i,j}} \, x_{i,j}^{\,y_{i,j}}}{y_{i,j}!} \tag{3}$$

where $i, j$ denote the pixel indices.

Gamma Noise: The speckle noise in ultrasound images occurs due to coherent imaging mechanisms from the scatterers [7]. It reduces the image sharpness and creates difficulty for lesion diagnosis. It is modeled by the Gamma distribution, whose probability density function is given by the following equation.

$$p(x) = \begin{cases} \dfrac{a^{b} x^{b-1}}{(b-1)!} \, e^{-ax} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

where parameters $a$ and $b$ are positive integers.

Rayleigh Noise: The noise in synthetic aperture radar (SAR) images is granular in nature, and it is modeled by the Rayleigh distribution [8]. Sometimes, ultrasound images are also prone to Rayleigh noise corruption. The Rayleigh distribution is given by the following probability density

$$p(x) = \begin{cases} \dfrac{2}{b} (x - a) \, e^{-(x-a)^{2}/b} & \text{for } x \ge a \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Cauchy Noise: The atmospheric and underwater acoustic signals of radar and sonar imaging are corrupted with additive heavy-tailed impulse-like noise, known as Cauchy noise [9]. The probability density function of the Cauchy distribution is given by:

$$p(x; \delta, \gamma) = \frac{1}{\pi\gamma} \, \frac{\gamma^{2}}{\gamma^{2} + (x - \delta)^{2}} \tag{6}$$

where $\gamma > 0$ denotes the scale parameter, and $\delta \in \mathbb{R}$ denotes the location parameter.

Mixed Noise: In many real-life applications, images are corrupted by more than one noise type. The mixture of Gaussian and impulse noise is found in computed tomography (CT) images and cDNA microarray imaging [10], [11]. The mixed noise in cDNA microarray imaging occurs due to photon and electronic noise interaction, dust particles on the surface of glass slides, and laser reflection. In hyperspectral images, the combination of signal independent additive Gaussian noise and signal dependent multiplicative Poisson noise is found [12].

B. CLASSIFICATION OF IMAGE DE-NOISING TECHNIQUES
The image de-noising methods can be grouped into spatial domain techniques, transform domain techniques, fuzzy filtering-based techniques, and machine learning techniques [13], [14]. The block diagram illustrating the classification of image de-noising techniques is given in Fig. 1.
Spatial domain filtering is widely used for image restoration; the filtering operation is applied directly to the image pixels. Spatial filters are further divided into linear and
non-linear filters. The most common linear filters are the mean filter, Gaussian filter, and Wiener filter. The basic mean filter replaces the pixel under operation with the mean value of a pre-defined neighborhood. Similarly, Gaussian filters use a Gaussian kernel with a particular mean and standard deviation. They suffer from the problem of over-smoothing and blurring of edges. To overcome this problem, the Wiener filter was introduced, but it is also unsuccessful on sharp edges. Later, non-linear filters were introduced, in which the output is a non-linear function of the input, for edge, detail, and texture preservation. The primary examples of non-linear filters are total variation filters, anisotropic diffusion filters, the bilateral filter, and fourth-order partial differential equation filters. The bilateral filter replaces each pixel value with a weighted average of its neighborhood, where the weights are a function of both the Euclidean distance and the range difference [15], [16]. A detailed, comprehensive review of impulse and Gaussian de-noising filters is given in [14].
The transform domain techniques convert the image into the transform domain, and then mathematical operations are carried out on the transform domain coefficients. This is followed by the inverse transform to restore the de-noised image. These techniques are divided into data-adaptive and non-data adaptive techniques based on the transform basis function. Independent component analysis (ICA) and principal component analysis (PCA) are data-adaptive transform methods. ICA has been successfully utilized for non-Gaussian de-noising. PCA is a de-correlation method that transforms the original image dataset into the PCA domain and selects the most significant principal components (the eigenvectors with the largest eigenvalues) for image restoration [17]. Wavelet-based image de-noising is a multi-resolution image analysis technique that uses different mother wavelets, such as Daubechies and Haar, to obtain wavelet coefficients. It has been used to de-noise Gaussian, salt and pepper, and Poisson noise using the appropriate thresholding operator [18], [19]. In recent years, the most promising non-local, collaborative filtering method in the transform domain has been block-matching and 3D filtering (BM3D) [20]. In this approach, similar 2D image patches are compiled into 3D groups by the block matching process. Collaborative Wiener filtering is then done in the transform domain on this 3D group. The improved versions of BM3D are given in [21], [22]. The curvelet filter is based on the theory of multiscale geometry (i.e., position, scale, and orientation usage). It gives a better de-noising performance on edges and borders than state-of-the-art wavelet de-noising methods [23]. It uses the ridgelet transform as a primary step, and curvelet sub-bands are formed with a filter-bank structure formed by à trous wavelet filters. The 2-D contourlet transform provides spatial and directional resolution to keep contours and details intact [24].
Image restoration using fuzzy-based methods considers the image as a fuzzy set and its pixel values as its members. Fuzzy-based filters use fuzzy rules to design membership functions by calculating the gradient's degree in various directions. The fuzzy impulse noise detection and reduction method calculates the gradient in eight directions for noisy pixel detection prior to the filtering [25]. In histogram fuzzy de-noising filters, the membership function is derived from the input histogram [26]. The method consists of a fuzzy detection phase and a cancellation phase. A detailed explanation of fuzzy-based techniques is given in [27], [14].
The image de-noising models can be grouped into analytical models (stochastic and deterministic) and machine learning-based models. In analytical models, the forward de-noising model is explicitly known to the user, and the solution approach is chosen based on certain criteria. The deterministic modeling of spatial filters is challenging for each image type. Edge deterioration and blurring are common artifacts in spatial and transform domain techniques. On the other hand, in the machine learning models, the inverse model is learned with the help of image datasets containing clean and noisy image pairs.
The most important question arises: what is the relative advantage of the machine (deep) learning approach over analytical methods? In deep learning models, the computational burden lies in the learning phase, whereas the testing phase consists of a feed-forward model. Analytical methods, in contrast, rely on a computationally demanding optimization process, and heuristic selection of hyper-parameters does not reliably yield good de-noising results. It has been observed that machine learning models give superior performance compared to analytical methods, as feature learning makes a single model apt for considerable variation in the noise level.
Some de-noisers are based on an analytical optimization, which involves an iterative process with some stopping criteria. Although optimization is involved, these methods cannot be directly categorized in the machine learning domain, which is basically a numerical optimization problem. Some of the important analytical optimization methods are total variation regularization [28] and weighted nuclear norm minimization (WNNM) [29]. Variational-based methods find the appropriate priors, such as low-rank priors, non-local self-similarity priors, sparse priors, and gradient priors. WNNM assigns weights to the singular values of an image, and analytical optimization is done based on some energy function.
In recent years, there has been a paradigm shift from analytical models to machine learning models owing to improved image quality assessment metrics. In the following section, machine learning-based image de-noisers are explained in detail. In this paper, the following convention is followed in explaining methods: $\mathbf{y}$ is the noisy input image, $\mathbf{x}$ is the clean image or ground-truth image, $\mathbf{v}$ is the noise component added to $\mathbf{x}$ to generate $\mathbf{y}$, and the final predicted de-noised image from the de-noiser is $\hat{\mathbf{x}}$.

C. MACHINE LEARNING BASED IMAGE DE-NOISING
The machine learning image de-noising techniques have made considerable progress with the introduction of benchmark datasets for a particular application, deep learning
advancements, and increased computational power with graphics processing units (GPUs). They are broadly classified into sparsity-based dictionary learning models, multi-layer perceptron models, convolutional neural network-based models, and generative adversarial network-based models.

1) SPARSITY-BASED DICTIONARY LEARNING MODELS
In the sparsity-based techniques, every image patch is represented as a linear combination of several patches from an overcomplete dictionary $\mathbf{D}$. The image is encoded with the coding vector $\boldsymbol{\alpha}$ over the dictionary, with an $l_1$-norm sparse regularizer on the coding vector $\boldsymbol{\alpha}$, i.e., $\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1 \ \text{s.t.} \ \mathbf{x} = \mathbf{D}\boldsymbol{\alpha}$, following a generalized model given by [30], [31]:

$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\|_2^2 + \lambda \|\boldsymbol{\alpha}\|_1 \tag{7}$$

Here, $\lambda$ is a sparseness-balancing regularization parameter, and $\|\boldsymbol{\alpha}\|_1$ is the 1-norm of $\boldsymbol{\alpha}$. Another design of the model uses $\|\boldsymbol{\alpha}\|_0$ (the 0-norm of $\boldsymbol{\alpha}$) in place of $\|\boldsymbol{\alpha}\|_1$. The K-Singular Value Decomposition (K-SVD) technique is the pioneering work that uses dictionary learning to frame the sparse representation model. The model can be learned from benchmark datasets as well as from the input image itself by K-SVD [32]. K-SVD is an iterative process that alternates between two steps: sparse coding of the examples using the current dictionary, and updating the dictionary atoms for optimum data fitting. Some other works, as in [33], [34], follow the same workflow as K-SVD with variation in the dictionary and the optimization problem. The clustering-based sparse representation involves a cost function (a double-header $l_1$ optimization problem) in which both structural clustering and dictionary learning are used as regularizers. A typical sparsity-based image de-noising algorithm is given in Algorithm 1.

Algorithm 1: De-noising algorithm of sparsity-based de-noiser [30]
1. Input: $\mathbf{y}$, where $\mathbf{y}$ is the image observed in the noisy environment
2. Find $\hat{\mathbf{x}} = \mathbf{D}\hat{\boldsymbol{\alpha}}$, where $\mathbf{D}$ is a sparse dictionary constructed to suit $\mathbf{x}$, $\hat{\boldsymbol{\alpha}}$ is the sparsity constraint, $\boldsymbol{\alpha}$ is an unknown parameter, and $\lambda$ is a sparseness-balancing regularization parameter set according to:
   $L = \frac{1}{2}\|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_0$ ($\|\boldsymbol{\alpha}\|_0$ may be replaced by $\|\boldsymbol{\alpha}\|_1$)
   such that $L$ is as low as possible
3. Find the estimate of $\mathbf{x}$ according to the following:
   $\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_0 \ \text{s.t.} \ \|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\|_2 < \Lambda, \quad \hat{\mathbf{x}} = \mathbf{D}\hat{\boldsymbol{\alpha}}$,
   where $\Lambda$ is a small-value limiting parameter
4. Solve the above non-deterministic polynomial problem by using greedy pursuit or convex relaxation
5. Output: $\hat{\mathbf{x}}$

2) MULTI-LAYER PERCEPTRON MODELS
The multi-layer perceptron (MLP) network, as shown in Fig. 2, is a feed-forward model that maps the input image vector ($\mathbf{y}$) to the output image vector ($\hat{\mathbf{x}}$) through several intermediate hidden layers. The general equation of an MLP network with two hidden layers is given by

$$\hat{\mathbf{x}} = \mathbf{b}_3 + \mathbf{w}_3 \tanh\big(\mathbf{b}_2 + \mathbf{w}_2 \tanh(\mathbf{b}_1 + \mathbf{w}_1 \mathbf{y})\big) \tag{8}$$

where $\mathbf{w}$ is the weight matrix, $\mathbf{b}$ is the vector-valued bias, and the activation function is $\tanh$, which operates component-wise.

FIGURE 2. Multi-layer Perceptron Network [35] (diagram labels: input layer, weights, sum, tanh, first hidden layer, output layer)

Algorithm 2: De-noising algorithm of multi-layer perceptron-based de-noiser [35]
1. Multi-layer perceptron (MLP) (supervised) learns a function $f(\cdot): \mathbb{R}^m \to \mathbb{R}^o$ by training on a dataset, where $m$ and $o$ are the input dimensions and the output dimensions, respectively
2. Given $\mathbf{Y} = \{y_1, y_2, y_3, \ldots, y_m\}$ and a target $x$, it learns a non-linear function approximator for either regression or classification
3. Features $\mathbf{Y}$ are input to the MLP architecture, which has an input layer, one or more non-linear hidden layers, and an output layer that outputs $f(\mathbf{Y})$
4. Each neuron in the hidden layer transforms the values from the previous layer as $g(w_1 y_1 + \cdots + w_m y_m)$, where $g(\cdot): \mathbb{R} \to \mathbb{R}$ is a non-linear activation function like the hyperbolic tangent function
5. The output layer transforms the values received from the hidden layer into output values

Stochastic gradient descent is used for training with noisy and clean image pairs. The parameters of the MLP are updated by back-propagation, minimizing the mean-square error. To increase the training efficiency, data normalization, proper weight initialization, and learning rate division are applied. The noisy image is broken into overlapping patches, and each patch is de-noised separately. The MLP estimates the de-noised version of the overlapping noisy patches, and then the average is calculated for overlapped de-noised patches [35]. There is an improvement in de-noising performance when the de-noised patches are weighted by a Gaussian window. The MLP with four hidden layers uses time-series images and has shown significant improvement in keeping details and edges intact for SAR images [36]. The trainable non-linear reaction-diffusion model [37] is a feed-forward
architecture that embeds a standard non-linear diffusion model in the neural network. MLPs have fewer layers than convolutional neural networks because of the vanishing-gradient problem, which limits their performance. A multi-layer perceptron de-noising algorithm is given in Algorithm 2. This algorithm is in accordance with Fig. 3, which depicts a single hidden layer MLP.

FIGURE 3: One hidden-layer MLP (diagram labels: features $\mathbf{Y}$ with inputs $y_1$, $y_2$, $y_3$; hidden units $a_1$, $a_2$, each with bias $+b$; output $f(\mathbf{Y})$)

3) CNN-BASED DE-NOISING MODELS
In recent years, the convolutional neural network (CNN) based models have shown significant improvement in various image quality metrics compared to other state-of-the-art methods [39]. The success of CNN models can be attributed to large modeling capacity and significant advancement in network training and design. The CNN is designed for grid- or matrix-like input data and takes inspiration from the visual cortex of animals. In CNN models, the convolutional kernel with learnable parameters is shared across all image positions. The convolutional kernel can be visualized as a feature extractor for a particular image restoration application. The convolutional layers have a cascade connection, so the extracted features become hierarchically and progressively more complex. A CNN model consists of an input layer, a series of intermediate hidden layers, and the output layer. The convolutional kernel with learnable weights is applied on each layer, followed by some activation function. The output of each layer is fed as the input of the next one. The outputs of intermediate layers are termed feature maps. The general equation of the intermediate feature maps ($FM$) of the $l$-th layer of a CNN is given by [40]

$$\mathbf{FM}_j^{l} = A\Big(\sum_{i \in S_j} \mathbf{FM}_i^{l-1} * \mathbf{w}_{ji}^{l} + \mathbf{b}_j^{l}\Big) \tag{9}$$

where $S_j$ represents the selection of the input feature maps, $\mathbf{FM}_i^{l-1}$ is the previous feature map, $\mathbf{w}_{ji}^{l}$ is the weight of the convolution kernel of the $l$-th layer, $A$ is the activation function, which can be a rectified linear unit, sigmoid function, etc., and $\mathbf{b}_j^{l}$ is the bias in the $l$-th layer. The training procedure involves optimizing parameters such as kernels by using clean and noisy image pairs with stochastic gradient descent, Adam's algorithm, etc. The cost function is optimized during the training process. The mean square error between the clean image and its de-noised version is the fundamental cost function. Fig. 4 illustrates the basic architecture of a CNN. Algorithm 3 gives the CNN de-noising process.

Algorithm 3: De-noising algorithm of CNN based de-noiser [38]
1. Input noisy image $\mathbf{y}$, noise standard deviation $\sigma$ and clean image $\mathbf{x}$, i.e. $\mathbf{y} = \mathbf{x} + \mathbf{v}$
2. CNN Module:
   Input: input image or image plus noise level maps
   Intermediate units: Convolution + Batch Normalization + Activation function
   Output unit: Convolution + Residual learning
3. Intermediate output is feature maps given by: $\mathbf{FM}_j^{l} = A\big(\sum_{i \in M_j} \mathbf{FM}_i^{l-1} * \mathbf{w}_{ji}^{l} + \mathbf{b}_j^{l}\big)$, where $\mathbf{FM}_i^{l-1}$ represents the feature map of layer $l-1$, $\mathbf{w}_{ji}^{l}$ and $\mathbf{b}_j^{l}$ are the weight and bias of layer $l$, $A$ is the activation function, and $M_j$ is the selection operator of feature maps.
4. Residual learning implies $\hat{\mathbf{x}} = \mathbf{y} - R(\mathbf{y})$, where $R$ represents the residual learning CNN operator.
5. Loss function: $l(\Theta) = \frac{1}{2N}\sum_{i=1}^{N} \|R(\mathbf{y}_i, \Theta) - (\mathbf{y}_i - \mathbf{x}_i)\|^2$, where $\Theta$ denotes the CNN parameters, $N$ is the number of images in the training dataset, $\mathbf{y}$ and $\mathbf{x}$ represent a noisy and a clean image, and $R$ is residual learning.
6. If $l(\Theta) \cong 0$, the model is trained, else retrain for the next epoch.

FIGURE 4: CNN Architecture for image restoration [39] (diagram labels: input image 256 x 256 (RGB); 64 conv filters 3x3x3 with bias (1 x 1 x 64); 64 feature maps 256 x 256; 64 conv filters 3x3x64 with bias (1 x 1 x 64); 64 feature maps 256 x 256; 3 conv filters 3x3x64 with bias (1 x 1 x 3); output image 256 x 256 (RGB); desired output image 256 x 256 (RGB); objective function $c(\cdot) = \|\cdot\|_2^2$)

4) GAN-BASED DE-NOISING MODELS
The generative adversarial network (GAN) uses generative modeling with two sub-models, termed generator and discriminator [41]. This network is designed to overcome the difficulty deep generative models have in learning complex probabilistic distributions. The generator model is used for extracting new plausible images from the problem domain,
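As an illustration of the residual-learning formulation in Algorithm 3, where the network output R(y) estimates the noise and the de-noised image is obtained as y - R(y), the following NumPy sketch evaluates the loss l(Θ) = (1/2N) Σ ‖R(y_i, Θ) - (y_i - x_i)‖² on a toy batch. It is a minimal sketch under stated assumptions, not any published implementation: no CNN is trained here, the arrays `perfect` and `zero` are hypothetical stand-ins for the output of R, and the function names are chosen only for this example.

```python
import numpy as np


def residual_loss(residual_pred, noisy, clean):
    """Loss 1/(2N) * sum_i ||R(y_i) - (y_i - x_i)||^2 over a batch of N images."""
    n = noisy.shape[0]
    target = noisy - clean                      # true noise component v_i = y_i - x_i
    diff = (residual_pred - target).reshape(n, -1)
    return float(np.sum(diff ** 2) / (2.0 * n))


def denoise(noisy, residual_pred):
    """Residual learning: x_hat = y - R(y)."""
    return noisy - residual_pred


# Toy batch of N = 4 gray-scale "images" of size 8 x 8 (illustrative data only).
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 8, 8))
noise = rng.normal(0.0, 0.1, size=clean.shape)  # additive Gaussian noise v
noisy = clean + noise                           # y = x + v

# Hypothetical stand-ins for the residual operator's output R(y).
perfect = noise                 # an ideal R that predicts the noise exactly
zero = np.zeros_like(noisy)     # an untrained R that predicts no noise at all

loss_perfect = residual_loss(perfect, noisy, clean)
loss_zero = residual_loss(zero, noisy, clean)
x_hat = denoise(noisy, perfect)
```

With the ideal residual prediction the loss is numerically zero and y - R(y) recovers the clean batch; with the all-zero prediction the loss equals half the average noise energy per image, which is the quantity training would have to drive down.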
FIGURE 7: Block diagram depicting the architecture of DnCNN, IDCNN, SCNN and FFDNet
ReLU activation with batch normalization. The concluding layer is a convolutional layer which uses the same number of filters as the number of image channels, each of size 3x3x64. The model gives the same de-noising results with both stochastic gradient descent and Adam's algorithm by optimizing the following loss function:

$$l(\Theta) = \frac{1}{2N}\sum_{i=1}^{N} \|R(\mathbf{y}_i, \Theta) - (\mathbf{y}_i - \mathbf{x}_i)\|^2 \tag{13}$$

where $\Theta$ denotes the DnCNN parameters, $N$ is the number of images in the training dataset, $\mathbf{y}$ and $\mathbf{x}$ represent a noisy and a clean image, and $R$ is residual learning. There are other de-noiser variants whose basic architecture resembles the DnCNN network. The wavelet de-noising CNN, i.e., WDnCNN [48], uses residual learning in the novel feature space of the wavelet domain. In this method, the network is trained with four decomposed wavelet sub-bands, and the architecture is the same as that of DnCNN. SCNN [49] is a residual learning-based model which uses a soft shrinkage activation function for varying noise levels of the input image.
IDCNN [40] is another deep convolutional neural network that follows the same residual learning architecture as that of DnCNN without incorporating batch normalization. This network fails to converge with stochastic gradient descent because of gradient explosion. So, this network clips the gradient to a specific pre-defined interval, i.e., the gradient clipping procedure. It has been observed that network performance improves as the depth of the network increases from four to ten. In this model, a non-fixed noise mask is used during the process so that a single model can be used for different noise levels. The loss function of IDCNN is given by

$$l(\Theta) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 \tag{14}$$

where $\mathbf{x}$ and $\hat{\mathbf{x}}$ denote the clean and estimated images, respectively.
The ECNDNet [50] is a residual learning model which follows the loss function given in equation (13). The architecture is the same as that of DnCNN. The main feature of the ECNDNet network is the usage of dilated convolution to increase the receptive field size. It reduces the computational cost and enhances the extraction of more context information. The batch-renormalization denoising network (BRDNet) also uses residual learning, batch re-normalization, and dilated convolution to address the problem of internal co-variate shift for the extraction of more features [51]. Deep iterative down-up CNN (DIDN), densely connected hierarchical denoising network (DHDN) [52], and multi-level wavelet CNN (MWCNN) [53] are based on the UNet [54] architecture, which was designed for semantic segmentation. The deep iterative down-up CNN (DIDN) [55] is also based on receptive field size variation for improving de-noising results. It consists of four stages: initial feature extraction, down-up block, reconstruction, and enhancement. The initial feature maps are extracted by convolution, followed by iterative down- and up-sampling of the feature maps by the down-up blocks. The outputs of all the down-up blocks are fed into the reconstruction block, which has convolutional and parametric rectified linear units. The concatenated output of the reconstruction block is fed into an enhancement block with a convolution unit. The DHDN network uses a modified UNet architecture to learn a large number of parameters, and solves the vanishing gradient problem by residual learning and dense connectivity between convolution layers. In MWCNN, the multi-level wavelet transform is integrated into the UNet architecture to increase the receptive field size by reducing the resolution of feature maps.
The fast and flexible de-noising convolutional neural network (FFDNet) is the fastest in terms of implementation time, and it can handle spatially variant Gaussian noise [56]. The unique feature of this model is that, unlike other networks, the mapping function contains a noise level map in the input. The noise level map plays a crucial role in keeping the trade-off between noise reduction and detail preservation. Conventionally, the mapping function learns de-noised images from noisy images, CNN parameters, and the Gaussian noise standard deviation. In FFDNet, the CNN parameters are not affected by variation in the Gaussian noise level. It works on downsampled sub-images, which tends to increase the receptive field. The architecture of FFDNet has the same units as that of DnCNN, i.e., convolutional
[FIGURE 8: Architecture of ADNet. Recoverable diagram labels: input noisy image, SB, FEB, AB, output image.]
operator in the first layer, repeated units of convolution, batch normalization, and ReLU activation, concluded with the convolutional layer. Adam's algorithm [57] is used for training to minimize the following loss function.

$$l(\Theta) = \frac{1}{2N}\sum_{i=1}^{N} \|F(\mathbf{y}_i, \mathbf{M}_i, \Theta) - \mathbf{x}_i\|^2 \tag{15}$$

where $F$ denotes the FFDNet learning function and $\mathbf{M}$ is the noise level map.
Some models, like NN3D [58] and graph CNN [59], exploit non-local and local similarities through non-local filtering and graph signal processing. NN3D uses a standard pre-trained CNN in cascade connection with a standard non-local filter. DnCNN, IDCNN, and FFDNet focus on local features with a biased receptive field. NN3D integrates non-local features in a single modular framework to further improve de-noising performance. Similarly, graph CNN also exploits the non-local similarities by incorporating a graph convolutional layer. The graph CNN layer works on feature maps to aggregate similar spatially adjacent and spatially distant pixels. The averaging of local and non-local pixels is done to produce the desired feature map. The universal denoising network (UNet and UNLNet) is another network that integrates convolution and non-local filtering layers for both gray and color image denoising [60]. Fig. 7 depicts the block diagram of the combined architecture of DnCNN, IDCNN, SCNN, and FFDNet.
The models, namely PDNN [61], IRCNN [62] and DRUNet [63], integrate the observational model with the discriminative learning of deep CNNs. The model-based methods require several iterative steps to solve the optimization problem, but they can solve different image restoration tasks like de-blurring, super-resolution, and de-noising with a single model with the help of an image degradation matrix. These hybrid models utilize the powerful de-noising capabilities of CNNs and the prior of the observational models in a single modular framework. In [61], [62], model-based optimization is merged with robust image priors through the variable splitting technique. Variable splitting reduces the number of CNN parameters and enhances the CNN training efficiency. The observation model is unfolded into discriminative CNN learning, which is composed of multiple de-noiser modules interleaved with back-projection (BP) steps that ensure observation consistency. DRUNet [63] is the improved version of IRCNN, and its methodology involves the usage of a CNN as a deep denoiser prior accompanied by a half quadratic splitting based iterative algorithm for solving deblurring, super-resolution, denoising and color image demosaicking.
Recently, the attention-guided de-noising convolutional neural network (ADNet) [64] has been reported to outperform previous CNNs. It is specifically designed to overcome the drawback of increasing network depth: as the depth of the network increases, the influence of shallow layers on de-noising performance becomes weak. It is divided into four major modules: sparse block (SB), feature enhancement block (FEB), attention block (AB) and reconstruction block (RB). The SB reduces the depth and improves the efficiency of the network with the usage of convolution and dilated convolution operators. It is a twelve-layer block with dilated Conv + BN + ReLU (second, fifth, ninth, and twelfth layers) and Conv + BN + ReLU in the rest of the layers. The next (13th to 16th) layers form the FEB to create robust features by merging global and local features. The first three layers of the FEB are Conv + BN + ReLU, and the fourth layer is Conv. The output of the Conv layer and the noisy input are concatenated to improve the representation capability further. This is followed by the usage of tanh activation for the non-linearity. The AB consists of just one Conv layer, which compresses the features into the weights to modify the previous layer output. The RB is the final stage, which incorporates a subtractor for the residual learning process. The architecture of ADNet is given in Fig. 8. The fully convolutional encoder-decoder structure with skip connections is also used for Gaussian and speckle noise removal [65].
The PReLU (parametric rectified linear unit) and edge-aware CNN de-noiser is one of the latest works that has produced good PSNR results for both BSD-68 and Set-12 compared to other networks [66]. It is an improved DnCNN network with PReLU as the activation function, which learns the slope in the negative direction as well. The inclusion of principal component analysis on the feature
TABLE I
ADVANTAGES AND DISADVANTAGES OF DIFFERENT MACHINE-LEARNING MODELS

Dictionary-learning model
Advantages:
• Dictionary-learning models are powerful tools for many image restoration, de-noising and
  recognition tasks
• Most local image patches can be well approximated by a sparse linear combination of basis
  atoms
• Constructing dictionaries adaptive to the input image via some learning process helps
  achieve better sparsity than fixed orthogonal dictionaries like DCT and wavelets
• Dictionary-based learning models help preserve minute details and texture of images that
  undergo noise addition
Disadvantages:
• Most dictionary-learning methods consider an over-complete dictionary and formulate the
  learning process as a minimization problem
• Minimization problems are very challenging and mostly non-convex
• Minimization in such a case is usually greedy and computationally demanding
• If correlations among dictionary atoms are not well constrained, the redundancy of the
  dictionary does not necessarily improve the performance of sparse coding

Multi-layer perceptron model
Advantages:
• Capability to learn non-linear models
• Capability to learn models in real-time (on-line learning)
Disadvantages:
• MLP with hidden layers has a non-convex loss function with more than one local minimum.
  Therefore, different random weight initializations can lead to different validation
  accuracy
• MLP requires tuning a number of hyper-parameters such as the number of hidden neurons,
  layers, and iterations
• MLP is sensitive to feature scaling

CNN
Advantages:
• Fewer parameters compared to fully connected neural networks
• Apt for both known and blind Gaussian de-noising
• Concept of transfer learning, i.e., the weights learned by a CNN can be used by another
  network
• Many methodologies are being designed owing to the simple architecture and mathematical
  modelling
Disadvantages:
• Analytical approaches have an advantage over CNNs in merging a prior into the inverse
  problem solution
• Non-availability of image databases for medical image de-noising and classification
• Difficulty in the case of unsupervised learning in real-world scenarios

GAN
Advantages:
• Unsupervised learning method; can be trained using unlabelled image data as they learn
  internal representations of data
• Generate data similar to real image data; can generate images indistinguishable from the
  real data
• Learn complex distributions of image data
• Discriminator is a classifier that can classify objects
Disadvantages:
• Fail to model a multimodal probability distribution of data; suffer from mode collapse.
  Sometimes, suffer from complete collapse (generated samples are virtually identical)
• Suffer from the problem of vanishing gradients; training of the initial layers in the net
  is either extremely slow or effectively stops
• Internal covariate shift is induced by a change in the input distribution; this slows down
  the training
• Training of GANs can be comparatively slow owing to the above reasons
maps in the sixteenth layer has led to the extraction of more features. The final step
cascades the network with an adaptive bilateral edge-aware filter to further refine the edge
and texture details.

C. METHODOLOGIES OF GAN-BASED MODELS (GAUSSIAN NOISE)
The GAN network given in [67] uses a DenseNet CNN as the generator network to ease the
vanishing-gradient problem and the Wasserstein-GAN as the loss function. The generator
network outputs an estimate of the ground-truth image from the noisy image, whereas the
discriminator is trained to tell the generator output apart from the ground-truth image. The
generator network follows the DenseNet architecture with eight Dense Blocks, along with
input, output, and bottleneck convolution blocks. The generator extracts both low-level and
high-level features efficiently. The discriminator network uses leaky ReLU as the activation
function and layer normalization instead of batch normalization. It has eight convolutional
layers and two fully connected layers, which assign a probability to generated images and
ground-truth images. The value function of the de-noising GAN network is given by

$$ \min_{G} \max_{D \in \mathcal{D}} V(D, G) = E_{\mathbf{x} \sim p_{data}(\mathbf{x})}[D(\mathbf{x})] - E_{\mathbf{y} \sim p_{\mathbf{y}}(\mathbf{y})}[D(\mathbf{y})] \tag{16} $$

where $\mathcal{D}$ is the set of 1-Lipschitz functions. The objective is to approximate
$K \cdot W(p_{data}(x), p_{y}(y))$, in which $K$ is a Lipschitz constant and $W$ is the
Wasserstein distance. A gradient penalty term is added so that the gradient of the
discriminator network does not exceed $K$, and is given by

$$ \lambda E_{\mathbf{y} \sim p_{\mathbf{y}}(\mathbf{y})}\left[\left(\|\nabla_{\mathbf{y}} D(\mathbf{y})\|_{2} - 1\right)^{2}\right] \tag{17} $$

The loss function is the combination of content loss and adversarial loss, given as

$$ l = \lambda l_{GAN} + l_{content} \tag{18} $$

where the content loss is given by the $l_1$ or $l_2$ norm, and
adversarial loss is given by the Wasserstein-GAN critic function.
The GAN-CNN based blind de-noiser (GCBD) model [42] extracts approximate noise blocks from
the given noisy images. The GAN produces noise blocks instead of a de-noised image. The noise
blocks generated by the GAN are used to create a training dataset for the CNN. The GCBD model
is thus a cascade of a GAN followed by a CNN: the blocks generated by the GAN, along with the
extracted noise blocks, are used to train the discriminative-learning-based CNN. GCBD can be
used when paired data for the supervised training of a CNN is absent. It gives promising
results for Gaussian noise, mixed noise, and real-world noisy images. Its limitation is that
the noise is modeled only as additive white noise with zero mean.

III. MACHINE LEARNING BASED IMPULSE DE-NOISERS

A. METHODOLOGIES OF DICTIONARY LEARNING MODELS (IMPULSE NOISE)
Wang et al. have proposed an adaptive dictionary-learning-based method that preserves image
structure in impulse-contaminated images with the help of a robust $l_1$-norm data-fidelity
term for impulse noise cancellation [68]. In this algorithm, the restoration problem is
formulated as an $l_1 - l_1$ minimization objective and solved under the augmented Lagrangian
framework through a two-level nested iterative procedure. The algorithm produces restored
images with a high PSNR value. Guo et al. [69] have introduced an algorithm that enhances
image sparsity to help remove salt-and-pepper noise with fast multiclass dictionary learning;
both the sparsity regularization and the robust data fidelity are then formulated as
minimizations of the $l_0 - l_0$ norm for impulse noise removal. Additionally, a numerical
algorithm based on modified alternating direction minimization is derived to solve the
proposed de-noising model. This algorithm excels in image detail preservation. Deka et al.
[70] have proposed a two-stage de-noising method for removing random-valued impulse noise
from an image. In the first stage, an impulse noise detection scheme detects the pixels that
are likely to be corrupted by impulse noise, viz., the noise candidates. In the second stage,
the noise candidates are reconstructed by an image inpainting method based on sparse
representation, iterating until convergence is achieved. This algorithm works well in both
visual and quantitative terms.

B. METHODOLOGIES OF CNN-BASED MODELS (IMPULSE NOISE)
Chen et al. have proposed a blind CNN architecture for random-valued impulse noise (RVIN)
removal [71]. This de-noising mechanism works on the principle of flexible noise ratio
prediction, which proved to be better than DnCNN-based RVIN suppression because it eliminates
the unnecessary dependence on exact knowledge of the noise ratio. Random patches are selected
from the RVIN-corrupted test image, and feature vectors that indicate whether the centre
pixel is contaminated are extracted by the predictor. These feature vectors are composed of
several statistics, viz., the multiple rank-ordered absolute differences (ROADs), the clean
pixel median deviation (CPMD), and the edge pixel difference (EPD). They are rapidly mapped
to noisy/clean labels (1 for noisy, 0 for clean) by the pre-trained noise detector. From the
ratio of the obtained noisy labels to the total number of selected patches, the predictor
provides the noise ratio of the whole image. Based on the output of the noise ratio predictor
(NRP), i.e., the predicted noise ratio, the most appropriate DnCNN specifically trained for
this noise ratio is used for de-noising. Under the guidance of the NRP, the proposed method
can handle unknown noise ratios. This method performs well in terms of execution efficiency
and image restoration. Turkmen [72] has proposed an artificial neural network (ANN) for
de-noising RVIN-corrupted images by detecting the noisy pixels. The statistics used to detect
the RVIN noisy centres are the rank-ordered absolute differences (ROADs) and the rank-ordered
logarithmic difference (ROLD) values. These are the inputs to the ANN for the detection
process. After detection, the corrupted pixels are restored by the edge-preserving
regularization (EPR) method, which allows edges and noise-free pixels to be preserved. This
mechanism works well in the presence of high-density RVIN.
Li et al. [73] have used densely connected convolutional networks (DenseNet) to de-noise
images corrupted by impulse noise, with the CNN learning pixel-distribution features from
noisy images. The proposed method, viz., a densely connected network for impulse noise
removal (DNINR), captures pixel-level distribution information using wide and transformed
network learning. This mechanism shows significantly better results in terms of edge
preservation and noise suppression.
Khaw et al. [74] have used an efficient CNN with particle swarm optimization (PSO) for
high-density impulse noise removal. This model mainly consists of two parts: impulse noise
removal and impulse noisy pixel detection for restoration. The deep CNN architecture filters
out noise from the noisy images. The PSO algorithm optimizes the threshold values for
detecting impulse noisy pixels. The method is robust and works well on both gray and colour
images in qualitative and quantitative terms.
RVIN can also be removed with a combination of a classifier CNN and a regression CNN [75].
The classifier network separates noisy and noise-free pixels. Thereafter, the regression
network uses the noise-free pixels along with the original noisy input image to predict the
output image. Batch normalization is embedded in both the classifier and the regression
network to accelerate the de-noising performance.
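The detection-then-ratio idea described above can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the implementation of [71] or [72]: the ROAD statistic is taken as the sum of the m smallest absolute differences between a pixel and its eight neighbours (m = 4 is an assumed default), and a fixed threshold (a hypothetical value of 80) stands in for the pre-trained noise detector. The function names `road` and `predict_noise_ratio` are likewise illustrative.

```python
import numpy as np


def road(image, m=4):
    """ROAD map: for every pixel, the sum of the m smallest absolute
    differences to its eight neighbours in a 3x3 window."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="reflect")
    diffs = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the centre pixel itself
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            diffs.append(np.abs(neighbour - img))
    ranked = np.sort(np.stack(diffs, axis=0), axis=0)  # ascending, shape (8, h, w)
    return ranked[:m].sum(axis=0)


def predict_noise_ratio(image, threshold=80.0, n_samples=2000, rng=None):
    """Estimate the impulse-noise ratio as the fraction of randomly sampled
    pixels labelled noisy (1) rather than clean (0).

    Assumption: thresholding ROAD replaces the pre-trained detector."""
    rng = np.random.default_rng(rng)
    scores = road(image)
    h, w = scores.shape
    ys = rng.integers(0, h, size=n_samples)
    xs = rng.integers(0, w, size=n_samples)
    labels = scores[ys, xs] > threshold
    return float(labels.mean())
```

In a pipeline of the kind described, the estimated ratio would then select which of several de-noisers, each trained for a particular noise ratio, is applied to the image.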
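[Fig. 9: block diagram of the impulse-noise removal model. Recoverable labels: classifier network (N/W) producing predicted labels (PLs) or feature vectors (FVs); statistics: 1. ROADs, 2. CPMD, 3. EPD, 4. ROLD; noise-remover N/Ws (post-detection): 1. DnCNN, 2. EPR, 3. DenseNet, 4. Regression network.]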
Fig. 9 shows the overall architecture of the impulse-noise removal model. The first step is
the extraction of random patches from the noise-corrupted image, after which a classifier
network (N/W) is used to predict noisy labels. The noise-contamination determiner produces
the predicted labels (PLs) in the case of Jin et al., and extracts feature vectors (FVs) in
the case of Chen and Turkmen. These feature vectors are composed of several statistics, viz.,
ROADs, CPMD, EPD, ROLD, etc. Finally, a de-noiser N/W is used to de-noise the contaminated
image based on the identified noisy centres.

IV. MACHINE LEARNING BASED POISSON DE-NOISERS
Poisson noise is a special type of noise that is not additive in nature. Unlike Gaussian
noise, its strength depends on the image intensity, so the noise power is measured by the
peak value. It is natural to define the noise power in an image by the maximal value in the
image, i.e., its peak value. Thus, Poisson de-noisers are described in terms of the peak
value as the strength of the noise.

A. METHODOLOGIES OF DICTIONARY LEARNING MODELS (POISSON NOISE)
Giryes et al. [76] have proposed a method that applies the sparse-representation technique to
extracted image patches, adopting the same exponential idea. The proposed algorithm uses
greedy pursuit with a bootstrapping-based stopping condition and dictionary learning within
the de-noising process. The stopping criterion is novel. The paper effectively migrates from
the Gaussian mixture model (GMM) to a dictionary-learning-based model by resolving the
difficulties involved in the conversion process. The reconstruction performance of the
proposed scheme is competitive with leading methods at high SNR and achieves state-of-the-art
results at low SNR.

B. METHODOLOGIES OF CNN-BASED MODELS (POISSON NOISE)
Kumwilaisak et al. [77] have proposed a method (CNN+LSTM) based on deep convolutional neural
networks and multi-directional long short-term memory (LSTM) networks to de-noise images
corrupted by Poisson noise. CNN layers are used to extract image features and to estimate
noise bases in the images, and the multi-directional LSTM layers are used to memorise the
statistics of residual noise components, which possess long-range correlations and are sparse
in the spatial domain. The Blahut-Arimoto algorithm is used to numerically derive a
distortion-mutual information function for the image de-noising algorithm. The algorithm
shows state-of-the-art performance in terms of objective and subjective quality. Su et al.
[78] have proposed a method to tackle the problems caused by Poisson noise in low-light
imaging: a deep multi-scale cross-path concatenation residual network (MC2RNet), which
incorporates cross-path concatenation modules for de-noising. MC2RNet learns the residual
between the noisy and the latent clean image to facilitate model training. This method opts
for blind Poisson training over discriminative de-noising algorithms in order to train a
single model that handles Poisson noise of different levels. The algorithm shows better
performance in terms of peak signal-to-noise ratio and visual quality. Remez et al. [79] have
proposed a flexible, data-driven method to de-noise Poisson-corrupted images, which reduces
the heavy ad hoc engineering load due to computational post-processing in contemporary
de-noising procedures. They use deep CNNs and a training mechanism that trains the same
network with images having a specific peak value. Thus, by using a supervised approach, the
representation capabilities of deep CNNs, and a specific class of images for training, the
authors present a comparatively simple method that shows state-of-the-art performance both
qualitatively and quantitatively and is an order of magnitude faster than other methods.
Remez et al. [80] have introduced a methodology that exploits a fully convolutional CNN
architecture that uses shallow layers to handle local noise statistics and deeper layers to
recover edges and enhance textures. The de-noiser is made
where $d_{min}$ and $d_{max}$ are the minimum and maximum values of the image dynamic range,
each occurring with probability $p/2$. The AWGN noise $v$ is added with probability $1 - p$.
Similarly, the noisy image pixel $y$ obtained by corruption with SPIN, RVIN and Gaussian
noise is given by:

$$ y = \begin{cases} d_{min} & \text{with probability } p/2 \\ d_{max} & \text{with probability } p/2 \\ d & \text{with probability } r(1 - p) \\ x + v & \text{with probability } (1 - r)(1 - p) \end{cases} \tag{20} $$

where $d$ is a random pixel value that occurs with probability $r(1 - p)$.

The CNN-based transfer-learning, four-stage convolutional filtering model is a mixed-noise
de-noiser designed for a mixture of Gaussian and impulse noise [82]. It uses a rank order
filter in the preprocessing step: Cai's filter in the case of Gaussian noise and SPIN, and a
combination of the adaptive median filter and the adaptive center-weighted median filter in
the case of Gaussian noise, SPIN, and RVIN. Bilinear interpolation is performed on the rank
order filter output to get a slightly smoother version of the noisy image. The purpose of the
bilinear interpolation is to suppress the high-frequency components that occur due to rank
order
TABLE II
COMPARISON OF PSNR (IN dB) VALUE OF DIFFERENT MACHINE LEARNING MODELS ON SET-12 DATASET
Images         C.Man     House     Peppers   Starfish  Monarch   Airplane  Parrot    Lena      Barbara   Boat      Man       Couple    Average
Gaussian noise level σ=15
K-SVD [32]     -         34.30     -         -         -         -         30.97     -         -         -         30.46     -         -
K-LLD [43]     -         33.81     -         -         -         -         30.91     -         -         -         30.65     -         -
CSF [47]       31.95     34.39     32.85     31.55     32.33     31.33     31.37     34.06     31.92     32.01     32.08     31.98     32.32
TNRD [37]      32.19     34.53     33.04     31.75     32.56     31.46     31.63     34.24     32.13     32.14     32.23     32.11     32.50
DnCNN [38]     32.61     34.97     33.30     32.20     33.09     31.70     31.83     34.62     32.64     32.42     32.46     32.47     32.86
IDCNN1 [40]    32.54     34.87     33.24     -         35.49     32.79     -         33.75     33.15     31.81     -         -         -
IDCNN2 [40]    32.24     34.83     33.11     -         35.38     32.68     -         33.70     32.98     31.73     -         -         -
FFDNet [56]    32.42     35.01     33.10     32.02     32.77     31.58     31.77     34.63     32.50     32.35     32.40     32.45     32.75
PDNN [61]      32.44     35.40     33.19     32.08     33.33     31.78     31.48     34.80     32.84     32.55     32.53     32.51     32.91
GraphCNN [59]  32.58     35.13     33.27     32.42     33.25     31.84     31.89     34.57     32.84     32.41     32.42     32.40     32.917
ECNDNet [50]   32.56     34.97     33.25     32.17     33.11     31.70     31.82     34.52     32.41     32.37     32.39     32.39     32.81
IRCNN [62]     32.55     34.89     33.31     32.02     32.82     31.70     31.84     34.53     32.43     32.34     32.40     32.40     32.77
BRDNet [51]    32.80     35.27     33.47     32.24     33.35     31.85     32.00     34.75     32.93     32.55     32.50     32.62     33.03
ADNet [64]     32.81     35.22     33.49     32.17     33.17     31.86     31.96     34.71     32.80     32.57     32.47     32.58     32.98
ADNet-B [64]   31.98     35.12     33.34     32.01     33.01     31.63     31.74     34.62     32.55     32.48     32.34     32.43     32.77
PReLU [66]     33.18     35.59     33.54     33.17     34.20     32.65     32.73     35.21     32.25     32.90     32.76     32.78     33.41
MWCNN [53]     -         -         -         -         -         -         -         -         -         -         -         -         33.20
Gaussian noise level σ=25
K-SVD [32]     -         32.12     -         -         -         -         28.12     -         -         -         27.59     -         -
MC2RNet6−B [78]   24.13  26.78  24.84  24.84  21.81  30.94  24.96  24.29  24.76  25.10
MC2RNet6−S [78]   24.24  27.18  25.03  24.95  21.91  31.29  24.97  25.08  24.93  25.25
Peak = 8
DenoiseNet [79]   25.73  28.42  26.35  26.10  22.91  32.28  26.45  26.23  26.11  26.26
MC2RNet6−B [78]   26.61  28.56  26.39  26.06  22.95  33.28  26.66  26.24  26.26  26.34
MC2RNet6−S [78]   26.71  28.75  26.45  26.24  23.01  33.71  26.68  26.46  26.41  26.43
Peak = 30
DenoiseNet [79]   28.94  31.67  29.21  28.74  25.42  36.20  29.77  29.06  29.13  28.71
MC2RNet6−B [78]   30.57  31.52  29.04  28.71  25.37  36.40  29.71  28.99  29.16  28.77
MC2RNet6−S [78]   30.95  31.90  29.23  28.83  25.48  36.99  29.84  29.21  29.34  28.93
TABLE VIII
AVERAGE PSNR(in dB) VALUE ON BSD-68 DATASET FOR POISSON NOISE
Method Peak=1 Peak=2 Peak=4 Peak=8
IRCNN [62] [80] 21.66 22.86 24.00 25.27
DenoiseNet [79] 21.79 22.90 23.99 25.30
MC2RNet6−B [78] 21.92 23.00 24.13 25.39
MC2RNet6−S [78] 22.00 23.08 24.25 25.51
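To make the peak-based noise levels and the PSNR figures in these tables concrete, the following NumPy sketch simulates Poisson noise at a chosen peak and computes PSNR. It is an illustration under stated assumptions, not the evaluation protocol of [78] or [79]: the clean image is rescaled so that its maximum equals the peak before sampling, the result is mapped back to the original range, and PSNR uses an assumed data range of 255. The function names are hypothetical.

```python
import numpy as np


def add_poisson_noise(clean, peak, rng=None):
    """Simulate Poisson noise whose strength is set by the peak value.

    Assumes a non-negative image with a strictly positive maximum."""
    rng = np.random.default_rng(rng)
    clean = np.asarray(clean, dtype=np.float64)
    max_val = clean.max()
    scaled = clean / max_val * peak      # intensities now lie in [0, peak]
    counts = rng.poisson(scaled)         # photon counts, variance = mean
    return counts / peak * max_val       # map back to the original range


def psnr(reference, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

A lower peak means fewer counts per pixel and therefore relatively stronger noise, which is consistent with the PSNR values in the tables rising from Peak=1 to Peak=8.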
TABLE IX
AVERAGE PSNR (in dB) COMPARISON OF MIXED NOISE (RVIN+AWGN) ON DIFFERENT NATURAL IMAGES
Image            Lena    Barbara  Bridge  Boat    Airplane  Pepper  Hill
Noise level σ=15, ρ=0.15
Mixed CNN [82]   32.28   25.67    28.86   29.12   32.52     34.07   31.61
[83]             32.56   29.43    29.16   30.30   32.87     33.49   31.73
Noise level σ=15, ρ=0.30
Mixed CNN [82]   29.10   24.17    26.54   27.02   28.81     29.90   29.04
[83]             31.71   28.32    28.09   29.19   31.88     33.66   30.98
Noise level σ=15, ρ=0.45
Mixed CNN [82]   24.87   21.67    22.86   23.62   23.32     24.54   24.69
[83]             30.36   26.25    26.53   27.60   30.44     32.09   29.86
Noise level σ=25, ρ=0.15
Mixed CNN [82]   29.87   24.52    26.61   27.61   30.34     31.65   29.48
[83]             30.46   27.28    26.72   28.31   30.73     31.76   29.55
Noise level σ=25, ρ=0.30
Mixed CNN [82]   27.93   23.36    25.10   26.08   28.00     28.70   27.80
[83]             29.79   26.35    25.97   27.39   29.90     31.30   28.99
Noise level σ=25, ρ=0.45
Mixed CNN [82]   24.88   21.65    22.62   23.45   23.97     24.87   24.59
[83]             28.54   24.75    24.81   26.10   28.40     30.04   28.01
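Test images for mixed-noise experiments of this kind can be synthesised directly from the pixel model of Eq. (20). The sketch below is one possible reading under stated assumptions, not the data pipeline of [82] or [83]: the random value d is drawn uniformly from the dynamic range (the text only says it is random), and the function name `add_mixed_noise` and its default range of 0 to 255 are illustrative.

```python
import numpy as np


def add_mixed_noise(x, sigma, p, r, d_min=0.0, d_max=255.0, rng=None):
    """Corrupt x with SPIN (ratio p), RVIN (ratio r of the remaining pixels)
    and AWGN of standard deviation sigma, following the cases of Eq. (20)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=np.float64)
    u = rng.random(x.shape)                                 # one draw per pixel
    y = x + rng.normal(0.0, sigma, size=x.shape)            # x + v, prob. (1-r)(1-p)
    d = rng.uniform(d_min, d_max, size=x.shape)             # assumed uniform RVIN value
    y = np.where(u < p + r * (1.0 - p), d, y)               # d, prob. r(1-p)
    y = np.where(u < p, d_max, y)                           # d_max, prob. p/2
    y = np.where(u < p / 2.0, d_min, y)                     # d_min, prob. p/2
    return y
```

If ρ in Table IX denotes the RVIN ratio (an assumption, since the table does not define it), the RVIN+AWGN setting would correspond to calling the function with `p=0` and `r=ρ`.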
TABLE X
DE-NOISING RESULTS FOR MIXED GAUSSIAN NOISE ON BARBARA IMAGE (PSNR=19.02 dB)
Method               PSNR (dB)   Method                             PSNR (dB)
WDL (case 1) [84]    32.39       K-SVD (known parameters) [32]      27.66
WDL (case 2) [84]    30.07       W-KSVD (unknown parameters) [84]   29.35
K-SVD [85]           26.95       IRCNN [62]                         28.95
                                 EM-CNN [86]                        30.68
Noise-type labels in the table: TYPE-I; TYPE-II ($r_1 : r_2 = 0.7 : 0.3$, $\sigma_1 = 10$, $\sigma_2 = 50$)
TABLE XI
RESULTS FOR BLIND DE-NOISING
Noise type: Gaussian noise
Mode      Non-blind    Blind           Blind
Method    BM3D [20]    DnCNN-B [38]    GCBD [42]
𝜎 = 15    31.07        31.61           31.59
𝜎 = 25    28.57        29.16           29.15
Noise type: Mixed noise
Mode      Non-blind    Blind           Blind
Method    BM3D [20]    DnCNN-B [38]    GCBD [42]
𝑠 = 15    41.08        40.75           42.00
𝑠 = 25    37.85        37.54           39.87
filtering on the Gaussian noise. It is followed by the four-stage convolutional filtering.
The first stage consists of a conv layer and ReLU activation function followed by a
max-pooling layer. The second and third stages each consist of a conv layer and ReLU
activation function. The fourth stage is a conv layer. The squared Frobenius norm is used as
the loss function, and training is done by the back-propagation algorithm. The other CNN
model for mixed Gaussian and impulse noise involves two parts: the first half for impulse
noise removal and the second half for Gaussian noise removal [83]. It consists of an input
layer, intermediate layers of convolution, batch normalization, and leaky ReLU, followed by a
convolutional output layer. The second part, for Gaussian noise removal, has a skip
connection for residual learning. Another CNN model uses conv + ReLU + BN as its basic
building block and shows the best structural-metric results for both known and unknown noise
levels of mixed Gaussian-impulse noise [88]. A CNN is also used as a regularizer in
traditional variational methods for mixed noise removal [86]. The mixed noise parameters are
iteratively estimated by the variational method, followed by noise classification according
to the statistical parameters. The methodology is implemented by optimizing sub-problems in
four steps: regularization, synthesis, parameter estimation and noise classification.

VI. REAL-WORLD DE-NOISERS
Xu et al. have constructed a benchmark dataset for de-noising real-world images [89]. The
authors have used different cameras with different camera settings. They have evaluated
different de-noising methods on the newly proposed dataset as well as on previous datasets
for a proper comparison and subsequent analysis. Extensive experimental results demonstrate
that methods designed specifically for realistic noise removal, based on sparse or low-rank
theories, achieve good de-noising performance and are robust. The authors also observe that
the proposed dataset is more challenging for state-of-the-art methods. Kim et al. [90]
propose a grouped residual dense network (GRDN), which is an extended and generalized
architecture of the state-of-the-art residual dense network (RDN) [91]. The core part of RDN
is defined as the grouped residual dense block (GRDB) and is used as a building module of
GRDN. Cascading GRDBs improves the de-noising performance significantly. Inspired by the GAN
modeling technique, the authors have built their own generator and discriminator for
real-world noise modeling. Lin et al. [92] have constructed a new dataset to address the low
availability of proper datasets, obtained the corresponding ground truth by averaging, and
then extended the dataset through noise domain adaptation. Furthermore, they proposed an
attentive generative network by injecting visual attention into the generative network.
During training, the visual attention map learns noise regions. The generative network pays
more attention to noise regions, which helps balance noise removal and texture preservation.
Extensive experiments show that this method performs well both qualitatively and
quantitatively. Chen et al. have proposed a Deep Boosting Framework (DBF) [93] for real-world
image denoising by integrating deep learning into the boosting algorithm. The DBF replaces
conventional boosting units with elaborate convolutional neural networks. The outcome is a
lightweight Dense Dilated Fusion Network (DDFN) as the boosting unit, which addresses the
vanishing gradient problem that arises during training due to the cascading of networks while
promoting the efficiency of limited parameters. This method reduces the domain-shift issue
with a one-shot domain transfer scheme. This is a strong technique for real-world de-noising.
Real-world de-noising has been tested and evaluated on different datasets such as DND and
NIGHT. DND is a benchmark dataset that consists of realistic photos from 50 scenes taken by 4
consumer cameras. The NIGHT dataset is divided into 20 images (denoted as NIGHT-A) and the
other 5 images (denoted as NIGHT-B). Another dataset used is RID. It has 20 representative
scenes, which are captured under different shooting conditions. The problems faced in
real-world de-noising are as follows: (1) The noise in real-world noisy images is very
complex and cannot be described by simple distributions like Gaussian or Poisson. (2) The
inherent practicality of real-world noisy images makes the de-noising more difficult than the
synthetic case. (3) The noise distribution may change along the in-camera imaging pipeline
[94], which makes the noise distribution in a captured RGB image different from its Gaussian
assumption in the RAW space. (4) The problem of domain shift cannot be neglected in practical
scenarios. It arises not only between synthetic and real-world noise; the characteristics of
real-world noise can also differ with camera settings (viz., sensor or aperture size),
shooting conditions (viz., light, environment, and temperature), and imaging pipelines (viz.,
smartphone and
[Figure panels: (a) Original image; (b) Noisy image (𝜎 = 25); (c) DnCNN (PSNR 30.28 dB) [38]; (d) FFDNet (PSNR 30.08 dB) [56]; (e) IRCNN (PSNR 30.09 dB) [62]; (f) ECNDNet (PSNR 30.30 dB) [50]]
Real-World De-noising
FIGURE 13: De-noising results for mixed noise. Panels: Noisy (PSNR=19.02 dB); IRCNN [62] (PSNR=28.95 dB); K-SVD (known parameters) [32] (PSNR=27.66 dB); W-KSVD (unknown parameters) [84] (PSNR=29.35 dB); EM-CNN [86] (PSNR=30.68 dB)
Blind De-noising
Uniform noise (10%, [−𝑠, 𝑠]) + Gaussian noise (20%, 𝑁(0,1)) + Gaussian noise (70%, 𝑁(0,0.01))
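The mixture specified above can be sampled per pixel as in the following NumPy sketch. It is an illustration under assumptions rather than the protocol of [42]: each pixel independently picks one of the three components with the stated proportions, and the second argument of N(0, 0.01) is read as a variance (standard deviation 0.1). The function name `sample_mixture_noise` is hypothetical.

```python
import numpy as np


def sample_mixture_noise(shape, s, rng=None):
    """Draw noise from 10% uniform on [-s, s], 20% N(0, 1), 70% N(0, 0.01)."""
    rng = np.random.default_rng(rng)
    component = rng.choice(3, size=shape, p=[0.1, 0.2, 0.7])
    uniform = rng.uniform(-s, s, size=shape)
    wide = rng.normal(0.0, 1.0, size=shape)
    narrow = rng.normal(0.0, 0.1, size=shape)   # assumes 0.01 is the variance
    # Pick, pixel by pixel, the sample of the selected component.
    return np.where(component == 0, uniform,
                    np.where(component == 1, wide, narrow))
```

Adding the returned array to a clean image gives a noisy test image for the chosen value of s (15 or 25 in Table XI).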
choice at inference time, making it inappropriate for blind level denoising. The Blind Universal Image Fusion De-noiser [95] is a network that extracts features to learn an image prior and intermediate noise level values, which are fed into the fusion part of the model for final de-noising. The latest de-noisers, such as ADNet [64], BRDNet [51], SCNN [49], PReLU [66], [78], and [42], are designed for blind denoising. A blind de-noiser for mixed Gaussian-impulse noise has also been designed [83]. The recent research trend in the field of computer vision is progressing towards the development of universal and blind de-noisers for real-world de-noising.
VII. DESCRIPTION OF DATASET AND SOFTWARE TOOLS
Software: The tremendous success of machine learning, particularly deep learning, is due to the parallel computing of GPUs. TABLE XII describes popular machine-learning libraries used for various computer vision tasks.
TABLE XII
POPULAR SOFTWARE PACKAGES
Software Package | Developer | Description
Caffe | Berkeley AI Research | C++, Python and Matlab interfaces. Widely used for object detection tasks.
Matconvnet | Oxford Visual Geometry Group | Matlab interface, C++ compiler. Pre-trained models for computer vision tasks.
Pytorch | Facebook AI Research Lab | Python interface. Open source software.
Theano | Montreal Institute of Learning Algorithms | Python library for fast numerical computation. Used for image de-noising, classification, super-resolution.
Tensorflow | Google Brain Team | Faster than Theano compiler. C++ and Python interfaces.
Keras | ONEIROS project | Python interface. Acts as interface for Tensorflow library.
MXNet | Apache Software Foundation | Scalable, flexible. Multiple programming languages.
Datasets: Machine learning based methods have shown significant progress due to the availability of open access benchmark datasets. Datasets are available for gray scale de-noising, color image de-noising, medical image de-noising and real-world de-noising. The training dataset is used for training the model, whereas the testing dataset images are used to assess the de-noising results. The peak signal to noise ratio (PSNR) and structural similarity index (SSIM) are the most commonly used image quality assessment metrics. However, there are many other image quality assessment metrics, which are given in [39]. The performance of de-noisers can be compared if they use a common testing dataset. In the case of Gaussian de-noisers, the Set-12 dataset, which comprises twelve scenes, and the BSD-68 dataset, i.e. the Berkeley Segmentation Dataset comprising sixty-eight natural images, are commonly used. Kodak-24, LIVE and McMaster are also used for synthetic de-noising. RENOIR, NAM, DND, SIDD and Xu are datasets for real-world de-noising [89]. Some of the benchmark datasets are given in [96].
TABLE XIII
MODELS USED IN MULTI-DOMAIN IMAGE DE-NOISING
MODEL | NOISE SUPPRESSION IN VARIOUS MODELS | SPECIALITY
DnCNN [38] | 1. Single DnCNN model for blind Gaussian denoising; 2. Single image super-resolution; 3. JPEG image deblocking | Intuitively removes latent clean image in the hidden layers utilizing the residual learning strategy
EM-CNN [86] | 1. Removal of Gaussian mixture noise; 2. Removal of Gaussian-impulse noise | Uses an integration of CNN and EM-based mixed noise removal to give a variational method that can estimate the noise parameters iteratively to categorize noise types and levels in each pixel
W-KSVD [84] | 1. Gaussian-Gaussian mixture; 2. Impulse noise; 3. Gaussian-impulse noise; 4. Modified K-SVD for weighted rank-one approximation | Uses maximum likelihood estimation framework and sparse representations over a trained dictionary, and uses a self-determined weighting data fidelity function that detects noise in terms of different estimated noise parameters
IRCNN [62] | 1. Gaussian noise (grey and color); 2. Mixed Gaussian noise; 3. Various low-level vision applications | Uses variable splitting technique to bring strong image prior into model-based optimization methods; learned CNN de-noisers are used as modules in model-based optimization
GCBD [42] | 1. Blind Gaussian noise; 2. Blind mixed noise | Constructs paired training data from the given noisy images, and then trains a deep denoising network for removing the noise; GAN is used to build the dataset
BM3D [20] | 1. Gaussian noise; 2. Color image de-noising; 3. Mixed noise | Uses enhanced sparse representation in transform domain where enhancement of sparsity is achieved by grouping similar 2-D image fragments into
3-D data arrays
VIII. RESULT AND DISCUSSION
Out of all machine learning methods, dictionary learning models are inferior in terms of PSNR. Other disadvantages of dictionary learning are the heuristic selection of hyperparameters such as the sparsity level, the number of atoms and the number of iterations [97]. Dictionary learning also fails to learn invariant features such as translation, rotation and scale invariance, and it is apt for low dimensional signals only. Machine learning models have evolved from fully connected neural networks to CNN based de-noisers. CNNs have various advantages over fully connected neural networks such as the multi-layer perceptron. Spatial information is kept intact in a CNN, whose input is multi-dimensional image data. The number of CNN parameters is reduced by weight sharing, since a fixed weight kernel is used. Therefore, the reduction in the number of learned parameters, together with the translational invariance and locality of the convolution operation, has given CNNs an edge over fully connected models [98].
Most CNN de-noisers require large, application oriented datasets for supervised learning. The availability of medical image datasets is still challenging, as annotation requires manual intervention. Moreover, the de-noising results become almost stagnant after the network attains a certain depth, and there is no significant change when the number of training images is increased. The CNN methodologies involve changes in activation function, network depth, loss function, training dataset, etc. To address these limitations, there is a gradual shift from the discriminative learning CNN model to the generative learning GAN model. A GAN uses two neural networks, the generator and the discriminator, where the generator creates plausible images and the discriminator constantly evaluates the generated images as real or fake. Therefore, both networks work in synchronization and act as adversaries of each other. The fundamental design of a GAN is based on indirect training of the generator by the discriminator. This falls under the category of semi-supervised learning. The training efficiency of a GAN is higher than that of a CNN, as more features are learned by the GAN in the same number of epochs [99]. GANs achieve better results with fewer training images than CNNs.
TABLE II shows de-noising results in terms of PSNR for dictionary learning and CNN based networks. The models progressed from dictionary learning models, i.e., K-SVD and KLLD, to CNN based models. The DnCNN model is the benchmark residual learning-based Gaussian de-noiser, which has led to the development of many further de-noisers. The methodologies involve a change in loss function, an increase in receptive field size, a change in the number of layers, integration of transform and spatial domain methods with CNN, and inclusion of graph theory in CNN. It can be inferred from TABLE II that the PSNR values obtained by different CNN based methods are very close to each other. However, the ADNet [64] network with four modules suppresses the effect of network length on shallow layers and gives good PSNR results on the Set-12 dataset. In a CNN, after an optimum number of layers, the PSNR values attain saturation. This implies that a further increase in network length does not improve the de-noising performance. Apart from ADNet, BRDNet is the other network, which integrates residual learning with batch renormalization and dilated convolutions to enhance de-noising performance. The de-noising performance of BRDNet can be attributed to an increase in receptive field size through dilated convolutions and an increase in network width through the concatenation of two networks. It therefore overcomes the disadvantages of previous networks, such as (a) training difficulty and stagnation of results due to an increase in network length and (b) mini-batch and internal covariate shift problems. Further, the PReLU [66] based edge-aware filter has attained the best PSNR results on both the Set-12 and BSD-68 datasets at different sigma levels. It uses parametric rectified linear units as activation, which overcome the disadvantage of ReLU by learning in the negative direction. Its success can be attributed to the fact that it is a hybrid methodology that includes principal component analysis and an edge-aware bilateral filter. Moreover, CNNs use supervised learning, which is becoming computationally demanding as dataset size increases. Therefore, the generative learning model of the Generative Adversarial Network is being used. The GCBD model gives promising results even in the absence of supervised learning data. Its PSNR value is the same as that of the DnCNN network on the BSD-68 dataset for noise levels 15 and 25. TABLE III gives a comparative analysis of machine learning methods on the BSD-68 and Kodak-24 datasets. It has been observed that there is no significant difference in the PSNR values of different networks. The DIDN network, designed with receptive field variation and a modification of the U-net architecture originally designed for semantic segmentation, performs better for color images too, as shown in TABLE IV. The DRUNet [63] network, which is based on a deep CNN image prior plugged into the half quadratic splitting-based iterative de-noising algorithm, shows good results on both gray and color images.
The impulse de-noisers predict the pixels affected by noise in the first step. This is followed by a noise contamination determiner and post noise detection processing. Dictionary learning and CNN-based models are designed for impulse noise removal. However, the noise ratios vary over a very large range. To overcome the problem of reduced flexibility due to the unknown severity of contamination, the blind CNN model [71] uses a noise ratio predictor (NRP) that can measure the severity of corruption, i.e., the noise ratio of the image, rapidly and efficiently. Fig. 11 (d) shows that blind CNN achieves a higher PSNR than ANN [72]. Blind CNN removes the noise and retains image details because of the NRP. It converts the noise mask into a noise ratio and, according to this ratio, the most appropriate CNN model is selected for de-noising, rather than restoring the image by removing the detected random-valued impulse noise (RVIN) pixel-by-pixel.
Poisson noise is modeled by its peak value, and its de-noisers are also categorized into dictionary learning models and CNN
based models. There is just one dictionary learning model, which performs de-noising by a greedy pursuit algorithm with a bootstrapping based stopping criterion. Gaussian de-noisers such as DnCNN and IRCNN are also used for Poisson de-noising with different parameter settings. Although Gaussian de-noisers are used for Poisson noise, the heuristic setting of network parameters is again a big challenge. TABLE V gives Poisson de-noising performance on the Live1 dataset. DenoiseNet is the first residual learning based CNN de-noiser designed for Poisson noise. Later, the CNN+LSTM and MC2RNet models outperformed DenoiseNet. The CNN+LSTM Poisson de-noiser, which uses a CNN for feature extraction and LSTM layers to store noise components, outperforms DenoiseNet as given in TABLE V and TABLE VI. The inclusion of the Blahut-Arimoto algorithm to determine the number of CNN layers and the learning of residual noise statistics by the LSTM improve the de-noising results of CNN+LSTM. The deep multi-scale cross-path concatenation residual network (MC2RNet), which incorporates cross-path concatenation modules for de-noising, also outperforms the CNN based DenoiseNet, as given in TABLE VII and TABLE VIII on the Set 10 and BSD-68 datasets respectively. Therefore, CNN+LSTM, DenoiseNet and MC2RNet are the available CNN based Poisson de-noisers, which are fewer in number than Gaussian de-noisers.
Mixed noise can be modeled mathematically in different ways. There are models designed for a mixture of impulse and Gaussian noise. The four-stage residual learning-based mixed network [82] and a de-noiser with a two-stage cascade connection of impulse and Gaussian de-noisers are used for mixed Gaussian-impulse noise, as given in TABLE IX. TABLE X discusses mixed Gaussian noise, and TABLE XI discusses blind de-noising. Figs. 10 to 14 depict qualitative results for different images with different noise types, i.e., Gaussian noise, impulse noise, mixed noise, real-world noise, and blind noise. TABLE XIII discusses the methods that extend into multiple domains of image de-noising.
IX. CONCLUSION AND FUTURE SCOPE
In this paper, a comprehensive study and analysis of machine learning models for the removal of different noises is provided. The de-noisers are categorized into dictionary learning models, CNN based models and GAN based models. A comparative analysis of the PSNR results of different de-noisers on some benchmark datasets is provided for the better understanding of the reader. It has been observed that the integration of analytical methods into machine learning models can further improve the results. Although numerous networks have been designed for synthetic datasets, real-world image de-noising is still a challenging problem. GAN based de-noisers are still at a primitive stage. However, generative learning based GANs and deep belief networks can perform unsupervised learning to a certain extent, unlike CNNs. The future prospects lie in the design of real-world de-noisers with an unsupervised learning framework for practical applications. The transfer learning approach, graph theory inclusion in neural networks, prior design, and receptive field enhancement are some of the areas for future research.
References
[1] W. Meiniel, J. C. Olivo-Marin, and E. D. Angelini, “Denoising of microscopy images: A review of the state-of-the-art, and a new sparsity-based method,” IEEE Trans. Image Process., vol. 27, no. 8, pp. 3842–3856, Aug. 2018, doi: 10.1109/TIP.2018.2819821.
[2] B. Goyal, A. Dogra, S. Agrawal, B. S. Sohi, and A. Sharma, “Image denoising review: From classical to state-of-the-art approaches,” Inf. Fusion, vol. 55, pp. 220–244, Mar. 2020, doi: 10.1016/j.inffus.2019.09.003.
[3] Y. E. Gökdağ, F. Şansal, and Y. D. Gökdel, “Image denoising using 2-D wavelet algorithm for Gaussian-corrupted confocal microscopy images,” Biomed. Signal Process. Control, vol. 54, p. 101594, Sep. 2019, doi: 10.1016/j.bspc.2019.101594.
[4] R. Gonzalez and R. Woods, Digital Image Processing, 3rd ed., London, U.K., Pearson Education, 2008.
[5] S. W. Hasinoff, “Photon, Poisson Noise,” in Computer Vision, Springer US, pp. 608–610, 2014, doi: 10.1007/978-0-387-31439-6_482.
[6] I. Rodrigues, J. Sanches, and J. Bioucas-Dias, “Denoising of medical images corrupted by poisson noise,” in Proceedings - International Conference on Image Processing, ICIP, 2008, pp. 1756–1759, doi: 10.1109/ICIP.2008.4712115.
[7] H. Yu, M. Ding, X. Zhang, and J. Wu, “PCANet based nonlocal means method for speckle noise removal in ultrasound images,” PLoS One, vol. 13, no. 10, p. e0205390, 2018, doi: 10.1371/journal.pone.0205390.
[8] E. E. Kuruoǧlu and J. Zerubia, “Modeling SAR images with a generalization of the Rayleigh distribution,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 527–533, Apr. 2004, doi: 10.1109/TIP.2003.818017.
[9] G. Kim, J. Cho, and M. Kang, “Cauchy Noise Removal by Weighted Nuclear Norm Minimization,” J. Sci. Comput., vol. 83, no. 1, pp. 1–21, Apr. 2020, doi: 10.1007/s10915-020-01203-2.
[10] R. Lukac, K. N. Plataniotis, B. Smolka, and A. N. Venetsanopoulos, “A multichannel order-statistic technique for cDNA microarray image processing,” IEEE Trans. Nanobioscience, vol. 3, no. 4, pp. 272–285, Dec. 2004, doi: 10.1109/TNB.2004.837907.
[11] Y. Norose, K. Mizutani, N. Wakatsuki, and T. Ebihara, “Noise reduction in ultrasonic computerized tomography by preprocessing for projection data,” Jpn. J. Appl. Phys., vol. 54, no. 7, p. 07HC12, Jul. 2015, doi: 10.7567/JJAP.54.07HC12.
[12] M. Ye and Y. Qian, “Mixed Poisson-Gaussian noise model based sparse denoising for hyperspectral imagery,” in Workshop on Hyperspectral Image and Signal Processing, Evolution in Remote Sensing, 2012, pp. 1–4, doi: 10.1109/WHISPERS.2012.6874280.
[13] L. Fan, F. Zhang, H. Fan, and C. Zhang, “Brief review of image denoising techniques,” Vis. Comput. Ind. Biomed. Art, vol. 2, no. 7, 2019, doi: 10.1186/s42492-019-0016-7.
[14] M. Mafi, H. Martin, M. Cabrerizo, J. Andrian, A. Barreto, and M. Adjouadi, “A comprehensive survey on impulse and Gaussian denoising filters for digital images,” Signal Processing, vol. 157, pp. 236–260, Apr. 2019, doi: 10.1016/j.sigpro.2018.12.006.
[15] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proceedings of the IEEE International Conference on Computer Vision, 1998, pp. 839–846, doi: 10.1109/iccv.1998.710815.
[16] R. G. Gavaskar and K. N. Chaudhury, “Fast Adaptive Bilateral Filtering,” IEEE Trans. Image Process., vol. 28, no. 2, pp. 779–790, Feb. 2019, doi: 10.1109/TIP.2018.2871597.
[17] L. Zhang, W. Dong, D. Zhang, and G. Shi, “Two-stage image denoising by principal component analysis with local pixel grouping,” Pattern Recognit., vol. 43, pp. 1531–1549, 2010, doi: 10.1016/j.patcog.2009.09.023.
[53] P. Liu, H. Zhang, W. Lian, and W. Zuo, “Multi-Level Wavelet Convolutional Neural Networks,” IEEE Access, vol. 7, pp. 74973–74985, 2019, doi: 10.1109/ACCESS.2019.2921451.
[54] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, vol. 9351, pp. 234–241, doi: 10.1007/978-3-319-24574-4_28.
[55] S. Yu, B. Park, and J. Jeong, “Deep iterative down-up CNN for image denoising,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2019, vol. 2019-June, pp. 2095–2103, doi: 10.1109/CVPRW.2019.00262.
[56] K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a fast and flexible solution for CNN-Based image denoising,” IEEE Trans. Image Process., vol. 27, no. 9, pp. 4608–4622, 2018, doi: 10.1109/TIP.2018.2839891.
[57] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.
[58] C. Cruz, A. Foi, V. Katkovnik, and K. Egiazarian, “Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising,” IEEE Signal Process. Lett., vol. 25, no. 8, pp. 1216–1220, 2018, doi: 10.1109/LSP.2018.2850222.
[59] D. Valsesia, G. Fracastoro, and E. Magli, “Image Denoising with Graph-Convolutional Neural Networks,” in Proceedings - International Conference on Image Processing, ICIP, 2019, vol. 2019-September, pp. 2399–2403, doi: 10.1109/ICIP.2019.8803367.
[60] S. Lefkimmiatis, “Universal Denoising Networks: A Novel CNN Architecture for Image Denoising,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018, pp. 3204–3213, doi: 10.1109/CVPR.2018.00338.
[61] W. Dong, P. Wang, W. Yin, G. Shi, F. Wu, and X. Lu, “Denoising Prior Driven Deep Neural Network for Image Restoration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 10, pp. 2305–2318, 2019, doi: 10.1109/TPAMI.2018.2873610.
[62] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017, vol. 2017-January, pp. 2808–2817, doi: 10.1109/CVPR.2017.300.
[63] K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte, “Plug-and-Play Image Restoration with Deep Denoiser Prior,” arXiv, 31-Aug-2020. [Online]. Available: http://arxiv.org/abs/2008.13751.
[64] C. Tian, Y. Xu, Z. Li, W. Zuo, L. Fei, and H. Liu, “Attention-guided CNN for image denoising,” Neural Networks, vol. 124, pp. 117–129, Apr. 2020, doi: 10.1016/j.neunet.2019.12.024.
[65] R. Couturier, G. Perrot, and M. Salomon, “Image denoising using a deep encoder-decoder network with skip connections,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, vol. 11306 LNCS, pp. 554–565, doi: 10.1007/978-3-030-04224-0_48.
[66] R. S. Thakur, R. N. Yadav, and L. Gupta, “PReLU and edge-aware filter-based image denoiser using convolutional neural network,” IET Image Process., vol. 14, no. 15, pp. 3869–3879, Dec. 2020, doi: 10.1049/iet-ipr.2020.0717.
[67] Y. Zhong, L. Liu, D. Zhao, and H. Li, “A generative adversarial network for image denoising,” Multimed. Tools Appl., vol. 79, no. 23–24, pp. 16517–16529, Jun. 2020, doi: 10.1007/s11042-019-7556-x.
[68] S. Wang et al., “Dictionary learning based impulse noise removal via L1-L1 minimization,” Signal Processing, vol. 93, no. 9, pp. 2696–2708, 2013, doi: 10.1016/j.sigpro.2013.03.005.
[69] D. Guo, Z. Tu, J. Wang, M. Xiao, X. Du, and X. Qu, “Salt and Pepper Noise Removal with Multi-Class Dictionary Learning and L0 Norm Regularizations,” Algorithms, vol. 12, no. 1, p. 7, Dec. 2018, doi: 10.3390/a12010007.
[70] B. Deka and P. K. Bora, “Removal of random-valued impulse noise using sparse representation,” in 2011 National Conference on Communications, NCC 2011, 2011, doi: 10.1109/NCC.2011.5734762.
[71] J. Chen, G. Zhang, S. Xu, and H. Yu, “A blind CNN denoising model for random-valued impulse noise,” IEEE Access, vol. 7, pp. 124647–124661, 2019, doi: 10.1109/ACCESS.2019.2938799.
[72] I. Turkmen, “The ANN based detector to remove random-valued impulse noise in images,” J. Vis. Commun. Image Represent., vol. 34, no. October, pp. 28–36, 2016, doi: 10.1016/j.jvcir.2015.10.011.
[73] G. Li, X. Xu, M. Zhang, and Q. Liu, “Densely connected network for impulse noise removal,” Pattern Anal. Appl., vol. 23, no. 3, pp. 1263–1275, Aug. 2020, doi: 10.1007/s10044-020-00871-y.
[74] H. Y. Khaw, F. C. Soon, J. H. Chuah, and C. O. Chow, “High-density impulse noise detection and removal using deep convolutional neural network with particle swarm optimisation,” IET Image Process., vol. 13, no. 2, pp. 365–374, Feb. 2019, doi: 10.1049/iet-ipr.2018.5776.
[75] L. Jin, W. Zhang, G. Ma, and E. Song, “Learning deep CNNs for impulse noise removal in images,” J. Vis. Commun. Image Represent., vol. 62, pp. 193–205, 2019, doi: 10.1016/j.jvcir.2019.05.005.
[76] R. Giryes and M. Elad, “Sparsity-based poisson denoising with dictionary learning,” IEEE Trans. Image Process., vol. 23, no. 12, pp. 5057–5069, Dec. 2014, doi: 10.1109/TIP.2014.2362057.
[77] W. Kumwilaisak, T. Piriyatharawet, P. Lasang, and N. Thatphithakkul, “Image Denoising with Deep Convolutional Neural and Multi-Directional Long Short-Term Memory Networks under Poisson Noise Environments,” IEEE Access, vol. 8, pp. 86998–87010, 2020, doi: 10.1109/ACCESS.2020.2991988.
[78] Y. Su, Q. Lian, X. Zhang, B. Shi, and X. Fan, “Multi-scale Cross-path Concatenation Residual Network for Poisson denoising,” IET Image Process., vol. 13, no. 8, pp. 1295–1303, Jun. 2019, doi: 10.1049/iet-ipr.2018.5941.
[79] T. Remez, O. Litany, R. Giryes, and A. M. Bronstein, “Deep Convolutional Denoising of Low-Light Images,” arXiv, 2017. [Online]. Available: https://arxiv.org/abs/1701.01687.
[80] T. Remez, O. Litany, R. Giryes, and A. M. Bronstein, “Class-Aware Fully Convolutional Gaussian and Poisson Denoising,” IEEE Trans. Image Process., vol. 27, no. 11, pp. 5707–5722, Nov. 2018, doi: 10.1109/TIP.2018.2859044.
[81] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space,” in Proceedings - International Conference on Image Processing, ICIP, 2006, vol. 1, doi: 10.1109/ICIP.2007.4378954.
[82] M. T. Islam, S. M. Mahbubur Rahman, M. Omair Ahmad, and M. N. S. Swamy, “Mixed Gaussian-impulse noise reduction from images using convolutional neural network,” Signal Process. Image Commun., vol. 68, no. June, pp. 26–41, 2018, doi: 10.1016/j.image.2018.06.016.
[83] R. Abiko and M. Ikehara, “Blind Denoising of Mixed Gaussian-impulse Noise by Single CNN,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2019, vol. 2019-May, pp. 1717–1721, doi: 10.1109/ICASSP.2019.8683878.
[84] J. Liu, X. C. Tai, H. Huang, and Z. Huan, “A weighted dictionary learning model for denoising images corrupted by mixed noise,” IEEE Trans. Image Process., vol. 22, no. 3, pp. 1108–1120, 2013, doi: 10.1109/TIP.2012.2227766.
[85] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006, doi: 10.1109/TIP.2006.881969.
[86] F. Wang, H. Huang, and J. Liu, “Variational-Based Mixed Noise Removal with CNN Deep Learning Regularization,” IEEE