Neuroscience Informatics
journal homepage: www.elsevier.com/locate/neuri
K. Ali, Z.A. Shaikh, A.A. Khan et al., Neuroscience Informatics 2 (2022) 100034

Article info

Article history:
Received 24 October 2021
Received in revised form 1 December 2021
Accepted 6 December 2021

Keywords:
Convolutional neural networks
CNN
Deep learning
EfficientNet
HAM10000 dataset
Medical imaging
Multiclass skin cancer classification
Skin cancer classification
Transfer learning

Abstract

Skin cancer is one of the most prevalent and deadly types of cancer. Dermatologists diagnose this disease primarily visually. Multiclass skin cancer classification is challenging due to the fine-grained variability in the appearance of its various diagnostic categories. On the other hand, recent studies have demonstrated that convolutional neural networks outperform dermatologists in multiclass skin cancer classification. We developed a preprocessing image pipeline for this work: we removed hairs from the images, augmented the dataset, and resized the images to meet the requirements of each model. By performing transfer learning on pre-trained ImageNet weights and fine-tuning the convolutional neural networks, we trained EfficientNets B0-B7 on the HAM10000 dataset. We evaluated the performance of all EfficientNet variants on this imbalanced multiclass classification task using metrics such as Precision, Recall, Accuracy, F1 Score, and Confusion Matrices to determine the effect of transfer learning with fine-tuning. This article presents the classification scores for each class as Confusion Matrices for all eight models. Our best model, EfficientNet B4, achieved an F1 Score of 87 percent and a Top-1 Accuracy of 87.91 percent. We evaluated the EfficientNet classifiers using metrics that take the high class imbalance into account. Our findings indicate that increased model complexity does not always imply improved classification performance: the best performance arose with the intermediate-complexity models EfficientNet B4 and B5. The high classification scores resulted from many factors, such as resolution scaling, data augmentation, noise removal, successful transfer learning of ImageNet weights, and fine-tuning [70–72]. Another finding, drawn from the Confusion Matrices, was that certain classes of skin cancer generalized better than others.

https://doi.org/10.1016/j.neuri.2021.100034
© 2021 The Author(s). Published by Elsevier Masson SAS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
* Corresponding author.
E-mail address: asifalilaghari@gmail.com (A.A. Laghari).
1 These authors contributed equally to this work.
2 https://g.co/kgs/LMViMh.

1. Introduction

According to the World Health Organization, skin cancer is diagnosed in one out of three people worldwide. Furthermore, one in every five Americans will develop skin cancer during their lifetime, according to the Skin Cancer Foundation [1,75,76]. Melanoma and non-melanoma skin cancers are the most common types. Worldwide, approximately two to three million non-melanoma skin cancers and 132,000 melanoma skin cancers are diagnosed each year [1,2].

Melanoma is the deadliest type of skin cancer. Melanoma cells tend to travel to other body parts, including the lungs, liver, spleen, or brain [2,62]. Notably, metastatic melanoma is the third most common origin of central nervous system (CNS) metastases. In particular, advanced-stage melanoma can often cause brain metastasis, whose treatment requires radiation therapy and immunotherapy; melanoma accounts for almost 10 percent of brain metastases [3–5]. Melanoma is responsible for 10,000 deaths each year in the United States alone [6]. These figures appear bleak, but detecting cancer at an early stage reduces the risk of death significantly: melanoma can be cured in nearly 95 percent of cases if detected early [7]. Early-stage diagnosis of skin cancer is therefore critical to prolonging patient survival.

Visual evaluation of dermatoscopic images (manual dermatoscopy) is limited by the dermatologist's experience. Due to the subjectivity of human decision-making, considerable inter-class similarity among skin lesions, and other confounding factors, this method is prone to mistakes. General diagnostic procedures for identifying skin cancer, such as the ABCD (Asymmetry, Border, Color, Diameter) rule [8] or the 7-point checklist [9], can only…
Table 2
Training, validation, and testing split of the HAM10000 dataset per diagnostic category.

Diagnostic category   Training   Validation   Testing
akiec                 248        30           49
bcc                   370        48           96
bkl                   775        90           234
df                    82         8            25
mel                   883        85           145
nv                    4745       550          1410
vasc                  108        12           22
Total                 7211       823          1981

Fig. 1. Preprocessing pigmented skin lesion images: (a) original image for each class; (b) preprocessed image for each class.

3.1. Image preprocessing pipeline

In the HAM10000 dataset, each image has a dimension of 600×450. The images were resized (resolution scaling) according to the EfficientNet [24,39] variant used for training.

Since the images in the HAM10000 dataset depict pigmented skin lesions and our goal is the classification of skin cancer classes, the presence of hairs is not relevant. Hair in an image contributes to noise: the CNN would have to learn that the arbitrary strands spread across the skin lesion are irrelevant to our task. There is also a danger that the CNN model discovers spurious correlations between this noise and the target (the class of skin cancer). If we did not remove the noise, the CNN would have to learn to ignore it by gradient descent across a large dataset of images. Given the limited dataset size (only 10015 images) and our computational budget, image preprocessing instead removed most of the noise while preserving the signal in each image (Fig. 1) using image inpainting [25,72–74]. The inpainting algorithm relies on the Fast Marching Method [26,27] and requires a mask marking the area to be inpainted (in our case, the hair strands in each image); the blackhat transform [28] provided this mask. Through these two algorithms, we obtained a cleaner dataset of skin lesion images. Samples of the output for each class appear in Fig. 1.
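The hair-removal step just described maps naturally onto OpenCV, which provides both the morphological blackhat transform and Fast-Marching-Method inpainting (Telea's algorithm). The following Python sketch illustrates the idea under stated assumptions: the 17×17 kernel, the threshold of 10, and the inpainting radius are illustrative values, not parameters reported by the authors.

    import cv2
    import numpy as np

    def remove_hairs(image: np.ndarray) -> np.ndarray:
        """Remove hair strands from a skin lesion image via blackhat masking
        and Fast-Marching-Method inpainting (kernel/threshold are assumed)."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # The blackhat transform highlights thin dark structures (hair strands).
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 17))
        blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
        # Threshold the blackhat response to obtain a binary inpainting mask.
        _, mask = cv2.threshold(blackhat, 10, 255, cv2.THRESH_BINARY)
        # Inpaint masked pixels with the Fast Marching Method (Telea).
        return cv2.inpaint(image, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

    clean = remove_hairs(cv2.imread("lesion.jpg"))

The blackhat kernel size trades off against hair thickness: too small a kernel misses thick strands, too large a kernel starts masking lesion texture.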
We increased the dataset size through image augmentation, as sketched below. Dataset size has long been an issue in the medical domain, as neural networks require a colossal amount of labeled data for training. Labeling medical images is expensive and requires a qualified medical professional, unlike other domains where non-experts can perform the labeling. The importance of data augmentation for skin lesion analysis has been established previously [29]. We artificially enlarged the dataset through rotation, zooming, and horizontal and vertical flipping.
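A minimal sketch of this augmentation policy with Keras' ImageDataGenerator follows. The directory layout, the transformation ranges, and the target size (chosen here to match EfficientNet B4's 380×380 input from Table 4) are assumptions for illustration, not the authors' settings.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # The transformations named above: rotation, zoom, horizontal/vertical flips.
    datagen = ImageDataGenerator(
        rotation_range=180,   # skin lesions have no canonical orientation
        zoom_range=0.1,
        horizontal_flip=True,
        vertical_flip=True,
    )

    # Stream augmented batches from a directory of class-labeled images
    # (hypothetical path "data/train" with one subfolder per diagnostic class).
    train_gen = datagen.flow_from_directory(
        "data/train",
        target_size=(380, 380),
        batch_size=8,
        class_mode="categorical",
    )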
In summary, this section has described the preprocessing image pipeline that we built to remove hairs from images, augment the dataset, and resize images according to the requirements of each EfficientNet B0-B7 model. The following subsections explain the EfficientNet model architecture, our modifications to it, and the transfer learning process, which trains on the HAM10000 dataset from pre-trained ImageNet weights and fine-tunes the CNNs.

3.2. EfficientNet architecture

CNNs can be scaled up to achieve better Accuracy. However, the scaling process had never been thoroughly investigated; it entailed an iterative manual tuning process, either arbitrarily increasing the depth or width of the CNN or using a higher input image resolution. The EfficientNet family of architectures was developed by [24] to find a principled method for scaling CNNs toward better Accuracy (i.e., model performance) and efficiency (i.e., fewer model parameters and FLOPS). The authors propose a compound scaling method that uses a fixed set of coefficients to uniformly scale width, depth, and resolution. This method allowed them to produce an efficient baseline CNN architecture, which they named EfficientNet B0. They then obtained EfficientNets B1-B7 by scaling the baseline network with the same compound scaling method. Thus, [24] presents eight CNN architectures of different scales and their performance on the ImageNet dataset. While EfficientNet B0 has 5.3 million parameters and takes a 224×224 image as input, EfficientNet B7 has 66 million parameters and takes a 600×600 image as input.

Scaling the depth of a network allows a CNN to capture richer and more complex features, but training becomes more challenging due to the vanishing gradient problem [30]. Scaling the width of a network allows it to capture more fine-grained features and is also easier to train; wide and shallow networks, however, are incapable of capturing high-level features. Finally, higher-resolution images allow CNNs to capture finer-grained patterns, at the cost of more computational power and memory. In our experiments, we tested the performance of all eight EfficientNet models (B0-B7) on the HAM10000 dataset [23].
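For context, the compound scaling method of [24] couples depth, width, and resolution through a single exponent φ. The sketch below uses the base coefficients reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15, chosen there so that α·β²·γ² ≈ 2); both the coefficients and the rough mapping of φ to the B0-B7 variants are assumptions drawn from that source, not results of this article.

    # Compound scaling sketch: depth, width, and resolution grow together,
    # governed by one compound coefficient phi. Coefficient values are quoted
    # from the EfficientNet paper [24] and should be treated as assumptions.
    ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # alpha * beta**2 * gamma**2 ~= 2

    def scale(phi: int) -> tuple[float, float, float]:
        depth = ALPHA ** phi        # multiplier on the number of layers
        width = BETA ** phi         # multiplier on channels per layer
        resolution = GAMMA ** phi   # multiplier on input image side length
        return depth, width, resolution

    for phi in range(8):  # phi roughly tracks the B0..B7 variants
        d, w, r = scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")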
3.3. Transfer learning

Transfer learning, also known as domain adaptation, is a high-level concept that utilizes the knowledge acquired in one domain or task to solve related tasks. We leveraged the knowledge of models trained on the ImageNet dataset and used their parameters for our task. However, our approach evaluates EfficientNet models on a medical image dataset of pigmented skin lesions. Because the image domains differ, we cannot directly use the pre-trained weights for inference and expect high performance. We therefore performed fine-tuning, in which the trained model's parameters are adjusted to adapt to the new image domain.

There are many ways to fine-tune. These include tuning all or some parameters of the last few layers of a pre-trained model [31,32], or using the pre-trained model as a fixed feature extractor whose features feed a separate classifier, e.g., a support vector machine [33]. We employed both transfer learning and fine-tuning for EfficientNets B0-B7.

3.4. Modifications in network architecture

The top three layers of the EfficientNet models (B0-B7) were designed for the ImageNet dataset. Therefore, we replaced them with new layers suited to our use case (seven-class skin cancer prediction). In particular, EfficientNets B0-B6 were overfitting with the original top three-layer structure, so after removing the top three layers, i.e., the Global Average Pooling 2D, dropout, and dense layers of each model, we added additional dense, batch normalization, and dropout layers at the top of each model, as defined in Table 3.

Table 3
Modified layer structure for EfficientNet B0-B6.
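The body of Table 3 did not survive extraction, so the block below is only a plausible Python/Keras sketch of the procedure Sections 3.3 and 3.4 describe: load an ImageNet-pretrained EfficientNet without its top layers and attach a new head of dense, batch-normalization, and dropout layers ending in a seven-way softmax. The layer sizes, dropout rates, and the choice of the B4 variant here are assumptions.

    import tensorflow as tf

    NUM_CLASSES = 7  # HAM10000 diagnostic categories

    # Pre-trained ImageNet backbone without its original classification head.
    base = tf.keras.applications.EfficientNetB4(
        include_top=False, weights="imagenet", input_shape=(380, 380, 3)
    )

    # New head replacing the removed top layers (sizes/rates are illustrative).
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(base.input, outputs)
    # SGD with learning rate 0.001 matches Table 4 for the B0-B4 variants.
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )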
For EfficientNets B6 and B7, training was more unstable under the above conditions, so fine-tuning proceeded in two steps. In the first step, we fine-tuned only the newly added layers while keeping the convolutional blocks frozen, meaning the convolutional base received no gradient updates. Once the newly added layers had acquired reasonable weights, in the second step we unfroze the last four convolutional blocks of the base model, kept all other blocks frozen, and fine-tuned again. Restricting the second step to the last four convolutional blocks took our computing limitations into account.

Moreover, the model complexity of the official EfficientNets B6 and B7 (i.e., their ability to overfit our dataset) is much larger, as they contain more parameters than EfficientNets B0-B5, so it was reasonable not to fine-tune all layers. Also, rather than SGD, we used the Adam optimizer, as [37,65] did; we found that in terms of stability and performance, Adam [37,68] yielded better results than SGD when training big models like EfficientNet B6 and B7. Lastly, a polynomial-decay learning rate scheduler [38,69,70] allowed for more stable convergence. The model-specific choices of image size, batch size, and other hyperparameters are summarized in Table 4.
Table 4
Model-specific modifications.
Model variant Image size Batch size* Learning rate Optimizer Learning rate decay
EfficientNet B0 224×224 32 0.001 SGD SGD decay rate
EfficientNet B1 240×240 32 0.001 SGD SGD decay rate
EfficientNet B2 260×260 32 0.001 SGD SGD decay rate
EfficientNet B3 300×300 16 0.001 SGD SGD decay rate
EfficientNet B4 380×380 8 0.001 SGD SGD decay rate
EfficientNet B5 456×456 4 0.0006 SGD SGD decay rate
EfficientNet B6 528×528 16 0.0025 Adam Polynomial-decay
EfficientNet B7 600×600 16 0.0025 Adam Polynomial-decay
* The batch size was varied for higher-complexity models due to limitations in computational resources.
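A sketch of the two-step schedule for B6 and B7 follows, continuing the earlier sketches (base, model, train_gen) and assuming a validation generator val_gen. The Adam optimizer and 0.0025 initial learning rate with polynomial decay follow Table 4; the decay horizon, epoch counts, and the identification of "the last four convolutional blocks" via Keras layer-name prefixes are assumptions.

    import tensorflow as tf

    # Step 1: train only the newly added head; the convolutional base is frozen
    # and receives no gradient updates.
    base.trainable = False
    schedule = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=0.0025,   # Table 4 value for B6/B7
        decay_steps=10_000,             # assumed horizon
        end_learning_rate=1e-5,
    )
    model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_gen, validation_data=val_gen, epochs=5)

    # Step 2: unfreeze only the last four convolutional blocks (matched here by
    # the "block4".."block7" layer-name prefixes, an assumption about the Keras
    # layout), keep everything else frozen, and fine-tune again.
    base.trainable = True
    for layer in base.layers:
        if not layer.name.startswith(("block4", "block5", "block6", "block7")):
            layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_gen, validation_data=val_gen, epochs=10)

Recompiling after changing layer trainability is required for the freeze/unfreeze to take effect in Keras.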
5.3. Precision

Precision is the fraction of images correctly labeled as belonging to the positive class divided by the total number of images the model labeled as belonging to the positive class [43,44], summed over the l classes:

\mathrm{Precision} = \frac{\sum_{i=1}^{l} tp_i}{\sum_{i=1}^{l} (tp_i + fp_i)}

5.4. Recall

Recall measures the proportion of actual positives correctly identified by the model; equivalently, it is the number of true positives divided by the number of images in the positive class [45–47]:

\mathrm{Recall} = \frac{\sum_{i=1}^{l} tp_i}{\sum_{i=1}^{l} (tp_i + fn_i)}

5.5. F1 score

By their definitions, there is a trade-off between Precision and Recall: improving Recall tends to reduce Precision and vice versa [48–50]. Depending on the application domain and user requirements, we might need to maximize one over the other. When we want a blend of both metrics (i.e., to assign different weights to each), we use the F-Beta Score, the weighted harmonic mean of Precision and Recall [51], which favors Recall over Precision by a factor of Beta. Precision and Recall are equally important in this context; with Beta = 1 the F-Beta Score becomes the F1 Score, the harmonic mean of Precision and Recall. Higher F1 Score values indicate good predictive power [52–55]. For our multiclass problem, we calculated the F1 Score over all classes to get a holistic view of the model's performance, per the expression below. Note that the reported F1 Score results do not necessarily lie between the Precision and Recall results.

\mathrm{F1\ Score} = \frac{(\beta^2 + 1)\,\mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}

5.6. Specificity

Specificity measures the percentage of actual negative cases the model correctly identifies as negative [56–58]. It is the ratio of true negatives (TN) to the sum of TN and false positives (FP); higher Specificity implies a higher TN count and a lower FP count [59–61]:

\mathrm{Specificity} = \frac{\sum_{i=1}^{l} tn_i}{\sum_{i=1}^{l} (tn_i + fp_i)}

5.7. Roc_Auc score

Roc_Auc, or AUC, is also known as AUROC, the area under the receiver operating characteristic curve. The Roc_Auc Score represents the degree of separability: it indicates how well the model can distinguish between classes. This metric usually appraises binary classification tasks [62–64]; for a multiclass classification problem, a one-versus-all methodology yields the Roc_Auc Scores. The score ranges from 0 to 1, and a model with a Roc_Auc Score close to 1 is considered best. We employed the one-versus-all methodology to obtain a weighted Roc_Auc Score for our models.

Table 5
Accuracy test results of EfficientNet B0-B7.

Models           Top-1 accuracy   Top-2 accuracy   Top-3 accuracy
EfficientNet B0  83.02%           93.80%           97.39%
EfficientNet B1  83.69%           93.90%           97.34%
EfficientNet B2  83.95%           93.75%           97.39%
EfficientNet B3  83.90%           94.63%           97.65%
EfficientNet B4  87.91%           95.67%           97.81%
EfficientNet B5  87.62%           94.59%           97.55%
EfficientNet B6  85.36%           94.01%           96.97%
EfficientNet B7  85.52%           94.84%           98.12%

Table 6
Model-wise Precision, Recall, F1 Score, Specificity, and Roc_Auc comparisons.

Models           Precision   Recall   F1 score   Specificity   Roc_Auc
EfficientNet B0  84%         83%      82%        84%           95.94%
EfficientNet B1  85%         84%      83%        84%           96.10%
EfficientNet B2  85%         84%      84%        86%           96.36%
EfficientNet B3  87%         84%      84%        91%           96.67%
EfficientNet B4  88%         88%      87%        88%           97.53%
EfficientNet B5  88%         88%      87%        88%           97.54%
EfficientNet B6  86%         85%      85%        89%           96.76%
EfficientNet B7  86%         86%      85%        87%           97.23%

… are simpler models, EfficientNets B0-B3. As a result, the ranking of Top-1 Accuracy is B4∼B5 > B6∼B7 > B1∼B2∼B3 > B0. There is a difference of roughly 5 percent in Top-1 Accuracy between the best model, EfficientNet B4, and the worst model, EfficientNet B0. Across all models, the Top-2 Accuracy is practically the same, with a maximum absolute deviation of 1.25 percent; for Top-3 Accuracy, the maximum absolute deviation is 0.9 percent.

As shown in the Dataset Section, the class distribution of the HAM10000 dataset is very unsymmetrical, i.e., there is high class imbalance. Table 6 illustrates the Precision, Recall, F1 Score, Specificity, and Roc_Auc scores for each EfficientNet variant on the HAM10000 dataset. Once again, the same pattern occurs for all these performance metrics: … EfficientNets B4 and B5 beat EfficientNets B6 and B7. Finally, there …
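The metrics of Sections 5.3-5.7 can be computed from model outputs with scikit-learn. A sketch, assuming y_true holds integer class labels and y_prob the per-image softmax scores: micro averaging mirrors the summed formulas above (swap average="weighted" for class-frequency-weighted scores), and the weighted one-versus-rest ROC AUC matches the choice in Section 5.7.

    import numpy as np
    from sklearn.metrics import (confusion_matrix,
                                 precision_recall_fscore_support,
                                 roc_auc_score, top_k_accuracy_score)

    def evaluate(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
        """Compute the paper's metrics from integer labels and softmax scores."""
        y_pred = y_prob.argmax(axis=1)
        # Micro-averaged Precision/Recall/F1 mirror the summed tp/fp/fn formulas.
        prec, rec, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average="micro"
        )
        # Per-class fp/tn are derived from the confusion matrix; Specificity is
        # the summed ratio tn / (tn + fp) from Section 5.6.
        cm = confusion_matrix(y_true, y_pred)
        fp = cm.sum(axis=0) - np.diag(cm)
        tn = cm.sum() - cm.sum(axis=1) - fp
        specificity = tn.sum() / (tn + fp).sum()
        return {
            "precision": prec,
            "recall": rec,
            "f1": f1,
            "specificity": specificity,
            # Weighted one-versus-rest ROC AUC, as described in Section 5.7.
            "roc_auc": roc_auc_score(y_true, y_prob,
                                     multi_class="ovr", average="weighted"),
            "top1": top_k_accuracy_score(y_true, y_prob, k=1),
            "top2": top_k_accuracy_score(y_true, y_prob, k=2),
            "top3": top_k_accuracy_score(y_true, y_prob, k=3),
        }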
Table 7
Comparative study of the HAM10000 dataset.
References

[75] F. Piccialli, V. Di Somma, F. Giampaolo, S. Cuomo, G. Fortino, A survey on deep learning in medicine: why, how and when?, Inf. Fusion 66 (2021) 111–137.
[76] N. Razmjooy, M. Ashourian, M. Karimifard, V.V. Estrela, H.J. Loschi, D. do Nascimento, R.P. França, M. Vishnevski, Computer-Aided Diagnosis of Skin Cancer: A Review, Current Medical Imaging, Bentham Science Publishers, Sharjah, U.A.E., 2020.
[77] J. Hemanth, V.V. Estrela, Deep Learning for Image Processing Applications, Advances in Parallel Computing, vol. 31, IOS Press, Amsterdam, Netherlands, 2017.
[78] A. Deshpande, V.V. Estrela, N. Razmjooy, Computational Intelligence Methods for Super-Resolution in Image Processing Applications, Springer Nature, Zurich, Switzerland, 2021.