Automatic Malignant and Benign Skin Cancer Classification
Using a Hybrid Deep Learning Approach
Atheer Bassel, Amjed Basil Abdulkareem, Zaid Abdi Alkareem Alyasseri, Nor Samsiah Sani
and Husam Jasim Mohammed
Abstract: Skin cancer is one of the major types of cancer, and its incidence has increased in recent
decades. Skin cancer arises from various dermatologic disorders and is classified into various types
based on texture, color, morphological features, and structure. The conventional approach to skin
cancer identification is time-consuming and costly. Currently, medical science uses various tools based
on digital technology to classify skin cancer, and machine learning-based classification is the robust
and dominant approach for automatic skin cancer classification. Existing and proposed methods based on
deep neural networks, support vector machines (SVM), neural networks (NN), random forests (RF), and
K-nearest neighbors (KNN) are used to identify malignant and benign skin cancer.
A method was proposed based on the stacking of classifiers with three folds for the classification of
melanoma and benign skin cancers. The system was trained with 1000 skin images with the categories
data set, respectively. The primary feature extraction was conducted using the Resnet50, Xception,
and VGG 16 methods. The improvement and optimization of the proposed method with a large training
dataset could provide a reliable and robust skin cancer classification system.
Academic Editor: Vadim V. Grubov
Received: 26 August 2022
Keywords: skin cancer; deep learning; CNN; machine learning; prediction
1. Introduction
Detecting and curing cancer in humans is a difficult goal for medical science.
In the United States, skin cancer is the most frequent type of cancer. Melanoma is one
of the fastest-growing and most dangerous cancers, and in its advanced stages it is very
difficult to treat. Early identification and treatment of this form of cancer aim to reduce
the number of cancer patients in the United States. Malignant melanoma can travel to the
lower layers of the skin, enter the circulation, and then spread to other regions of the body.
Computer-assisted technologies and methods are required for early skin cancer diagnosis
and detection, and computer-aided techniques and equipment improve the accuracy of clinical
cancer diagnosis. Dermoscopy is the most significant non-invasive method for detecting
malignant, benign, and other pigmented skin cancers [3]. The traditional methods of melanoma
detection and main feature identification are visual examination and recording of color
changes in the skin; this classic technique relies on the surface structure and color of the
skin. Dermoscopy allows for improved classification of cancer types based on their appearance
and morphological characteristics [4]. Dermatologists rely on their experience when inspecting
dermoscopy images, and because human interpretation is complex and subjective, computerized
analysis of dermoscopy images has become an important research topic for reducing diagnostic
errors [5]. Using dermoscopy images to identify cancer can improve the accuracy of skin cancer
diagnosis. Figure 1 illustrates the distinctions between melanoma and benign skin cancer.
Figure 1. (a) Benign cancer; (b) Melanoma [6].
Related Work
Many studies on the detection and diagnosis of malignant and benign skin cancer have
been conducted in the last decade, and numerous datasets have been made available to the
research community. Researchers have applied strategies based on splitting, merging,
clustering, and classification to the identification and treatment of skin cancer. Each
approach has its own limitations and advantages in assisting medical experts with decision
making.
Rajasekhar et al. (2020) suggested an automated melanoma detection and classification
approach based on border and wavelet-based texture algorithms. The approach used texture,
border, and geometry information from wavelet-decomposition and boundary-series models.
SVM, random forest, logistic model tree, and hidden naive Bayes algorithms were used to
classify the data [16].
Murugan et al. (2019) proposed a malignant skin cancer recognition system based on a
support vector machine. Asymmetry, border irregularity, color variation, diameter, and
texture features were used for classification, with skin texture as the dominant feature
for decision making. A convolutional neural network based on VGGNet, trained using a
transfer learning approach, was also used for skin cancer identification [17].
Seeja et al. (2019) presented the heuristic hybrid rough set particle swarm optimization
(HRSPSO) method for segmenting and classifying a digital picture into multiple segments
that are more relevant and easier to study [18].
Goyal et al. (2019) offered three classification methods (multiclass classification,
binary classification, and an ensemble model) and proposed a multi-scale integration
strategy for segmentation [19]. Taghanaki et al. (2020) employed the discrete wavelet
transform to extract features and analyze texture. These features were then used to train
stacked autoencoders (SAEs) that classify lesions as malignant or benign [20].
According to Hasan, Md Kamrul et al. (2020), interpretation-based early diagnosis and
categorization of cancer is time-consuming and subjective. The suggested system employed six
segmentation methods: adaptive threshold, gradient vector flow, adaptive snake, level set
technique, expectation-maximization level set, and a fuzzy-based split-and-merge algorithm.
The system's performance is measured using four
[22].
by researchers to solve the computer vision problem more precisely within
obtaining the features of each image in the dataset [29]. The deep convolutional neural
networks were pre-trained using TensorFlow, a deep learning framework developed by
Google [30]. The structure of the full convolutional neural network is described in Figure 4.
Figure 4. Structure of convolution neural network [31].
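A minimal sketch of using the three backbones (Resnet50, Xception, VGG 16) as frozen feature extractors in TensorFlow. It assumes the Keras `tf.keras.applications` models with ImageNet weights and global average pooling; the paper only states that the networks were pre-trained using TensorFlow, so the weights, input size, and pooling choice are assumptions, and `BACKBONES` and `extract_features` are hypothetical names.

```python
import numpy as np
import tensorflow as tf

# Hypothetical registry: backbone name -> (Keras constructor, matching preprocessing function).
BACKBONES = {
    "resnet50": (tf.keras.applications.ResNet50, tf.keras.applications.resnet50.preprocess_input),
    "xception": (tf.keras.applications.Xception, tf.keras.applications.xception.preprocess_input),
    "vgg16": (tf.keras.applications.VGG16, tf.keras.applications.vgg16.preprocess_input),
}


def extract_features(images, name, weights="imagenet"):
    """Return one pooled feature vector per image from a frozen CNN backbone.

    images: array of shape (n, height, width, 3) with RGB values in [0, 255].
    weights: "imagenet" (assumed here) or None for random initialisation.
    """
    constructor, preprocess = BACKBONES[name]
    # include_top=False drops the ImageNet classifier head; pooling="avg" turns the
    # last convolutional feature map into a single vector per image.
    backbone = constructor(include_top=False, weights=weights,
                           input_shape=images.shape[1:], pooling="avg")
    backbone.trainable = False  # feature extraction only, no fine-tuning
    x = preprocess(images.astype("float32"))  # astype copies, so the caller's array is untouched
    return backbone.predict(x, verbose=0)
```

With this setup the pooled vectors have 2048 dimensions for ResNet50 and Xception and 512 for VGG16; these vectors are what the downstream classifiers would consume.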
The implementation
Figure 5. Autoencoder module for convolution neural network [32].
niques. Our stacking approach was compared to the performance of testing results in
processor with 12 GB RAM. A brief conceptual block diagram is illustrated in Figure 6.
Based on the functionality and basic architecture of stacked CV, the proposed block
diagram for the research is shown in Figure 8.
Diagnostics 2022, 12, 2472 8 of 15
The proposed stacking-based classification method has three folds (levels). The original
training data are passed to the level 1 model, a deep learning classifier. The outcome of the
deep learning classifier becomes prediction 1, which is used as a feature in the level 2
training data.
The level 2 training data are trained using support vector machine (SVM), neural
network (NN), random forest (RF), and K-nearest neighbor (KNN) classifiers. The outcome
of each level 2 classifier is a prediction that acts as a feature for the level 3 training data.
Figure 7. The basic architecture of stacked CV algorithm [35].
The level 3 training data are passed to the level 3 model for classification. The outcome
of the level 3 model is used as the final prediction, which detects the class of skin cancer.
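The three-level scheme above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: an `MLPClassifier` stands in for the level 1 deep learning model, prediction 1 is assumed to be appended to the original features, out-of-fold predictions from `cross_val_predict` supply the "CV" part of stacking CV, and logistic regression is assumed as the level 3 model. The helper names `oof_proba`, `fit_three_level_stack`, and `predict_stack` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def oof_proba(model, X, y, cv):
    """Out-of-fold probability of class 1: each sample is scored by a model not trained on it."""
    return cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]


def fit_three_level_stack(X, y, cv=5, seed=0):
    # Level 1: an MLP stands in for the deep learning classifier (assumption).
    level1 = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=seed)
    p1 = oof_proba(level1, X, y, cv)
    level1.fit(X, y)

    # Level 2 training data: original features plus prediction 1 as an extra column (assumption).
    X2 = np.column_stack([X, p1])
    level2 = [
        SVC(probability=True, random_state=seed),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed),
        RandomForestClassifier(n_estimators=100, random_state=seed),
        KNeighborsClassifier(n_neighbors=5),
    ]
    # Level 3 training data: one out-of-fold prediction column per level-2 classifier.
    X3 = np.column_stack([oof_proba(m, X2, y, cv) for m in level2])
    for m in level2:
        m.fit(X2, y)

    # Level 3: logistic regression as the final meta-classifier (assumption).
    level3 = LogisticRegression().fit(X3, y)
    return level1, level2, level3


def predict_stack(stack, X):
    level1, level2, level3 = stack
    X2 = np.column_stack([X, level1.predict_proba(X)[:, 1]])
    X3 = np.column_stack([m.predict_proba(X2)[:, 1] for m in level2])
    return level3.predict(X3)  # final prediction: 1 = melanoma, 0 = benign (assumed encoding)


if __name__ == "__main__":
    # Synthetic stand-in for CNN features; not the paper's data.
    X, y = make_classification(n_samples=400, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
    stack = fit_three_level_stack(X_tr, y_tr)
    print("test accuracy:", (predict_stack(stack, X_te) == y_te).mean())
```

Training each higher level on out-of-fold predictions, rather than on predictions for data the lower models have already seen, keeps the meta-classifiers from simply trusting overfitted base models.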
regression. A comparative analysis is conducted between the proposed approach and other
traditional classification techniques.
AUC is the area under the ROC curve, integrated from (0, 0) to (1, 1), and gives an
aggregate measure of performance across all possible classification thresholds. AUC ranges
from 0 to 1: a model whose predictions are 100% correct has an AUC of 1.0, and one whose
predictions are 100% wrong has an AUC of 0.0. The F1 score is calculated from precision and
recall, whose mathematical definitions are given below [38,39].
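The AUC definition can be checked numerically with a small sketch. Rather than integrating the ROC curve explicitly, it uses the equivalent pairwise (Mann-Whitney) form: the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as one half. The function name `roc_auc` is illustrative, and label 1 is assumed to mark the positive class.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Uses the pairwise (Mann-Whitney) form: the fraction of positive/negative pairs in
    which the positive example receives the higher score, counting ties as one half.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0: every positive outranks every negative
print(roc_auc([0, 0, 1, 1], [0.9, 0.8, 0.2, 0.1]))  # 0.0: every ranking is wrong
print(roc_auc([0, 1, 0, 1], [0.1, 0.4, 0.5, 0.8]))  # 0.75: 3 of 4 pairs ranked correctly
```

The first two calls reproduce the boundary cases in the text (AUC 1.0 for a fully correct ranking, 0.0 for a fully wrong one).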
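Precision measures how precise the model is: the fraction of predicted positives that are
true positives.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall measures how many of the actual positives the model has captured, that is, correctly
labeled as positive.
$$\text{Recall} = \frac{TP}{TP + FN}$$
The F1 score is the harmonic mean of precision and recall.
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$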
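A small sketch of the three formulas, using illustrative confusion-matrix counts that are not taken from the paper's experiments. The function name `precision_recall_f1` is hypothetical, and undefined ratios (zero denominators) are returned as 0.0 by convention.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts (0.0 when a ratio is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative counts only (not taken from the paper's experiments).
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.889 0.842
```

Substituting the definitions gives the equivalent form F1 = 2TP / (2TP + FP + FN); here 160 / 190 ≈ 0.842.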
Accuracy is the main performance measure used in this study. It is the proportion of all
predictions that are correct, computed from the numbers of true positives (TP), true
negatives (TN), false positives (FP), and false negatives (FN) [39–41].
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Sensitivity (the true positive rate) is the proportion of actual positive items that are
correctly identified.
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
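Accuracy and sensitivity can be computed directly from predicted and true labels, as in this sketch. It assumes a binary encoding in which 1 denotes melanoma (the positive class) and 0 denotes benign; the labels are toy values, not data from the study, and the function names are illustrative. As defined above, sensitivity coincides with recall.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for one positive class (assumed: 1 = melanoma)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                                  # 3 3 1 1
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn))  # 0.75 0.75
```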
3. Experimental Analysis
The experiment was conducted with three feature extractors: Resnet50, Xception, and VGG 16.
The extracted features were then classified using SVM, KNN, regression, AdaBoost, RF,
decision tree, and GaussianNB classifiers. The system was also tested with the proposed
stacking approach, a hybrid combination of these models that aims to improve the
classification performance of the system. The dataset was split into 70% for training, 15%
for validation, and 15% for testing. Accuracy, F1 score, sensitivity, and area under the ROC
curve (AUC) were used to evaluate the performance of the system. The numerical results of
the Resnet50 features with these evaluation metrics are given in Table 2, and the comparative
performance of the Resnet50 features with each classification approach is shown in Figure 9.
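The evaluation protocol described above (70/15/15 split, seven classifiers, four metrics) can be sketched with scikit-learn. Assumptions: the split is stratified by class, "regression" is taken to mean logistic regression, default hyperparameters are used, the validation split is only returned (reserved for tuning), and synthetic features stand in for the CNN features. `split_70_15_15`, `CLASSIFIERS`, and `evaluate` are hypothetical names.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def split_70_15_15(X, y, seed=0):
    """70% train, 15% validation, 15% test, stratified by class label (assumption)."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


CLASSIFIERS = {
    "SVM": SVC(probability=True, random_state=0),
    "KNN": KNeighborsClassifier(),
    "Regression": LogisticRegression(max_iter=1000),  # assumed: logistic regression
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "GaussianNB": GaussianNB(),
}


def evaluate(features, labels):
    """Return {classifier: (accuracy %, F1, sensitivity, AUC)} on the 15% test split."""
    # The validation split is unused here; it would serve hyperparameter tuning.
    X_train, X_val, X_test, y_train, y_val, y_test = split_70_15_15(features, labels)
    rows = {}
    for name, clf in CLASSIFIERS.items():
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        score = clf.predict_proba(X_test)[:, 1]
        rows[name] = (
            100 * accuracy_score(y_test, pred),
            f1_score(y_test, pred),
            recall_score(y_test, pred),  # sensitivity = recall of the positive class
            roc_auc_score(y_test, score),
        )
    return rows
```

Running `evaluate` once per feature set (Resnet50, Xception, VGG 16) would produce one table per extractor, mirroring the structure of Tables 2 to 4.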
Figure 9. Performance of the system for Resnet50 features.
The experimental results of the Xception features with the performance evaluation metrics
are described in Table 3. The graphical representation of the comparative performance of
Xception features with a given classification approach is shown in Figure 10.
Table 3. Performance evaluation of the system for Xception features.
The experimental results of the VGG16 features with the performance evaluation metrics
are described in Table 4. The graphical representation of the comparative performance of
VGG16 features with a given classification approach is shown in Figure 11.
Table 4. Performance evaluation of the system for VGG16 feature extraction.
Classifier              Accuracy (%)   F1-Score   Sensitivity   AUC
StackingCV (Proposed)   86.5           0.842      0.804         0.843
SVM                     86.7           0.835      0.810         0.859
KNN                     81             0.761      0.733         0.799
Regression              87.5           0.847      0.844         0.870
AdaBoost                79.9           0.766      0.798         0.799
RF                      84             0.805      0.798         0.834
DecisionTree            76.1           0.701      0.678         0.749
GaussianNB              77.6           0.723      0.706         0.766
Figure 11. Performance of the system for VGG16 features.
The comparative performance of the system with all classification approaches is summarized
in Table 5, and its graphical representation is shown in Figure 12.
The ROC curve and the area under it are the most prominent results of the performance
evaluation. The ROC curves of this research are shown in Figure 13.
The researchers evaluated the performance of the proposed model. In this analysis,
malignant and benign skin cancer were classified using the stacking CV model implemented
with a deep learning approach. The experiment used a three-fold (three-level) training
mechanism. The original dataset was used to train a deep learning model, whose output became
the feature set for the level 2 models (SVM, RF, NN, and KNN). The level 2 models took the
predictions of the previous classifier as input and produced new predictions, which in turn
formed the input of the third-level model of the stacking CV algorithm. For the proposed
approach, stacking CV on the Xception feature extraction mode proved dominant and promising,
with 90.9% accuracy.
Figure 12. Comparative performance of the proposed and available classification approaches (Resnet50, Xception, and VGG16 features).
Author Contributions: Data curation, A.B.; Formal analysis, Z.A.A.A.; Funding acquisition, N.S.S.;
Investigation, A.B. and Z.A.A.A.; Methodology, A.B. and A.B.A.; Project administration, Z.A.A.A.;
Resources, A.B.A. and N.S.S.; Software, A.B.A.; Supervision, Z.A.A.A. and N.S.S.; Visualization,
H.J.M.; Writing—original draft, A.B. and A.B.A.; Writing—review & editing, Z.A.A.A., N.S.S. and
H.J.M. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Universiti Kebangsaan Malaysia (Grant code: GUP2019-060).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available in the article.
Conflicts of Interest: The authors declare no conflict of interest.
