COVID-19 Prediction and Detection Using Deep Learning: Article
All content following this page was uploaded by Moutaz Alazab on 12 June 2020.
Abstract: Currently, the detection of coronavirus disease 2019 (COVID-19) is one of the main challenges in the world, given the rapid spread of the disease. Recent statistics indicate that the number of people diagnosed with COVID-19 is increasing exponentially, with more than 1.6 million confirmed cases; the disease is spreading to many countries across the world. In this study, we analyse the incidence of COVID-19 distribution across the world. We present an artificial-intelligence technique based on a deep convolutional neural network (CNN) to detect COVID-19 patients using real-world datasets. Our system examines chest X-ray images to identify such patients. Our findings indicate that such an analysis is valuable in COVID-19 diagnosis as X-rays can be obtained quickly and at low cost. Empirical findings obtained from 1000 X-ray images of real patients confirmed that our proposed system is useful in detecting COVID-19 and achieves an F-measure of 95–99%. Additionally, three forecasting methods, namely the prophet algorithm (PA), the autoregressive integrated moving average (ARIMA) model, and the long short-term memory (LSTM) neural network, were adopted to predict the numbers of COVID-19 confirmations, recoveries, and deaths over the next 7 days. The prediction results exhibit promising performance and offer an average accuracy of 94.80% and 88.43% in Australia and Jordan, respectively. Our proposed system can significantly help identify the most infected cities, and it has revealed that coastal areas are heavily impacted by the COVID-19 spread as the number of cases is significantly higher in those areas than in non-coastal areas.

Keywords: Artificial Intelligence, X-ray, Convolutional Neural Network, Machine Learning, COVID-19.

I. Introduction

The coronavirus disease (COVID-19) is a global pandemic that was discovered by a Chinese physician in Wuhan, the capital city of Hubei province in mainland China, in December 2019 [1]. Currently, there is no approved human vaccine for combating it. COVID-19 propagation is faster when people are in close proximity. Thus, travel restrictions help control the spread of the disease, and frequent hand washing is always recommended to prevent potential viral infections. Meanwhile, fever and cough are the most common infection symptoms. Other symptoms may occur, including chest discomfort, sputum development, and a sore throat. COVID-19 may progress to viral pneumonia, which has a 5.8% mortality risk. The death rate of COVID-19 is equivalent to 5% of the death rate of the 1918 Spanish flu pandemic.

The total number of people infected with COVID-19 worldwide is 5,790,103 as of May 27, 2020, whereas the numbers of reported deaths and recoveries are 357,432 and 2,497,618, respectively. Most of the cases were recorded in the USA, Spain, Italy, France, Germany, mainland China, UK, and Iran [2]. Saudi Arabia, with 78,541 cases, has the highest number of reported cases among all the Arab countries. Meanwhile, the number of reported cases in Jordan is 720, whereas the numbers of deaths and recoveries are 9 and 586, respectively. The number of reported cases in Australia is 7150, whereas the numbers of deaths and recoveries are 103 and 6579, respectively. Since February 2020, information technology services, such as mobile apps, have been used to curb the potential risk of infection in mainland China. The mobile apps advise users to self-quarantine and alert the concerned health authorities when someone is infected by the virus. They also monitor infected people and the last persons that they had contact with [3].

Since it was first reported, the disease has spread exponentially across the world and has become an international concern. A study conducted by Jiang et al. [4] revealed that the death rate of COVID-19 is 4.5% across the world. The death rate of patients in the age range of 70–79 years is 8.0%, whereas that of patients above 80 years is
MIR Labs, USA
14.8%. The authors also confirmed that patients above the age of 50 years with chronic illnesses are at the highest risk and should therefore take special precautions. One of the main threats of COVID-19 is its rapid propagation, with an estimated 1.5–3.5 people getting infected by the disease upon contact with an infected person [5]. This implies that if 10 people are COVID-19 positive, they are likely to infect 15–35 other people. Therefore, COVID-19 can infect a very large number of people in a few days unless intervention measures are implemented.

The standard diagnostic technique is the reverse transcription-polymerase chain reaction (RT-PCR) method [1], a laboratory procedure that interacts with other ribonucleic (RNA) and deoxyribonucleic acids (DNA) to determine the volume of specific ribonucleic acids using fluorescence. RT-PCR tests are performed on clinical research samples of nasal secretions. The samples are collected by inserting a swab into the nostril and gently moving it into the nasopharynx to collect secretions. Although RT-PCR can identify the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strain that causes COVID-19, in some cases it has produced negative test results even though the patients showed progression on follow-up chest computed tomography (CT) scans [6]. In fact, several studies [6-9] have recommended the use of CT scans and X-rays rather than RT-PCR owing to its limited availability in some countries.

The detection of COVID-19 symptoms in the lower parts of the lungs has a higher accuracy when using CT scans or X-rays than when using RT-PCR [7]. In certain cases, CT scans and X-ray tests can substitute for RT-PCR tests. However, they cannot exclusively address the problem owing to the relatively limited number of radiologists, compared to new residents, and the high volume of re-examinations of infected people who wish to know the progression of their illness. To overcome the challenges of CT scans and X-rays and to assist radiologists, we need to improve the speed of the procedure. This can be achieved by designing advanced diagnostic systems that utilise artificial intelligence (AI) tools. The aim is to reduce the time and effort required to perform CT scans and X-rays of COVID-19-positive patients and evaluate the rate of disease development [7-9].

Radiological imaging is considered an important screening method for COVID-19 diagnosis [10]. Ai et al. [6] demonstrated the consistency of the radiological history of COVID-19-related pneumonia with the clinical nature of the disease. When examined by CT scans, almost all COVID-19 patients have exhibited similar features, including ground-glass opacities in the early stages and pulmonary consolidation in the later stages. In fact, the morphology and peripheral lung distribution can be rounded [6]. AI can be used to initially evaluate a COVID-19 patient as an alternative solution to traditional approaches that are time-consuming and labour-intensive. In this paper, we advocate the use of AI to forecast COVID-19 cases and diagnose COVID-19 patients via chest X-ray images.

A. Contributions of This Study

The following are the core contributions of this study:
- We propose an automated intelligent system for distinguishing COVID-19 patients from non-patients on the basis of chest X-ray images. Our system instantly reads the structure of a chest X-ray image, leverages hidden patterns to identify COVID-19 patients, and reduces the need for manual pre-processing steps. Empirical findings obtained from 1000 chest X-ray images of patients confirmed that our proposed system can detect COVID-19 patients with an accuracy of 95–99%.
- We provide an intelligent prediction system for predicting the number of patients confirmed to have contracted the disease, recovered from the disease, and died from the disease over the next 7 days using three forecasting methods. Our proposed system has been trained and tested on datasets generated from real-world cases and has predicted the numbers of COVID-19 confirmations, recoveries, and deaths in Australia and Jordan with an average accuracy of 94.80% and 88.43%, respectively.
- We highlight the most affected areas and show that coastal areas are heavily impacted by COVID-19 infection and spread as the number of cases in those areas is significantly higher than that in other non-coastal areas.

The rest of this paper is organised as follows. Section 2 presents the related works on recent COVID-19 detection and prediction methods for chest X-ray images. Section 3 presents the detailed system design, dataset description, and performance-evaluation metrics. Sections 4 and 5 present the results and discussions, respectively. Section 6 concludes the paper and provides an outlook to future research.

II. Related Works

The analysis and detection of COVID-19 have been extensively investigated in the last few months. The first part of this section addresses issues related to COVID-19 detection based on deep-learning approaches using CT scans and chest X-ray images. The second part reviews the related literature on estimating the future numbers of COVID-19 confirmations, recoveries, and deaths.

COVID-19 has now become a global pandemic owing to its rapid spread. It is very challenging to detect exposed persons because they do not show disease symptoms immediately. Thus, it is necessary to find a method of estimating the number of potentially infected persons on a regular basis to adopt the appropriate measures. AI can be used to examine a person for COVID-19 as an alternative to traditional time-consuming and expensive methods. Although there are several studies on COVID-19, this study focused on the use of AI in forecasting COVID-19 cases and diagnosing patients for COVID-19 infection through chest X-ray images.

Several research areas have implemented AI (e.g., disease diagnoses in healthcare) [11-13]. One of the main advantages of AI is that a trained model can be used to classify unseen images. In this study, AI was implemented to detect whether a patient is positive for COVID-19 using their chest X-ray image.

AI can also be used for forecasting (e.g., how the population will increase over the next 5 years) through existing evidence. Thus, predicting possibilities in the immediate future can help authorities to adopt the necessary measures [14]. Wynants et al. [15] focused on two main concepts. The first concept involved studies related to the diagnosis of COVID-19, and the second involved studies related to the prediction of the number of people who will be infected in the coming days. The study analysis maintained that most of the existing models are poor and biased. The authors suggested that research-based COVID-19 data should be publicly available to encourage the adoption of more specifically designed detection and prediction models.
tuberculosis). The authors collected a dataset of 60,427 CT scans from 918 patients; 14,944 of these CT scans were from 150 COVID-19 patients and 15,133 from 154 non-COVID-19 viral pneumonia patients. They performed several tests for several lung diseases. The achieved accuracy, sensitivity, and specificity were 98.8%, 98.2%, and 98.9%, respectively.

Xu et al. [42] reported that real-time RT-PCR has a low positive rate at the early stage of COVID-19. They developed an early screening model that uses deep-learning techniques for distinguishing COVID-19 pneumonia from influenza (a viral pneumonia) and stable cases using pulmonary CT images. A dataset of 618 CT samples was obtained for the analysis, and the images were classified as COVID-19, influenza (a viral pneumonia), and other cases using ResNet-18 and ResNet-based methods. The authors employed a noisy-OR Bayesian function to differentiate the infected images and obtained a detection accuracy of 86.7%.

2) COVID-19 Infection Prediction Using Machine Learning Techniques

ML is the science of training machines using mathematical models to learn and analyse data. Once ML is implemented in a system, the data are analysed, and interesting patterns are detected. The validation data are then categorised according to the patterns learned during the learning process. As COVID-19 infection has rapidly spread worldwide and international action is required, it is important to develop a strategy to estimate the number of potentially infected people on a regular basis to adopt the appropriate measures. Currently, decision-makers rely on certain statistics when taking decisions such as imposing lockdowns on infected cities or countries. Therefore, ML can be used to predict the behaviours of new cases to stop the disease from spreading.

Li et al. [43] developed a prediction model using ML algorithms to combat COVID-19 in mainland China and in other infected countries in the world. The authors developed a model to estimate the number of reported cases and deaths in mainland China and in the world. The data used to build the models were collected between 20 January 2020 and 1 March 2020. They estimated that the peak of the COVID-19 outbreak occurred on 22 February 2020 in mainland China and on 10 April 2020 worldwide. The authors also stated that COVID-19 would be controlled at the beginning of April 2020 in mainland China and in mid-June 2020 across the world. They concluded that the estimated number of COVID-19 cases would be approximately 89,000 in China and 403,000 worldwide during the outbreak. As of 17 April 2020, the estimated number of deaths was 4000 in mainland China and 18,300 worldwide. It is clear that their forecast was similar to the actual situation in China as the total numbers of infected cases and deaths had exceeded 82,367 and 3342, respectively. However, the actual figures worldwide exceeded their estimations as the number of infected cases exceeded 2 million and the number of deaths reached 145,416 as of 17 April 2020 [2].

Kumar et al. [44] predicted the COVID-19 spread in the 15 most-infected countries in the world using the autoregressive integrated moving average (ARIMA) model. The outcome of their prediction indicates that circumstances would worsen in Iran and Europe, especially in Italy, Spain, and France. Moreover, their prediction indicated that the number of cases in South Korea and mainland China would become more stable. The study also indicated that COVID-19 would spread exponentially in the USA and that strict official measures would be urgently required to stop the disease from spreading. Although the prediction of COVID-19 cases for the USA was 1 million between 8 April and 30 April 2020, it reached 677,570 on 17 April 2020. Furthermore, Italy had 168,941 cases, although it was predicted to have 300,000 cases [2].

Huang et al. [45] applied a CNN to a limited dataset, which was not specifically defined in their study, to evaluate and estimate the number of reported cases in China. The authors used the mean absolute and root mean square errors to compare their model with other deep-learning models, including multilayer perceptron, long short-term memory (LSTM), and gated recurrent units. The authors concluded that the obtained results promise a high predictive efficiency.

Pandey et al. [46] utilised two statistical algorithms, the susceptible-exposed-infectious-recovered (SEIR) and regression models, to evaluate and forecast the distribution of COVID-19 in India. They used a dataset retrieved from the Johns Hopkins University repository. The prediction results from the SEIR and regression models showed that the number of confirmed COVID-19 cases would reach 5300 and 6153 cases, respectively, by 13 April 2020. However, the number of confirmed cases in India on that date was 10,453, above both predictions [2].

III. System Design

Our proposed deep learning-based COVID-19 detection comprises several phases, as illustrated in Figure 1. The phases are summarised in the following five steps:
Step 1: Collect the chest X-ray images for the dataset from COVID-19 patients and healthy persons.
Step 2: Generate 1000 chest X-ray images using data augmentation.
Step 3: Represent the images in a feature space and apply deep learning.
Step 4: Split the dataset into two sets: a training set and a validation set.
Step 5: Evaluate the performance of the detector on the validation dataset.

A. Dataset

Two types of datasets were used in the evaluation, the original dataset (without augmentation) and the augmented dataset, which are summarised in Tables 1 and 2, respectively. The dataset contained the following: a) a healthy dataset containing chest X-ray images of healthy persons and b) a COVID-19 dataset containing chest X-ray images of COVID-19 patients. The original dataset was obtained from the Kaggle database, and its total number of images is 128, as presented in Table 1 [47].

Figure 1. Architecture of the proposed system
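Step 4 of the system design (splitting the images into a training set and a validation set) can be illustrated with a short sketch. The paper does not state the split ratio or the tool used, so the 80/20 ratio, the fixed seed, the function name `stratified_split`, and the file names below are all assumptions made purely for illustration.

```python
import random


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into training and validation sets.

    The split is done per class so that both sets keep the same class
    proportions. val_fraction=0.2 is an assumed ratio, not the paper's.
    """
    rng = random.Random(seed)
    by_label = {}
    for path, label in samples:
        by_label.setdefault(label, []).append((path, label))

    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = int(round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Hypothetical file names standing in for a 1000-image augmented dataset
# (500 COVID-19 images and 500 healthy images).
samples = [(f"covid_{i}.png", "covid") for i in range(500)]
samples += [(f"healthy_{i}.png", "healthy") for i in range(500)]

train_set, val_set = stratified_split(samples)
print(len(train_set), len(val_set))
```

Splitting per class keeps the validation set balanced, which matters here because accuracy is only a meaningful summary when the classes are balanced.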
Owing to the limited availability of chest X-ray images, we generated our dataset using data augmentation [48]. Data augmentation is an AI method for increasing the size and the diversity of labelled training sets by generating different variants of the samples in a dataset. Data augmentation methods are commonly used in ML to address class imbalance problems, reduce overfitting in deep learning, and improve convergence, which ultimately contributes to better results. The total number of images in the dataset became 1000 after applying augmentation, as presented in Table 2 [47].

Table 1. Original dataset (without augmentation)
X-ray images    Number
Healthy         28
COVID-19        70
Total           128

Table 2. Augmented dataset
X-ray images    Number
Healthy         500
COVID-19        500
Total           1000

B. Environment

A computer with Microsoft Windows 10 was used for the experiment. It has the following specifications: Intel Core i7-8565U 1.80-GHz processor, 16 GB of DDR4 RAM, and 1 TB of hard disk. We installed the virtual machine tool VMware Workstation Pro version 14.1.8 build-14921873 on it. Then, we installed Ubuntu 18.04.4 (64 bit) on the virtual machine and the following libraries and software:
ARIMA: https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMA.html
Fbprophet: https://pypi.org/project/fbprophet/
ImageDataGenerator: https://keras.io/preprocessing/image/
Keras: https://keras.io/
LSTM: https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM
Matplotlib: https://matplotlib.org/
NumPy: https://numpy.org/
Pandas: https://pandas.pydata.org/
Python: https://www.python.org/
Scikit: https://scikit-learn.org/
SciPy: https://www.scipy.org/
TensorFlow: https://www.tensorflow.org/

All the results and predictions made in this study have been uploaded to the Kaggle database [49, 50]. We believe that by making the system and solution publicly available, we draw attention to the most affected areas, thereby helping to prevent the spread of the COVID-19 outbreak and fostering the use of deep-learning techniques in COVID-19 research.

C. Evaluation Metrics

To assess the reliability of the proposed deep learning-based COVID-19 detector, we adopted the same metrics as those used by Alazab et al. [51-54] and considered the following standard metrics: precision, recall, and F-measure. These metrics are calculated on the basis of the true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) scores:
- TP is the proportion of positive COVID-19 chest X-ray images that were correctly labelled as positive.
- FP is the proportion of negative (healthy) chest X-ray images that were mislabelled as positive.
- TN is the proportion of negative (healthy) chest X-ray images that were correctly labelled as healthy.
- FN is the proportion of positive COVID-19 chest X-ray images that were mislabelled as negative (healthy).

Accuracy: This metric measures the percentage of correctly identified cases relative to the entire dataset. The ML algorithm performs better if the accuracy is higher. Accuracy is a meaningful measure for a test dataset with balanced classes. It is computed as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

Precision: This metric is a measure of exactness, which is calculated as the number of true positives divided by the number of predicted positives. It is computed as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

Recall: This metric is a measure of completeness, which is calculated as the number of true positives divided by the number of actual positives. It is computed as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (3)$$

F-measure: This is a combination of precision and recall that provides a meaningful measure for a test dataset with imbalanced classes. It is computed as follows:

$$\mathrm{F\text{-}measure} = 2 \left( \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \right) \qquad (4)$$

Root Mean Square Error (RMSE): This metric measures the differences between the actual ($x_i$) and the predicted ($\hat{x}_i$) numbers of COVID-19 confirmations, recoveries, and deaths over $N$ observations. The main advantage of RMSE is that it penalises large prediction errors. RMSE was used to compare the prediction errors of the three prediction algorithms. It is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2} \qquad (5)$$

Correlation coefficient: This metric is often used to evaluate the performance of a prediction algorithm. It is defined as follows:

$$\mathrm{CC} = \left( 1 - \frac{1}{N} \sum_{i=1}^{N} |x_i - \hat{x}_i| \right) \times 100\% \qquad (6)$$

IV. Experimental Results

Firstly, we examined the most infected areas across the world. In Section 4.1, we show that coastal areas are heavily affected by the COVID-19 outbreak as the number of cases in those
In the USA, there were more than 1,745,843 confirmed cases on 27 May 2020. The first case was found in Oregon, which is located on the Pacific Coast. Coastal states, including Washington, Oregon, California, Arizona, and Texas, reported high numbers of confirmed cases. Furthermore, states including Wisconsin and Illinois with long lake coastlines also reported confirmed cases at the initial stage of the COVID-19 spread. Other eastern coastal states including New York, Maine, New Hampshire, Massachusetts, Rhode Island, Connecticut, New Jersey, Delaware, Maryland, Virginia, North Carolina, South Carolina, Georgia, Florida, and Indiana also reported high numbers of confirmed cases, as well as other coastal areas such as Colorado and Nebraska. Thus, most of the states that reported the highest numbers of cases are located in the coastal regions. Figure 6 highlights the coastal regions with the highest number of cases in maroon.

observations and the residual error values by using the moving average for the lagged observations. ARIMA uses the order factors p, d, and q, where p is the order of the autoregressive model, d is the order of the differencing, and q is the order of the moving average. The algorithm is computed as follows:

$$\hat{y}_t = c + \epsilon_t + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{i=1}^{q} \theta_i \, \epsilon_{t-i} \qquad (8)$$

where $\epsilon_t$ is an independent and identically distributed error term, $c$ is a constant term, $y_t$ is the actual value at time $t$, and $\phi$ and $\theta$ are the tuning parameters of the autoregressive and moving-average models, respectively.

LSTM is a form of recurrent neural network (RNN) that memorises earlier patterns in data sequences. It was originally proposed by Hochreiter and Schmidhuber [62]. It replaces the hidden layer neurons of the RNN with a series of memory cells. The key element is the state of the memory cell, which is updated through a gate structure that filters the data. The gate structure includes the input, forget, and output gates. Each cell has three sigmoid layers and one tanh layer, as shown in Figure 7 [61].

(iii) The output gate controls how much of the current cell state is passed to the output. This amount is determined by a sigmoid layer. The cell state is processed by the tanh layer and multiplied by the output retrieved from the sigmoid layer to obtain the final output of the cell, as shown below:

$$h_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) * \tanh(C_t) \qquad (13)$$
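The output-gate computation in Eq. (13) can be traced numerically with a minimal sketch. This is not the TensorFlow LSTM layer used in the study; it is a plain-Python illustration in which the function name `lstm_output`, the weight values, and the toy dimensions (two hidden units, one input feature) are assumptions chosen only to make the arithmetic visible.

```python
import math


def sigmoid(z):
    """Logistic function used by the gate."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_output(W_o, b_o, h_prev, x_t, C_t):
    """Compute h_t = sigmoid(W_o . [h_prev, x_t] + b_o) * tanh(C_t).

    W_o has one row per hidden unit; each row has len(h_prev) + len(x_t)
    weights. The final product is element-wise.
    """
    concat = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    h_t = []
    for row, bias, c in zip(W_o, b_o, C_t):
        gate = sigmoid(sum(w * v for w, v in zip(row, concat)) + bias)
        h_t.append(gate * math.tanh(c))
    return h_t


# Toy values (assumed, not taken from the paper).
W_o = [[0.0, 0.0, 0.0],
       [0.5, -0.5, 1.0]]
b_o = [0.0, 0.0]
h_t = lstm_output(W_o, b_o, h_prev=[0.1, 0.2], x_t=[1.0], C_t=[1.0, -1.0])
print(h_t)
```

With all-zero weights the gate of the first unit is exactly 0.5, so that unit outputs half of tanh(C_t); the gate scales the squashed cell state rather than replacing it.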
To assess the performance of the implemented forecasting
Figure 8. Predicted COVID-19 confirmed cases in Australia
Figure 9. Predicted COVID-19 recovered cases in Australia
Figure 11. Predicted COVID-19 confirmed cases in Jordan
Figure 12. Predicted COVID-19 recovered cases in Jordan
Figure 13. Predicted COVID-19 death cases in Jordan
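Forecast curves of this kind are compared with the observed counts using the RMSE of Eq. (5). The sketch below shows that computation over a 7-day horizon; the daily counts are invented for illustration and are not results from the paper, and the function name `rmse` is likewise an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences (Eq. 5)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(x - x_hat) ** 2 for x, x_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Illustrative 7-day series (made-up numbers, not data from the study).
actual = [100, 110, 120, 130, 140, 150, 160]
forecast_a = [102, 108, 121, 133, 139, 151, 158]  # small errors every day
forecast_b = [100, 110, 120, 130, 140, 150, 181]  # one large miss on day 7

print(rmse(actual, forecast_a))
print(rmse(actual, forecast_b))
```

The second forecast is exact on six of the seven days, yet its single large miss gives it the higher RMSE, which is the penalising behaviour the metric is chosen for.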
Figure 17. Augmented chest X-ray images for COVID-19 patients

Figure 18. Augmented chest X-ray images for healthy people

The CNN-based COVID-19 detector trained on an un-augmented dataset achieved a weighted average F-measure of 95%. The same COVID-19 detector achieved a weighted average F-measure of 99% when trained on an augmented dataset, as shown in Figure 19. Hence, the COVID-19 detector exhibits superior performance metrics in terms of recall, precision, and F-measure when trained on augmented data. It is therefore sufficiently robust and helpful for rapidly diagnosing a large number of suspected COVID-19 patients.

V. Discussions

This study provided a forecasting analysis of COVID-19 confirmations, recoveries, and deaths in Australia and Jordan. It further implemented a CNN-based COVID-19 detector to identify COVID-19 infections using X-ray images. Based on the study results, the following conclusions were drawn:
- PA delivered the best performance for COVID-19 prediction over 7 days, compared to LSTM and ARIMA.
- The predictions will enable people in both countries to predict their medical needs for tackling the spread of COVID-19.
- ARIMA cannot make predictions over the next 1, 2, and 3 days.
- After investigating the number of COVID-19 confirmations, recoveries, and deaths in various countries, we found that coastal areas are significantly impacted by the disease because the numbers of cases in those areas are significantly higher than those in other non-coastal areas. This observation is medically consistent with the propagation capability of viruses in areas with higher humidity rates. Thus, the authors advise healthcare professionals to devote greater attention to coastal regions.
- The use of chest X-ray images is recommended for diagnosing COVID-19 because X-rays are easily obtained at nearby hospitals or clinics fairly quickly and at low costs.
- Our CNN-based COVID-19 detector delivered superior performance in terms of precision, recall, and F-measure.
- The application of ML techniques for COVID-19 diagnosis using our CNN-based COVID-19 detector is recommended.
- It is well known that VGG16 (Wu et al., 2017) outperforms many convolutional networks, such as GoogLeNet and SqueezeNet, and its feature representation capability is beneficial for classification accuracy. Hence, VGG16 is a recommended version of a deep CNN-based algorithm as it makes training easier and quicker. It was implemented in our COVID-19 detector to improve its accuracy in diagnosing COVID-19 in chest X-ray images.
- Our COVID-19 detector obtained better results when using augmentation. A better training process was achieved as the gap between the training and validation became smaller. Moreover, a more generalized and robust COVID-19 detector was achieved as the F-measure improved from 0.95 to 0.99. Thus, the COVID-19 detector trained on augmented data provides superior performance metrics and is robust for diagnosing COVID-19 in chest X-ray images.
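The precision, recall, and F-measure values discussed here follow from the confusion-matrix counts through Eqs. (1) to (4). A minimal sketch of that calculation is given below. The counts are hypothetical and are not the paper's results, the function name `classification_metrics` is an assumption, and the sketch reports the positive-class F-measure rather than the weighted average over both classes that the paper quotes.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-measure from confusion counts.

    Zero denominators fall back to 0.0 so the function never divides by zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for a 200-image validation set (100 per class).
m = classification_metrics(tp=95, tn=97, fp=3, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Reporting the F-measure alongside accuracy is useful because the original (un-augmented) dataset is not class-balanced, in which case accuracy alone can look better than the detector deserves.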
[17] M. Alazab, S. Venkatraman, P. Watters, M. Alazab, and A. Alazab, "Cybercrime: The Case of Obfuscated Malware," in Global Security, Safety and Sustainability & e-Democracy, vol. 99, C. Georgiadis, H. Jahankhani, E. Pimenidis, R. Bashroush, and A. Al-Nemrat, Eds. Springer Berlin Heidelberg, 2012, pp. 204-211.
[18] M. Alazab, S. Venkatraman, P. Watters, and M. Alazab, "Information Security Governance: The Art of Detecting Hidden Malware," in IT Security Governance Innovations: Theory and Research, M. Daniel, S. Luis Enrique, F.-M. Eduardo, and G. P. Mario, Eds. Hershey, PA, USA: IGI Global, 2013, pp. 293-315.
[19] A. Alazab, M. Alazab, J. Abawajy, and M. Hobbs, "Web application protection against SQL injection attack," in Proceedings of the 7th International Conference on Information Technology and Applications, 2011, pp. 1-7.
[20] M. Alazab and L. Batten, "Survey in Smartphone Malware Analysis Techniques," in New Threats and Countermeasures in Digital Crime and Cyber Terrorism. IGI Global, 2015, pp. 105-130.
[21] M. Alazab, A. Alazab, and L. Batten, "Smartphone malware based on synchronisation vulnerabilities," in ICITA 2011: Proceedings of the 7th International Conference on Information Technology and Applications, Sydney, Australia, 2012, pp. 1-6.
[22] V. Moonsamy, M. Alazab, and L. Batten, "Towards an Understanding of the Impact of Advertising on Data Leaks," International Journal of Security and Networks (IJSN), vol. 7, 2012.
[23] L. M. Batten, V. Moonsamy, and M. Alazab, "Smartphone applications, malware and data theft," in Computational intelligence, cyber security and computational models. Springer, 2016, pp. 15-24.
[24] M. Alazab, V. Monsamy, L. Batten, P. Lantz, and R. Tian, "Analysis of Malicious and Benign Android Applications," in International Conference on Distributed Computing Systems Workshops (ICDCSW), 2012 32nd, 2012, pp. 608-616.
[25] Y. Xu, Y. Wang, J. Yuan, Q. Cheng, X. Wang, and P. L. Carson, "Medical breast ultrasound image segmentation by machine learning," Ultrasonics, vol. 91, pp. 1-9, 2019.
[26] A. Mesleh, "Lung Cancer Detection Using Multi-Layer Neural Networks with Independent Component Analysis: A Comparative Study of Training Algorithms," Jordan Journal of Biological Sciences, vol. 10, 2017.
[27] A. Mesleh, D. Skopin, S. Baglikov, and A. Quteishat, "Heart rate extraction from vowel speech signals," Journal of computer science and technology, vol. 27, pp. 1243-1251, 2012.
[28] A. Mesleh, "Chi square feature extraction based svms arabic language text categorization system," Journal of Computer Science, vol. 3, pp. 430-435, 2007.
[29] A. Mesleh, "Support vector machines based Arabic language text classification system: feature selection comparative study," in Advances in Computer and Information Sciences and Engineering. Springer, 2008, pp. 11-16.
[30] A. Mesleh, "Feature sub-set selection metrics for Arabic text classification," Pattern Recognition Letters, vol. 32, pp. 1922-1929, 2011.
[31] A. Mesleh, "Support Vector Machine Text Classifier for Arabic Articles." VDM Verlag Dr. Müller, 2010.
[32] K. Suzuki, "Overview of deep learning in medical imaging," Radiological physics and technology, vol. 10, pp. 257-273, 2017.
[33] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
[34] C. Rachna. (2020, 15 April 2020). Difference Between X-ray and CT Scan.
[35] P. K. Sethy and S. K. Behera, "Detection of coronavirus Disease (COVID-19) based on Deep Features," 2020.
[36] E. E.-D. Hemdan, M. A. Shouman, and M. E. Karar, "A Framework of Deep Learning Classifiers to Diagnose COVID-19 in X-Ray Images," arXiv preprint arXiv:2003.11055, 2020.
[37] A. E. Hassanien, L. N. Mahdy, K. A. Ezzat, H. H. Elmousalami, and H. A. Ella, "Automatic X-ray COVID-19 Lung Image Classification System based on Multi-Level Thresholding and Support Vector Machine," medRxiv, 2020.
[38] S. Wang, B. Kang, J. Ma, X. Zeng, M. Xiao, J. Guo, et al., "A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)," medRxiv, 2020.
[39] D. Wang, B. Hu, C. Hu, F. Zhu, X. Liu, J. Zhang, et al., "Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China," JAMA, 2020.
[40] O. Gozes, M. Frid-Adar, H. Greenspan, P. D. Browning, H. Zhang, W. Ji, et al., "Rapid ai development cycle for the coronavirus (covid-19) pandemic: Initial results for automated detection & patient monitoring using deep learning ct image analysis," arXiv preprint arXiv:2003.05037, 2020.
[41] M. Fu, S.-L. Yi, Y. Zeng, F. Ye, Y. Li, X. Dong, et al., "Deep Learning-Based Recognizing COVID-19 and other Common Infectious Diseases of the Lung by Chest CT Scan Images," medRxiv, 2020.
[42] X. Xu, X. Jiang, C. Ma, P. Du, X. Li, S. Lv, et al., "Deep learning system to screen coronavirus disease 2019 pneumonia," arXiv preprint arXiv:2002.09334, 2020.
[43] M. Li, Z. Zhang, S. Jiang, Q. Liu, C. Chen, Y. Zhang, et al., "Predicting the epidemic trend of COVID-19 in China and across the world using the machine learning approach," medRxiv, 2020.
[44] P. Kumar, H. Kalita, S. Patairiya, Y. D. Sharma, C. Nanda, M. Rani, et al., "Forecasting the dynamics of COVID-19 Pandemic in Top 15 countries in April 2020 through ARIMA Model with Machine Learning Approach," medRxiv, 2020.
[45] C.-J. Huang, Y.-H. Chen, Y. Ma, and P.-H. Kuo, "Multiple-Input Deep Convolutional Neural Network Model for COVID-19 Forecasting in China," medRxiv, 2020.
[46] G. Pandey, P. Chaudhary, R. Gupta, and S. Pal, "SEIR and Regression Model based COVID-19 outbreak predictions in India," arXiv preprint arXiv:2004.00958, 2020.
[47] N. Sajid. (2020, April 1). Corona Virus Dataset. Available: https://www.kaggle.com/nabeelsajid917/covid-19-x-ray-10000-images
[48] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin, "Albumentations: fast and flexible image augmentations," Information, vol. 11, p. 125, 2020.
[49] V. Jatana. (2020, April 1). Coronavirus in Jordan. Available: https://www.kaggle.com/vanshjatana/coronavirus-in-jordan/notebook
[50] V. Jatana. (2020, April 1). Coronavirus in Australia. Available: https://www.kaggle.com/vanshjatana/australia-under-covid-19?scriptVersionId=32280319
[51] M. Alazab, M. Alazab, A. Shalaginov, A. Mesleh, and A. Awajan, "Intelligent mobile malware detection using permission requests and API calls," Future Generation Computer Systems, vol. 107, pp. 509-521, 2020.
[52] M. Alazab, "Automated Malware Detection in Mobile App Stores Based on Robust Feature Generation," Electronics, vol. 9, p. 435, 2020.
[53] M. Alazab, "Analysis on Smartphone Devices for Detection and Prevention of Malware," Doctor of Philosophy, Faculty of Science, Engineering and Built Environment, Deakin University, 2014.
[54] M. Alazab, S. Venkatraman, P. Watters, and M. Alazab, "Zero-day malware detection based on supervised learning algorithms of API call signatures," in Ninth Australasian Data Mining Conference: AusDM 2011, Ballarat, Australia, 2011, pp. 171-181.
[55] T. Zhang, Q. Wu, and Z. Zhang, "Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak," Current Biology, 2020.
[56] Centers for Disease Control and Prevention. (2020, 01). Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19). Available: https://www.cdc.gov/coronavirus/2019-ncov/hcp/clinical-guidance-management-patients.html
[57] Australian Government. (2020, April 01). Coronavirus (COVID-19) current situation and case numbers. Available: https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers
[58] S. Kannan. (2020, April 10). A drill-down analysis of Covid-19 in India, so far. Available: https://www.indiatoday.in/news-analysis/story/a-drill-down-analysis-of-covid-19-in-india-so-far-1665676-2020-04-10
[59] S. J. Taylor and B. Letham, "Forecasting at scale," The American Statistician, vol. 72, pp. 37-45, 2018.
[60] A. Bazila Banu, R. Priyadarshini, and P. Thirumalaikolundusubramanian, "Prediction of Children Diabetes by Autoregressive Integrated Moving Averages Model Using Big Data and Not Only SQL," Journal of Computational and Theoretical Nanoscience, vol. 16, pp. 3510-3513, 2019.
[61] J. Qiu, B. Wang, and C. Zhou, "Forecasting stock prices with long-short term memory neural network based on attention mechanism," PLoS ONE, vol. 15, 2020.
[62] S. Hochreiter and J. Schmidhuber, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, 1997, pp. 473-479.
[63] F. Quiroga, R. Antonio, F. Ronchetti, L. C. Lanzarini, and A. Rosete, "A study of convolutional architectures for handshape recognition applied to sign language," in XXIII Congreso Argentino de Ciencias de la Computación (La Plata, 2017), 2017.

Author Biographies

Dr. Moutaz Alazab is a computer security expert with industry, academic, teaching and research experience. He completed his PhD in cybersecurity at Deakin University, Australia, in 2014. He is currently an assistant professor in the Faculty of Artificial Intelligence, Al-Balqa Applied University. During his PhD, he was an active scholar in the Securing Cyberspace Laboratory and the Network and System Security Laboratory (NSCLab). Dr. Alazab has demonstrated the ability to deliver high-quality content for several courses at both undergraduate and postgraduate levels. He has lectured, coordinated, tutored and moderated at several well-known universities, including BAU, CCQ, Deakin, RMIT, CQU and MIT. Dr. Alazab has worked closely with industry on several research projects, including with BAE Systems, Microsoft and Ericsson. He is the recipient of a number of research grants, including the Research Incentive Fund (RIF), Zayed University. His research interests include cybersecurity, mobile security, network security, machine learning, digital forensics, blockchain, the Internet of Things (IoT) and big data analytics. He has published more than 20 peer-reviewed articles in well-known, high-quality international journals and conferences.

Dr. Albara Awajan is an associate professor at the Cyber Security Department, Faculty of Artificial Intelligence, Al-Balqa Applied University. He completed his PhD in Computer Networks and Multimedia Applications in 2008 and his MSc degree in Multimedia and Internet Computing in 2003 from the University of Glamorgan in the UK. He completed his BSc at Mutah University in 2001. During his PhD studies he worked in the Mobile Communication Research Group in the Faculty of Advance Engineering. He is currently Head of the Automated Systems Department and Assistant Dean for Student Affairs in the Faculty of Artificial Intelligence. Dr. Awajan's research areas are ad hoc networks, QoS, IoT and network security.