Prerna Sharma 2020
ARTICLE INFO

Keywords:
Modified artificial plant optimization algorithm
Machine learning
Savitzky-Golay filter
Extreme gradient boosting
Artificial neural network

ABSTRACT

In today’s world, cardiovascular diseases are prevalent and have become the leading cause of death. More than half of them are due to Coronary Heart Disease (CHD), which creates a demand for timely prediction so that people can take precautions or seek treatment before the disease becomes fatal. To serve this purpose, a Modified Artificial Plant Optimization (MAPO) algorithm is proposed, which can be used as an optimal feature selector along with other machine learning algorithms to predict the heart rate from a fingertip video dataset; the predicted heart rate is in turn used to predict the presence or absence of Coronary Heart Disease in an individual at that moment. Initially, the video dataset is pre-processed and noise is filtered, and then MAPO is applied to predict the heart rate, with a Pearson correlation and Standard Error of Estimate of 0.9541 and 2.418 respectively. The predicted heart rate is used as a feature in two other datasets, and MAPO is applied again to optimize the features of both datasets. Different machine learning algorithms are then applied to the optimized datasets to predict the presence of current heart disease. The results show that MAPO reduces the dimensionality to the most significant information with comparable accuracies for different machine learning models, with a maximum dimensionality reduction of 81.25%. MAPO has been compared with other optimizers and outperforms them with better accuracy.
⁎ Corresponding author.
E-mail addresses: prernasharma@mait.ac.in (P. Sharma), choudharykrish97@gmail.com (K. Choudhary), kgupta101097@gmail.com (K. Gupta),
chawlarahul1997@gmail.com (R. Chawla), deepakgupta@mait.ac.in (D. Gupta), arunsharma@igdtuw.ac.in (A. Sharma).
https://doi.org/10.1016/j.artmed.2019.101752
Received 4 June 2019; Received in revised form 30 October 2019; Accepted 2 November 2019
0933-3657/ © 2019 Elsevier B.V. All rights reserved.
P. Sharma, et al. Artificial Intelligence In Medicine 102 (2020) 101752
results are clearly able to illustrate the advantages and disadvantages of the PSO algorithm. Arpan Kumar Kar [9] explained the theoretical concepts of different bio-inspired algorithms including the APOA, in which he devised an algorithm that mimics the growth of a plant and how it uses photosynthesis to advance growth and food synthesis. Xin-She Yang [10] proposed another algorithm inspired by the plant development process and devised a photosynthesis operator, a phototropism operator, and an apical dominance operator.

However, the existing works stated above are not sufficient to be applied to real-life problems. A significant amount of theoretical as well as practical work has been done on a few bio-inspired algorithms such as Neural Networks. In contrast, certain bio-inspired algorithms such as APOA have a limited amount of literature available. Studies in Scopus showed that different bio-inspired algorithms have varied literature associated with them. Fig. 1 shows the percentage of names of different bio-inspired algorithms present in the article titles in Scopus.

It is quite evident that APO, in particular, has the least literature associated with it. Until now, only the theoretical aspect of the algorithm has been discussed in papers, which lack a proper implementation and a complete algorithm. The APO algorithms discussed so far are not adaptable to varied datasets because they lack a proper implementation. Also, existing implementations of APOA have limited scalability to real-life problem domains [11,12].

To address this concern, the Modified Artificial Plant Optimization (MAPO) algorithm is introduced in this paper. This paper not only discusses theoretical aspects of APOA [14] but also focuses on the implementation of the complete algorithm. It proposes an implementation of the APO algorithm which can be efficiently applied to tabular datasets as well as video datasets [13]. MAPO has been used to calculate the heart rate and has then been combined with other machine learning algorithms as a feature optimizer to detect heart disease in a person. It has been designed in such a way that MAPO may in future be used to predict not only the heart rate of a person but also parameters like sugar levels, cholesterol levels, etc.

The paper focuses on the following:

• ‘Modified Artificial Plant Optimization algorithm (MAPO)’ has been proposed for the detection of heart rate using fingertip video.
• Logistic Regression, Naïve Bayes, XGB (Extreme Gradient Boosting) and ANN (Artificial Neural Network) have been applied to the Heart Disease dataset to calculate the presence of heart disease.
• The proposed MAPO shows better accuracies than other related works, with the highest number of videos (84 out of 100) having relative errors less than 5%.
• It has achieved a Pearson Correlation and Standard Error of Estimate of 0.9541 and 2.418 respectively while detecting heart rate.
• The results show that the proposed MAPO outperforms the original APO.

The rest of the paper is organized as follows. Section 2 covers the literature review. The methodology of the entire algorithm is introduced in Section 3 and its detailed implementation is discussed in Section 4. Section 5 presents the results obtained, followed by the Conclusion and Future Scope.

2. Literature review

In paper [15], Zhihua Cui, Dongmei Liu, Jianchao Zeng, and Zhongzhi Shi devised a way to solve the protein folding problem, which is a classical NP problem. They chose APOA to achieve this, as it can easily overcome the problem posed by local optima and has a greater probability of finding the global optimum. To enhance the efficiency of the algorithm, APOA is applied with certain splitting strategies. The Fibonacci sequence with a real protein sequence shows promising results.

The paper [16] by Dongmei Liu and Zhihua Cui deals with the protein folding structure prediction problem. They presented a new hybrid form of the artificial plant optimization algorithm which makes use of a golden section operator and a local search strategy popularly known as L-BFGS. Experiments conducted with this algorithm show precise predictions of the protein folding structures.

The research [17] by Banitsas K., Pelegris P., Orbach T., Cavouras D., Sidiropoulos K., and Kostopoulos S. is a solution that can take care of a user’s well-being. It enables users to check their own heart rate and makes use of software to coach users on their personal health. They did this research using just the camera of a mobile phone, which gave them the information they needed for computation.

The paper [18] presented by Enock Jonathan and Martin J. Leahy is research on photoplethysmographic (PPG) imaging done using a mobile phone. They took a computed value of 0.1 Hz as a sympathetic component of the calculated heart rate, and people with heart rates of 1 Hz and 2 Hz were classified as healthy volunteers. The results produced by their research were promising and could be quite helpful for home-based care.

This research [19] is carried out by Deepshikha Acharya, Asha Rani,
Shivangi Agarwal, and Vijander Singh, and is about the Savitzky-Golay filter. The filter needs an order and a frame size to create it. Usually, a trial-and-error method is applied, or an experienced person who has used the filter previously and knows its working can come in handy. They proposed an algorithm to automatically determine the order and frame size for the filter. It considers all the combinations of values and arrives at the most suitable structure for the filter. The correlation coefficient is calculated for each of these combinations and the corresponding filter efficiency is measured. The parameters with the highest correlation coefficient are chosen for the filter. They also tested their adaptive filter for Electroencephalogram signal processing.

The paper [20] by Mohamed Elgendi carries out a thorough analysis of the Photoplethysmography (PPG) technique. This technique uses infrared light to estimate blood flow. PPG shows excellent results in screening various atherosclerotic pathologies, but a full understanding of the diagnostic value of its different features has still not been achieved. The paper discusses in detail the PPG signal, its characteristic features, and its diagnostic evaluation indexes.

The paper [21] written by Nicolas Rodondi discusses in detail the Framingham risk score and its alternatives to predict Coronary Heart Disease (CHD) in older adults. They compared the efficiency of the Framingham risk score (FRS) in its prediction of CHD with a refit function that came from the present cohort after its recalibration. They also analyzed the utility of adding some other risk parameters to FRS. The FRS alone did not give good results, with as many as 51% of women and 8% of men having their risk underestimated, though recalibrating the FRS improved the results, especially for women. This study made clear that FRS often underestimates the risk of CHD in older adults, particularly women. The re-estimated risk functions improved the prediction of absolute risk.

The research [22] carried out by Nitika Sharma and Kriti Saroha discusses various dimensionality reduction techniques in the field of data mining. In various fields, people encounter the problem of the "Curse of Dimensionality", which leads to lower efficiency of their models. This happens due to the presence of certain unimportant and insignificant features. Various methodologies are therefore used for feature selection to separate the important features from the irrelevant ones. They did a comparative analysis of various feature reduction methods and surveyed for the best method that can be chosen based on the type of dataset.

The paper [23] written by Esraa Elhariri, Nashwa El-Bendary, and Aboul Ella Hassanien is on classification approaches based on Random Forests (RF) and Linear Discriminant Analysis (LDA). The problem they took is to classify different types of plants. Initially, they preprocessed the data in hand, then applied feature engineering techniques on top of it, and finally performed classification. The classification approach is mainly based on the leaves of the plants because of their uniqueness. The total dataset consists of 340 leaf images. Their results showed that LDA reached 92.65% and RF reached 88.82% accuracy respectively.

The research [24] performed by Pei-Wei Tsai, Jeng-Shyang Pan, Bin-Yih Liao, Ming-Jer Tsai, and Vaci Istanda was on the Bat Algorithm, which is a bio-inspired algorithm. They modified that algorithm and called it the Evolved Bat Algorithm (EBA), which they used for solving the numerical optimization problem. They redefined certain operations of the algorithm according to the bat’s behavior. Their experiments with EBA resulted in a 99.42% improvement in the accuracy for finding the near best solution and reduces 6.07% on average as compared to the original Bat Algorithm.

The paper [25] by Prayag Tiwari and Massimo Melucci enters the discussion of using Quantum Mechanics in Machine Learning; they present a binary classification model inspired by the quantum detection framework. It is said that using quantum-inspired Machine Learning can enhance learning rate and effectiveness. Their results show that the use of the quantum detection framework for binary classification can improve the effectiveness for a large number of topics of the RCV-1 test collection, and according to them it may still provide ways to improve effectiveness for other topics.

The research [26] presented by Prayag Tiwari and Massimo Melucci is about a quantum-inspired binary classifier based on quantum detection theory. They used text corpora and image corpora to test the effect of their proposed model. Their model outperforms the state-of-the-art models in terms of precision, recall, and F-measure. On the MNIST handwritten dataset their model outperforms all the baselines in terms of recall; F-measure is also higher for most of the categories, and precision is higher for some categories. This paper shows that a quantum-inspired binary classifier can increase the precision, recall, and F-measure beyond what the state-of-the-art methods achieve.

The research [27] by Prayag Tiwari and Massimo Melucci is about the idea of replacing the current classification theory with the improved quantum theory to improve the effectiveness of models. In this research they have extended their work from a binary classifier to a multiclass classifier using the quantum theory, that is, classification of data into more than two classes.

In the research [28], Prayag Tiwari and Massimo Melucci worked on using quantum theory in the field of machine learning. They proposed a Binary Classifier Inspired by Quantum Theory (BCIQT) model which in practice outperforms the state-of-the-art classification techniques in terms of recall.

The paper [29] by Emanuele Di Buccio, Qiuchi Li, Massimo Melucci, and Prayag Tiwari is about a binary classification model inspired by quantum theory. Quantum Mechanics made it possible to push the optimal bound of effectiveness beyond the levels of the state-of-the-art classification algorithms. They proposed a binary classification model inspired by quantum detection to explore the benefits it brings in comparison to classical models. According to their results, an improvement in classification effectiveness can be obtained, although they felt that the potential of quantum detection can only be partially exploited.

The research [30] conducted by Stephen Roy and Jean Mccrory was to determine if the maximum heart rate (HRmax) was in any way affected by sex or aerobic training status. To conduct their research, they took a sample of 52 volunteers, of which 30 were in the active group (15 M, 15 F) and 22 were in the sedentary group (9 M, 13 F). All of them were within the age range of 18–25 and had a normal BMI as well. They were made to undergo a Bruce maximal treadmill exercise protocol. They also analysed the effect of sex and aerobic training and found that males had a higher HRmax than females. They concluded that the equation HRmax = 208 − (0.7 ∙ age) had a greater accuracy than the other two equations they initially considered for predicting observed values of HRmax in 18 to 25-year olds.

The paper [31] by A Gupta, J F Lampropululos, B Bikdeli, P Mody, R Chen, V T Kulkarni, and K Dharmarajan is a review of the Most Important Outcomes Research Papers on Cardiovascular Diseases (CVD) in women. They feel that CVD is understudied in the case of women and that CVD is only classified as a "Man's Disease". In order to increase the awareness of CVD in women, the American Heart Association (AHA) published their first women-specific clinical recommendations for prevention of CVD in 1999. As a result of this and several other initiatives by AHA and other collaborative organisations, the public awareness of CVD increased from 30% to 54% between 1997 and 2009 amongst US women. Yet CVD remains the number 1 killer amongst women. Due to the increase in the obesity epidemic, an increase in Coronary Heart Disease (CHD) deaths can be found among women of 35–54 years of age. In this paper they have included studies that provided a convincing a priori reason for the study of a particular disease or clinical intervention in women and reported primary end-points that were sex-specific. Apart from this, they have included many other topics specific to women in their paper.

The paper [32] presented by Dogan Ibrahim and Kadri Buruncuk is on a low-cost microcontroller device that can measure the heart rate with an LCD output. They have used optical sensors to measure the
heart rate of a person from the finger, and the rate after averaging can be seen on the LCD.

The research [33] carried out by Rachel Huxley, Federica Barzi, and Mark Woodward is about estimating the relative risk of fatal coronary heart disease associated with diabetes in men and women. With their studies, they found that the relative risk for fatal coronary heart disease associated with diabetes is 50% higher in women than in men. This can be explained by more cardiovascular risk profiles among women with diabetes combined with the possible disciplinary treatments that only favor men.

3. Methodology

3.1. Artificial plant optimization algorithm (APOA)

The Artificial Plant Optimization Algorithm (APOA) is an algorithm that is used to handle global optimization problems. It is a bio-inspired algorithm that mimics the growing process of a plant. APO defines the environment in which a plant can survive. This defined environment includes some constant resources like air, water and oxygen and some variable resources like sunlight. Branches of the plant (candidate solutions) are initialized and their fitness value is calculated using operators like photosynthesis, phototropism, apical dominance, and skototropism. Photosynthesis measures the total energy production and phototropism indicates the direction of growth of the plant towards sunlight. Apical dominance and skototropism are also essential as they add more refinement in moving a candidate solution towards an optimal solution.

3.1.1. Photosynthesis operator

Photosynthesis refers to the energy production of a plant. In photosynthesis, the photosynthetic rate is used to give a measure of the total amount of energy produced. Applying a rectangular hyperbolic model, quality of the produced energy is obtained as:

$$r_i(t) = \frac{\mu \, I_{f_i}(t) \, R_{max}}{\mu \, I_{f_i}(t) + R_{max}} - D_R \tag{1}$$

where $r_i(t)$ denotes the $i$th branch’s photosynthetic rate at time $t$, $\mu$ is the efficiency of initial quantum, $R_{max}$ is the maximum net photosynthesis rate, and $D_R$ denotes the dark respiratory rate. $\mu$, $R_{max}$ and $D_R$ control the size of the photosynthetic rate. They are set as 0.055, 30.2, and 1.44, respectively [11].

$I_{f_i}(t)$ is the intensity of light and is represented as:

$$I_{f_i}(t) = \frac{f_{worst}(t) - f_i(t)}{f_{worst}(t) - f_{best}(t)} \tag{2}$$

where $f_{best}(t)$ and $f_{worst}(t)$ are the best and worst light intensities at time $t$, respectively. $f_i(t)$ is the light intensity of branch ‘i’.

$$F_i(t) = \frac{F_i^{total}(t)}{\lVert g_i(t) - g_p(t) \rVert} \cdot \left( g_i(t) - g_p(t) \right) \tag{4}$$

where $\lVert \cdot \rVert$ represents the Euclidean distance. $F_i^{total}(t)$ is calculated as:

$$F_i^{total}(t) = \sum_{i \neq p} coeff \cdot e^{-dim \cdot P_i(t)} - e^{-dim \cdot P_p(t)} \tag{5}$$

‘dim’ represents the dimensionality of the problem, and ‘coeff’ denotes the parameter for controlling the growth direction:

$$coeff = \begin{cases} 1 & \text{if } P_i(t) > P_p(t) \\ -1 & \text{if } P_i(t) < P_p(t) \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

Moreover, a small probability $P_m$ is included which reflects the influence of some random events:

$$x_i(t+1) = x_{min} + (x_{max} - x_{min}) \cdot rd_1(), \quad \text{if } rd_2() < P_m \tag{7}$$

where $rd_1()$ and $rd_2()$ are two random numbers drawn from a uniform distribution. These numbers are generated to incorporate some randomness in the events, mimicking the spontaneity of the real world. $rd_1()$ and $rd_2()$ can hold any value between 0 and 1, and $rd_2()$ is compared against the probability $P_m$ of the random event occurring, as discussed above.

3.1.3. Apical dominance and skototropism operator

Apical dominance is one of the key phenomena of APO. It states that the main stem of a plant grows more significantly and dominantly than the side stems; and on a particular branch, the main stem branch is more dominant than the side branchlets. Apical dominance defines two types of buds, namely the apical bud and the lateral bud, and the apical bud is the one that plays the key role in deciding the growing direction of the branch. The apical bud produces an auxin called IAA (Indole-3-acetic acid) that hinders the growth of the lateral buds. If the apical bud is not present, then the lower IAA concentrations in the plant help the lateral buds to grow and fight for dominance.

According to APOA, the position of the apical bud is considered to be the best as its light response value (fitness value) is superior to that of the lateral buds. If $g(t)$ is the best position in the entire population, then it can be seen as:

$$g(t) = \arg\min \{ f(x_u(t)) \mid u = 1, \ldots, n \} \tag{8}$$

The apical dominance operator operates on $g(t)$ as it grows the fastest and its performance is the best. Suppose a random number is defined as $rand()$; then, according to the condition, different mannerisms are selected.

Condition 1: If $\frac{1}{n} < rand() < rate$, then

$$g_k(t+1) = g_k(t) + \left( g_k(t) - x_{worst,k}(t) \right) \cdot growth \cdot r \tag{9}$$

where

$$x_{worst,k}(t) = \arg\max \{ f(x_u(t)) \mid u = 1, \ldots, n \} \tag{10}$$
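As an illustration of the photosynthesis operator of Eqs. (1) and (2) and the random-event rule of Eq. (7), the following minimal Python sketch may help. It is not the authors' implementation: the function names are hypothetical, and it assumes a minimisation problem (the best branch has the smallest fitness, in line with the arg min of Eq. (8)) and returns an intensity of 1 for every branch when all fitness values coincide.

```python
import random

# Constants quoted in the text for Eq. (1): mu, R_max and D_R.
MU, R_MAX, D_R = 0.055, 30.2, 1.44


def light_intensity(fitness):
    """Eq. (2): scale fitness values to [0, 1]; 1 = best branch, 0 = worst.

    Assumption: minimisation, so the best branch has the smallest fitness.
    """
    f_best, f_worst = min(fitness), max(fitness)
    span = f_worst - f_best
    if span == 0:  # all branches equal: avoid dividing by zero (our choice)
        return [1.0 for _ in fitness]
    return [(f_worst - f) / span for f in fitness]


def photosynthetic_rate(intensity, mu=MU, r_max=R_MAX, d_r=D_R):
    """Eq. (1): rectangular-hyperbola light response minus dark respiration."""
    return mu * intensity * r_max / (mu * intensity + r_max) - d_r


def random_event(x, x_min, x_max, p_m, rng=random):
    """Eq. (7): with probability p_m, re-draw x uniformly in [x_min, x_max]."""
    if rng.random() < p_m:                              # rd2() < Pm
        return x_min + (x_max - x_min) * rng.random()   # uses rd1()
    return x


fitness = [1.0, 2.0, 3.0]  # illustrative fitness values of three branches
rates = [photosynthetic_rate(i) for i in light_intensity(fitness)]
print(rates)
```

With the quoted constants the rate stays negative over the whole intensity range, so in this sketch only the ordering of the rates across branches is informative.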
Condition 2: If $rand() \le \frac{1}{n}$, then
towards canopy openings, they would be misguided and never get enough sunlight. Instead they grow towards the darkest places, like the bases and trunks of large trees. This helps them find support and climb tall trees and, in the end, get sunlight. Once these vines find a support structure, the skototropism behavior is turned off and they start exhibiting phototropism.

3.2. Proposed modified artificial plant optimisation algorithm (MAPO)

The complete flow of the proposed MAPO algorithm is shown in Fig. 2. The existing APO algorithm has been modified so that it can be applied to the given problem domain, i.e. to calculate heart rate and to select optimal features for heart disease prediction. Hence two modifications of the existing APOA are proposed:

(i) APO (video): To calculate heart rate from a noise-free signal
(ii) APO (relational): To select optimal features from a given relational dataset

A total of two inputs in the form of datasets were used in the entire algorithm. A video dataset of fingertips was used to calculate the heart rate of a person. A tabular heart disease dataset was used to predict the probability of having heart disease at present.

The fingertip video dataset was initially pre-processed and broken into frames. The graph of mean intensity values vs frames thus obtained contained natural noise, so it was filtered to get a smoother graph. APO (video) was then applied on the smoothened graph to calculate the heart rate (a detailed explanation of the modified APO (video) is in Section 4.8). The calculated heart rate was used as an attribute along with other attributes (Table 3) in the Heart disease dataset to calculate the presence of heart disease using various machine learning models. The machine learning models used were quite complex and required feature reduction. APO (relational) was thus applied to obtain optimal dataset features and efficiently predict the probability of having heart disease at present (a detailed explanation of the modified APO (relational) is in Section 4.10).

4. Implementation

In this section, the experimental setup, input parameters and implementation of MAPO are discussed.

4.1. Experimental setup

The algorithm has been implemented and tested on Google Colaboratory (Colab), an online IDE, that has a Tesla K80 GPU with 2496 CUDA cores and 12 GB GDDR5 VRAM. The total available RAM was 12.6 GB. Google Colab uses a single-core 2.3 GHz hyperthreaded Xeon processor and disk storage of 33 GB. It allowed a maximum 12 h of

4.2. Libraries used

• OpenCV

OpenCV was started at Intel in 1999 by Gary Bradski. It is a library that supports a large number of algorithms in the field of Computer Vision and Machine Learning, and it continues to expand. It supports a wide range of programming languages like Java, C++, and Python, and is available on various platforms including Windows, OS X, Linux, Android, etc. OpenCV has a Python extension called OpenCV-Python which combines the robustness of the OpenCV C++ API with the ease of the Python language.

• NumPy

NumPy, in general, is an array-processing tool. It provides a multidimensional array object and tools to work with these arrays. It is the fundamental package in Python used for scientific computing. It has some prominent features including:

• Provision for linear algebra, Fourier transform, and random number capabilities.
• A powerful multi-dimensional array object.
• Tools for integrating C/C++ and Fortran code.
• Sophisticated (broadcasting) functions.

• Matplotlib

Matplotlib is a visualization library of Python used for plotting 2D arrays. It is a library built on NumPy arrays and designed to work with the broader stack of SciPy. It was introduced in the year 2002 by John Hunter.

One of the greatest benefits of Matplotlib is that it allows visual access to huge amounts of data. Matplotlib enables several types of plots like line, bar, histogram, scatter, etc.

• Scikit-learn

Scikit-learn is an open-source library in Python which is used to implement a wide range of machine learning, preprocessing, cross-validation and visualization algorithms with the help of a unified interface.

Some significant features of Scikit-learn are:

• Provides simple and efficient tools for data analysis and data mining. It has various regression, classification and clustering algorithms including SVM, random forests, k-means, etc.
• Provides universal access and is reusable.
• Built using NumPy, matplotlib, and SciPy.
• Open source and commercially usable.

• Keras
deep learning models so that they can be used for research and development. It operates on Python 2.7 or 3.5 and can easily be executed on GPUs and CPUs. It is released under the MIT license. Keras was developed using four guiding principles:

• Minimalism: The library provided is just enough for achieving an outcome.
• Extensibility: New components are easy to add and to use within the framework.
• Modularity: All the models of deep learning are considered as discrete and can be combined in arbitrary ways.
• Python: No separate model files are present and everything is programmed in native Python.

• TensorFlow

TensorFlow is an open-source library platform for quick numerical computation. It was released under the Apache 2.0 open-source license. It was created and is still maintained by Google. The API in TensorFlow is usually used from the Python language, although there is access to the C++ API as well.

Unlike other numerical libraries like Theano that are intended for use in Deep Learning, TensorFlow was designed to be used both in research and development and in production systems.

• SciPy

SciPy is an open-source, Python-based scientific library that finds its application in mathematics and scientific computing. It is a BSD-licensed library having modules for image processing, linear algebra, interpolation, etc. It is a part of the NumPy stack and has cross-platform capabilities.

It is the most used scientific library because it is easier to understand and has fast computational power. It contains most of the new data science features that are not present in NumPy.

• XGBoost

Known as eXtreme Gradient Boosting, it is an advanced open-source programming library that provides a gradient boosting framework to different interfaces like Python, C++, Julia, and Scala. It is a highly adaptable and flexible tool that can handle both classification and regression problems. It supports running on distributed frameworks like Apache Hadoop, Apache Spark, and Apache Flink.

It has gained a great deal of popularity over the last couple of years as the algorithm of choice for many winning teams of machine learning competitions. It is an adaptable and efficient implementation of gradient boosting machines and it has demonstrated its ability to push the limits of computing power for boosted trees algorithms. It was built specifically to improve machine learning model performance and computational speed, and to exploit all the memory and hardware resources for tree boosting algorithms.

4.3. Assumptions

While recording video for heart rate calculation, the finger is kept stable such that color variations of blood through the video are clearly visible. It is also assumed that the values of the input attributes of the patient are available for calculation of current heart disease, as described in Table 3.

4.4. Input parameters

Table 1 shows the input parameters for the implementation of MAPO.

4.5. Dataset

4.5.1. Fingertip’s video dataset

The Fingertip’s video dataset was a set of fingertip videos with a time span of 25–35 seconds in which people were asked to put their index finger over the rear camera of a smartphone with the flash LED light switched on. These red illuminated videos were taken in different light conditions of people aged between 20 and 50 to make the dataset more varied and suitable for the model. A sample of several videos was taken with the following characteristics (Table 2).

4.5.2. Heart disease dataset

The Heart Disease Dataset described a person’s medical information with attributes like chest pain type, resting blood pressure, fasting glucose and serum cholesterol levels, resting ECG results, maximum heart rate, exercise induced angina results, fluoroscopy results and ST segment results of ECG (Appendix 2). The dataset also had the age and gender of the person in consideration. The dataset in Table 3 was taken from Kaggle (https://www.kaggle.com/ronitf/heart-disease-uci).

4.6. Video pre-processing

The videos from the Fingertip’s video dataset required pre-processing. For this, the videos were initially broken into a sequence of frames. The number of frames for each video was calculated as the product of the frame rate (30 fps) and the duration of the video.

Each extracted frame consisted of three color channels, namely red, green and blue (RGB).

For each of the three channels and a mix channel with the ratio (blue_ratio = 0.23, green_ratio = 0.32, red_ratio = 0.45), the mean intensity brightness value was calculated using the formula:

$$b(t) = \sum_{j=1}^{n} \sum_{i=1}^{m} a(i, j, c, t) / (m \times n) \tag{12}$$

Where,
$b(t)$ = mean brightness for each channel frame
$m \times n$ = video resolution
$a(i, j, c, t)$ = corresponding matrix for each RGB channel
$c$ = 1, 2, 3 respectively for Red, Green, and Blue
$t$ = total frames obtained

Fig. 3 shows the comparison for the three RGB channels and the mix channel.

From Fig. 3, it was evident that the red channel provided a much more stable graph for extracting the heart rate than the other three because of the effect of natural noise in them. Thus, the red channel graph is given to the noise filtering block for further processing.

4.7. Noise removal using Savitzky-Golay filter

The red channel graph also possessed a lot of unwanted noise and could not be directly used for the calculation of heart rate. So, the red channel graph was first smoothened using filters like the Butterworth filter. All the filters used worked on the concept of taking the Fourier transform of the time-domain graph and then filtering out a band of unwanted frequencies. But since the frequencies were not uniform throughout the graph (as seen in Fig. 4), this technique did not give promising results.

Thus, it was required to choose a type of filter that did not involve taking Fourier transforms of the time domain and could smoothen the graph directly. The Savitzky-Golay (Savgol) filter was one such filter and it was used for smoothening the red channel graph.

A Savgol filter is a type of filter that can be applied on digital data points for smoothing and increasing the precision of a signal without distorting it. The filter fits successive data points within a subset
P. Sharma, et al. Artificial Intelligence In Medicine 102 (2020) 101752
Table 1
Input Parameters.
Parameter                     Value   Description
Red Ratio                     0.45    Ratio of ‘Red’ component for RGB mix channel
Green Ratio                   0.32    Ratio of ‘Green’ component for RGB mix channel
Blue Ratio                    0.23    Ratio of ‘Blue’ component for RGB mix channel
Window Size (Savgol 1)        9       Length of the filter window (i.e. the number of coefficients) for filter 1
Polynomial Order (Savgol 1)   2       Order of the polynomial used to fit the samples for filter 1
Window Size (Savgol 2)        29      Length of the filter window for filter 2
Polynomial Order (Savgol 2)   4       Order of the polynomial used to fit the samples for filter 2
Frame Rate of Video           30      Number of frames per second of video
Tw                            10      Number of points on each side used for the calculation of the fittest candidate solution
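To make the Savitzky-Golay settings in Table 1 concrete, the following is a minimal sketch of the filter written with the Python standard library only. It is not the authors' implementation: the function names, the edge padding, the synthetic signal, and the choice to apply filter 1 and then filter 2 in sequence are all assumptions made for illustration. In practice a library routine such as `scipy.signal.savgol_filter` would normally be used instead.

```python
from fractions import Fraction
import math


def savgol_weights(window_size, poly_order):
    """Return the centre-point smoothing weights of a Savitzky-Golay filter."""
    if window_size % 2 == 0 or poly_order >= window_size:
        raise ValueError("window_size must be odd and greater than poly_order")
    half = window_size // 2
    offsets = range(-half, half + 1)
    n = poly_order + 1
    # Normal equations (A^T A) c = e0 of the least-squares polynomial fit,
    # where A[i][j] = offset_i ** j. Fractions keep the solve exact.
    ata = [[Fraction(sum(k ** (r + c) for k in offsets)) for c in range(n)]
           for r in range(n)]
    rhs = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for col in range(n):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, n) if ata[r][col] != 0)
        ata[col], ata[pivot] = ata[pivot], ata[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(n):
            if r != col and ata[r][col] != 0:
                factor = ata[r][col] / ata[col][col]
                ata[r] = [a - factor * b for a, b in zip(ata[r], ata[col])]
                rhs[r] -= factor * rhs[col]
    coeffs = [rhs[i] / ata[i][i] for i in range(n)]
    # The weight applied to the sample at offset k.
    return [float(sum(coeffs[j] * k ** j for j in range(n))) for k in offsets]


def savgol_smooth(signal, window_size, poly_order):
    """Smooth a sequence by sliding the Savitzky-Golay weights over it."""
    weights = savgol_weights(window_size, poly_order)
    half = window_size // 2
    # Assumption: edges are padded by repeating the first and last samples.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sum(w * padded[i + j] for j, w in enumerate(weights))
            for i in range(len(signal))]


# Synthetic stand-in for the red-channel trace (not real data): a 1.2 Hz
# pulse sampled at 30 fps plus a fast ripple that plays the role of noise.
FRAME_RATE = 30
raw = [math.sin(2 * math.pi * 1.2 * t / FRAME_RATE)
       + 0.3 * math.sin(2 * math.pi * 11.0 * t / FRAME_RATE)
       for t in range(300)]
stage1 = savgol_smooth(raw, window_size=9, poly_order=2)      # Savgol 1
stage2 = savgol_smooth(stage1, window_size=29, poly_order=4)  # Savgol 2
```

Because the filter fits a polynomial inside each window, a signal that is itself a polynomial of degree no higher than the filter order passes through unchanged away from the edges, which is why the method smooths without the frequency-band selection that the Fourier-based filters require.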
Table 2
Fingertip’s Video Dataset Description.
No.  Characteristic  Description
1    Age             Age at the time of medical examination
2    Sex             1 = male; 0 = female
3    cPain           Chest pain type
                     Value 1: typical angina
                     Value 2: atypical angina
                     Value 3: non-anginal pain
                     Value 4: asymptomatic
4    restBP          Resting blood pressure (in mm Hg)
5    serumChol       Serum cholesterol (mg/dl)
6    fstGlucose      Fasting glucose > 120 mg/dl (1 = true; 0 = false)
7    rstECG          Resting electrocardiographic results
                     Value 0: normal
                     Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
                     Value 2: probable or definite left ventricular hypertrophy by Estes' criteria
8    maxHR           Maximum heart rate achieved
9    exAngina        Exercise-induced angina (1 = yes; 0 = no)
10   restMelan       ST depression induced by exercise relative to rest
11   slopeST         Slope of the peak exercise ST segment
                     Value 1: upsloping
                     Value 2: flat
                     Value 3: downsloping
12   Ves             Number of major vessels (0-3) colored by fluoroscopy
13   Def             Predicted defect result
                     Value 3 = normal
                     Value 6 = fixed defect
                     Value 7 = reversible defect
14   Target          Heart disease prediction
                     Value 1 = Present
                     Value 0 = Absent

4.8. Proposed modified APO for heart rate calculation

To mimic the growth of a plant, the algorithm has to be mapped onto an optimization problem. Light intensities can be seen as the fitness values that guide the search for the optimal solution. This is in accordance with the fact that light intensity guides the direction of plant growth in real life. Moreover, a branch of a plant can be viewed as a point. The full mapping is given in Table 4.
Because the fitness value of each branch represents its light intensity, the values are normalized to the range [0, 1] as follows:

$$\mathrm{Score}(t, x_u(t)) = \frac{f_{worst}(t) - f(x_u(t))}{f_{worst}(t) - f_{best}(t)} \tag{13}$$

where $f_{worst}(t)$ and $f_{best}(t)$ are the worst and best original light intensities at time $t$, respectively, and $f(x_u(t))$ denotes the original light intensity of branch $u$.

4.8.1. Apical dominance

In MAPO, the optimal position may be regarded as an apical bud because its fitness value (light response value) is better than that of the other buds. The following formula gives the point of fittest solution (frame range) for systolic values.
following formula.
$$\mathrm{FrameGap} = \operatorname{mean}\left\{ \frac{Ap_i(T, t_w) + Ap_{i+1}(T, t_w)}{2},\ \forall i \right\}$$
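The [0, 1] normalization of light intensities in Eq. (13), Score = (f_worst − f) / (f_worst − f_best), can be illustrated with a short sketch. The function name, the `maximize` switch (whether a higher or a lower intensity counts as "best"), and the handling of the case where all intensities are equal are assumptions added for illustration; the text does not specify them.

```python
def normalized_scores(intensities, maximize=True):
    """Map raw light intensities to [0, 1] following the form of Eq. (13).

    The best branch receives a score of 1 and the worst a score of 0.
    `maximize` is an assumed switch: True treats the highest intensity
    as the best one, False treats the lowest intensity as the best one.
    """
    best = max(intensities) if maximize else min(intensities)
    worst = min(intensities) if maximize else max(intensities)
    if best == worst:
        # Assumption: with identical intensities there is nothing to rank,
        # so every branch gets the same score and division by zero is avoided.
        return [0.0 for _ in intensities]
    return [(worst - f) / (worst - best) for f in intensities]


# Hypothetical intensities of five branches at one time step.
branch_intensities = [120.0, 135.0, 150.0, 128.0, 141.0]
scores = normalized_scores(branch_intensities)
```

Normalizing in this way makes the scores comparable across time steps, because the best and worst values are recomputed at each t.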
4.8.2. Skototropism
Skototropism is negative phototropism, in which plants grow away from light. Although most plant shoots display positive phototropism, plant roots and some vine shoots display skototropism, which lets them grow towards dark solid objects and climb them. The following formula gives the point of fittest solutions (frame number) for diastolic values.

$$t_{upper} = \min(t_{max},\ T + t_w)$$
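The windowed search implied by the bound above and by the Tw entry in Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the lower bound `t_lower` is assumed to mirror `t_upper`, a frame is treated as a systolic (apical) point when it holds the maximum of its window and as a diastolic point when it holds the minimum, the first and last frames are skipped because they have no neighbour on one side, and the heart rate is derived from the mean spacing between successive peaks (a common convention, used here in place of the FrameGap formula). All function names and the synthetic signal are hypothetical.

```python
import math


def window_bounds(T, t_w, t_max, t_min=0):
    """Clip the search window around frame T to the valid frame range."""
    t_lower = max(t_min, T - t_w)  # assumed mirror image of t_upper
    t_upper = min(t_max, T + t_w)  # as given in the text
    return t_lower, t_upper


def find_extrema(signal, t_w, pick=max):
    """Return frames that hold the extreme value of their own window."""
    t_max = len(signal) - 1
    points = []
    for T in range(1, t_max):  # assumption: edge frames are skipped
        lo, hi = window_bounds(T, t_w, t_max)
        window = signal[lo:hi + 1]
        # Keep T only if it is the first frame with the window's extreme value.
        if signal[T] == pick(window) and lo + window.index(signal[T]) == T:
            points.append(T)
    return points


def heart_rate_bpm(peaks, frame_rate):
    """Beats per minute from the mean number of frames between peaks."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return 60.0 * frame_rate / (sum(gaps) / len(gaps))


# Synthetic smoothed trace: one beat every 20 frames at 30 fps (90 bpm).
FRAME_RATE, T_W = 30, 10  # values taken from Table 1
signal = [math.sin(2 * math.pi * t / 20) for t in range(300)]
systolic = find_extrema(signal, T_W, pick=max)
diastolic = find_extrema(signal, T_W, pick=min)
bpm = heart_rate_bpm(systolic, FRAME_RATE)
```

With Tw = 10 the window spans 21 frames, so at 30 fps two peaks closer than about a third of a second cannot both be selected, which suppresses small noise-induced bumps that survive smoothing.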
Fig. 5. Flow Chart of MAPO for heart rate calculation of fingertip video dataset.
the methods to handle such a scenario. In this statistical approach, the missing and NaN values were replaced with the mean value of the corresponding feature.
After pre-processing the dataset, four machine learning algorithms, namely Logistic Regression, Naïve Bayes, XGB and ANN, were applied to it (using the scikit-learn library).
For the Heart disease dataset, the accuracies of the four models are shown in Figs. 9, 10, 11 and 12.

that is used to fit a model and select the strongest feature (or features) until the performance of the model cannot be improved further. Features are ranked by their fitness values, and by recursively selecting a few features per loop, MAPO tries to eliminate collinearity and dependencies that may exist in the model. The relational APO is given as Algorithm 2, and its flowchart is shown in Fig. 6.
Algorithm 2: Modified Artificial Plant Optimization Algorithm for Optimal Feature Selection
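Algorithm 2 itself is not reproduced in this text. As a rough illustration of the recursive idea described above (rank features by fitness, change the subset a few features per loop, and keep the best-scoring subset), the following sketch uses a greedy backward elimination. It is a stand-in under stated assumptions and should not be read as the authors' Algorithm 2: the function names, the elimination order, the `step` parameter, and the toy fitness function are all hypothetical. In the paper the fitness is the accuracy of a trained classifier; here a simple integer score is used so that the example runs on its own.

```python
def select_features(features, calculate_fitness, step=1):
    """Greedy backward elimination guided by a fitness function.

    At each loop the `step` features whose removal leaves the highest
    fitness are dropped, and the best-scoring subset seen is remembered.
    """
    current = list(features)
    best_subset, best_fitness = list(current), calculate_fitness(current)
    while len(current) > step:
        # Rank every remaining feature by the fitness obtained without it.
        ranked = sorted(
            current,
            key=lambda f: calculate_fitness([g for g in current if g != f]),
            reverse=True,
        )
        dropped = set(ranked[:step])
        current = [f for f in current if f not in dropped]
        fitness = calculate_fitness(current)
        if fitness >= best_fitness:  # ties favour the smaller subset
            best_subset, best_fitness = list(current), fitness
    return best_subset, best_fitness


# Hypothetical toy fitness: informative features add 10 points each and
# every other feature costs 1 point. A real run would train and score a model.
INFORMATIVE = {"b", "d", "f"}


def toy_fitness(subset):
    chosen = set(subset)
    return 10 * len(chosen & INFORMATIVE) - len(chosen - INFORMATIVE)


selected, score = select_features(["a", "b", "c", "d", "e", "f"], toy_fitness)
```

In this toy run the search keeps the three informative features. A larger `step` evaluates fewer subsets and so runs faster, at the risk of skipping over the best subset.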
Function Used: CalculateFitness (list_of_attributes):
This function calculates the accuracy of the Support Vector Machine model when trained with the attributes in ‘list_of_attributes’.
MAPO needs a certain number of features to be kept, but the number of valid features is not known in advance. Thus, to determine the number of features required, the fitness of each feature (branch) is used to score different feature subsets and obtain the best-scoring collection of features (branches). The plot in Fig. 7 visualizes how the number of features is selected.
This figure shows that the accuracy climbs sharply when seven and eight informative features are included, and then decreases gradually as non-informative features are added.

After selecting the optimal features using MAPO, the dataset is pre-processed again as discussed in Section 4.9. The four machine learning algorithms are then applied again to the Heart disease dataset to obtain the accuracies shown in Figs. 10, 11, 12 and 13. The machine learning models help in the prediction of heart disease by learning from the optimal attributes obtained by applying APO (relational) to the given dataset and deriving patterns in the data. Also, if a person is diagnosed with heart disease, they can check the possible causes of heart disease and the corresponding remedies (Appendix 2).
The machine learning model starts with the analysis of the heart disease dataset. The learning algorithm produces an inferred function to make predictions about the output values, which lets the system provide targets for any new input after sufficient training (Appendix 1). The learning algorithm also compares its output with the correct, intended output to find errors and modify the model accordingly.

5. Results

5.1. Heart rate results

Heart rate predicted by MAPO has been compared with three other models, and the results are shown in Table 5 and Fig. 8.
Results show that the relative error (RE) of MAPO exceeds 5% for fewer videos than for the other models.
MAPO achieves higher accuracy than other related works, with a Pearson correlation (r) of 0.9541 and a Standard Error of Estimate (SEE) of 2.418 (Table 6).

Fig. 9. Comparison of accuracy, precision and F1 score of Logistic Regression before and after applying APO on heart disease dataset.
Fig. 10. Comparison of accuracy, precision and F1 score of Naïve Bayes before and after applying APO on heart disease dataset.
Fig. 11. Comparison of accuracy, precision and F1 score of XGBoost before and after applying APO on heart disease dataset.

5.2. Comparison of accuracy, precision and F1 score of each machine learning model before and after applying APO on heart disease dataset

5.2.1. Optimal features

MAPO was effectively able to select optimal features in the Heart Disease dataset. Originally the dataset had 13 features. MAPO selected 7 of the 13 features which were significant for heart disease
Fig. 12. Comparison of accuracy, precision and F1 score of ANN before and after applying APO on heart disease dataset.
disease. The results show that even with a 46% reduction in the number of input parameters, there is no substantial decrease in the performance of the model, which indicates that the selected features are the informative ones. MAPO also achieves better accuracy than the other related feature selectors it was compared with.
MAPO can further be extended into a fully functional mobile application that can be used directly by people without much assistance. Moreover, the assumption that the finger is stable while the video is recorded could be removed, and an algorithm could be designed to predict the heart rate even if the finger is moving. MAPO could also be applied to detect not only the heart rate but also other measures such as hemoglobin levels, sugar levels, etc.
MAPO can also be extended so that the step parameter is varied to remove more than one feature at each step, which would help select the best features early in the process and could also speed up feature selection when the datasets are large.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.artmed.2019.101752.