
A Project Report on

HEART DISEASE PREDICTION USING MACHINE LEARNING

Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

By

E. Sri Lalitha Devi (16NN1A0564)
B. Kasi Annapurna (16NN1A0558)
M. Naga Bhavani (16NN1A0584)
P. Sai Vanaja (16NN1A0592)

Under the esteemed guidance of

R. Venkatesh
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VIGNAN'S NIRULA INSTITUTE OF TECHNOLOGY AND SCIENCE FOR WOMEN

(APPROVED BY AICTE, AFFILIATED TO JNTUK, KAKINADA)

PALAKALURU ROAD, GUNTUR-522009

2016 – 2020
VIGNAN'S NIRULA INSTITUTE OF TECHNOLOGY AND SCIENCE FOR WOMEN
(APPROVED BY AICTE, AFFILIATED TO JNTUK, KAKINADA)

PALAKALURU ROAD, GUNTUR-522009

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the project work entitled "HEART DISEASE PREDICTION USING MACHINE LEARNING", being submitted by E. Sri Lalitha Devi (16NN1A0564), B. Kasi Annapurna (16NN1A0558), M. Naga Bhavani (16NN1A0584) and P. Sai Vanaja (16NN1A0592) in partial fulfillment for the award of the degree of Bachelor of Technology in Computer Science and Engineering at Vignan's Nirula Institute of Technology and Science For Women, is a bonafide work carried out by them.

Internal Guide                                Head of the Department

Mr. R. Venkatesh

External Examiner
DECLARATION

We hereby declare that the work described in this project work, entitled "HEART DISEASE PREDICTION USING MACHINE LEARNING", which is submitted by us in partial fulfilment for the award of Bachelor of Technology in the Department of Computer Science and Engineering to the Vignan's Nirula Institute of Technology and Science for Women, affiliated to Jawaharlal Nehru Technological University Kakinada, Andhra Pradesh, is the result of work done by us under the guidance of Mr. R. Venkatesh, Assistant Professor.

The work is original and has not been submitted for any Degree of this or any other university.

Place: Guntur

Date:

E. Sri Lalitha Devi (16NN1A0564)
B. Kasi Annapurna (16NN1A0558)
M. Naga Bhavani (16NN1A0584)
P. Sai Vanaja (16NN1A0592)
ACKNOWLEDGEMENT

We wish to avail this opportunity to thank Dr. P. Radhika, Principal, Vignan's Nirula Institute of Technology & Science for Women, for her continuous support and valuable suggestions during the entire period of the project work.

We take this opportunity to express our gratitude to Dr. B. Thulasi, Head of the Department of Computer Science and Engineering, for her great help and encouragement to this project work.

We would like to express our unbounded gratefulness to our guide Mr. R. Venkatesh, Assistant Professor, Department of Computer Science and Engineering, for his valuable support and motivation at each and every point in the successful completion of the project.

We also thank all the faculty of the Department of Computer Science and Engineering for their help and guidance on numerous occasions, which has given us the strength and aspiration to complete our project thesis.

Finally, we thank one and all who directly or indirectly helped us to complete our project thesis successfully.

E. Sri Lalitha Devi (16NN1A0564)
B. Kasi Annapurna (16NN1A0558)
M. Naga Bhavani (16NN1A0584)
P. Sai Vanaja (16NN1A0592)
ABSTRACT

In the medical field, the diagnosis of heart disease is the most difficult task. The diagnosis of heart disease is difficult, as the decision relies on grouping large amounts of clinical and pathological data. Due to this complication, interest has grown significantly among researchers and clinical professionals in efficient and accurate heart disease prediction. In the case of heart disease, a correct diagnosis at an early stage is important, as time is a very important factor. Heart disease is the principal source of deaths worldwide, so the prediction of heart disease at an early phase is significant. In recent years, machine learning has been an evolving, reliable and supportive tool in the medical domain, and has provided great support for predicting disease when given a correct regime of training and testing. The main idea behind this work is to study diverse prediction models for heart disease and to select important heart disease features using the Random Forest algorithm. Random Forest is a supervised machine learning algorithm which has high accuracy compared to other supervised machine learning algorithms such as logistic regression. Using the Random Forest algorithm, we predict whether a person has heart disease or not.
TABLE OF CONTENTS

Chapter 1 Introduction 1
Chapter 2 Literature Survey 3
Chapter 3 System Analysis 5
3.1 Existing System 5
3.2 Proposed System 5
3.3 Algorithms 6
3.4 Feasibility Study 8
3.5 Effort, Duration, and Cost Estimation using COCOMO Model 9
Chapter 4 Software Requirements Specification 14
4.1 Introduction to Requirement Specification 14
4.2 Requirement Analysis 14
4.3 System Requirements 17
4.4 Software Description 17
Chapter 5 System Design 20
5.1 System Architecture 20
5.2 Modules 20
5.3 Data Flow Diagram 21
5.4 UML Diagrams 24
5.4.1 Use Case Diagram 24
5.4.2 Activity Diagram 25
5.4.3 Sequence Diagram 25
5.4.4 Class Diagram 26
Chapter 6 Implementation 27
6.1 Steps for Implementation 27
6.2 Coding 27
Chapter 7 System Testing 29
7.1 White Box Testing 29
7.2 Black Box Testing 33
Chapter 8 Screenshots 36
8.1 Anaconda Prompt 36
8.2 Home Screen for Heart Attack Prediction 36
8.3 Patient Details 37
8.4 Output for Particular Patient Details 37
Chapter 9 Conclusion 38
Chapter 10 Future Scope 39
Chapter 11 References 40

List of Diagrams

3.1 Logistic Regression 7
3.2 Random Forest 8
4.1 Jupyter Notebook 19
5.1 System Architecture 20
5.2 Data Flow Diagram Level 0 22
5.3 Data Flow Diagram Level 1 23
5.4 Use Case Diagram 24
5.5 Activity Diagram 25
5.6 Sequence Diagram 26
5.7 Class Diagram 26

List of Tables

2.1 List of attributes 4
3.1 Organic, Semidetached and Embedded system values 10
3.2 Project Attributes 12
CHAPTER 1

INTRODUCTION

The heart is a muscular organ which pumps blood through the body and is the central part of the body's cardiovascular system, which also contains the lungs. The cardiovascular system also comprises a network of blood vessels, for example, veins, arteries, and capillaries. These blood vessels deliver blood all over the body. Abnormalities in normal blood flow from the heart cause several types of heart diseases, which are commonly known as cardiovascular diseases (CVD). Heart diseases are the main cause of death worldwide. According to a survey of the World Health Organization (WHO), 17.5 million total global deaths occur because of heart attacks and strokes. More than 75% of deaths from cardiovascular diseases occur mostly in middle-income and low-income countries. Also, 80% of the deaths that occur due to CVDs are because of stroke and heart attack. Therefore, prediction of cardiac abnormalities at the early stage and tools for the prediction of heart diseases can save a lot of lives and help doctors to design an effective treatment plan, which ultimately reduces the mortality rate due to cardiovascular diseases.

Due to the development of advanced healthcare systems, lots of patient data are nowadays available (i.e. Big Data in Electronic Health Record Systems) which can be used for designing predictive models for cardiovascular diseases. Data mining or machine learning is a discovery method for analyzing big data from assorted perspectives and encapsulating it into useful information. "Data Mining is a non-trivial extraction of implicit, previously unknown and potentially useful information about data". Nowadays, a huge amount of data pertaining to disease diagnosis, patients etc. is generated by healthcare industries. Data mining provides a number of techniques which discover hidden patterns or similarities in data.

Therefore, in this paper, a machine learning algorithm is proposed for the implementation of a heart disease prediction system, which was validated on two open-access heart disease prediction datasets. Data mining is the computer-based process of extracting useful information from enormous sets of databases. Data mining is most helpful in exploratory analysis because it surfaces non-trivial information from large volumes of evidence. Medical data mining has great potential for exploring the cryptic patterns in the data sets of the clinical domain.

These patterns can be utilized for healthcare diagnosis. However, the available raw medical data are widely distributed, voluminous and heterogeneous in nature. This data needs to be collected in an organized form. The collected data can then be integrated to form a medical information system. Data mining provides a user-oriented approach to novel and hidden patterns in the data. The data mining tools are useful for answering business questions and for predicting the various diseases in the healthcare field. Disease prediction plays a significant role in data mining. This paper analyzes heart disease predictions using classification algorithms. These invisible patterns can be utilized for health diagnosis in healthcare data.

Data mining technology affords an efficient approach to the latest and indefinite patterns in the data. The information which is identified can be used by healthcare administrators to provide better services. Heart disease is the most crucial cause of death in countries like India and the United States. In this project we are predicting heart disease using classification algorithms. Machine learning classification algorithms such as Random Forest and Logistic Regression are used to explore different kinds of heart-based problems.
CHAPTER 2

LITERATURE SURVEY

Machine learning techniques are used to analyze and predict from medical data information resources. Diagnosis of heart disease is a significant and tedious task in medicine. The term heart disease encompasses the various diseases that affect the heart. The detection of heart disease from various factors or symptoms is an issue which is not free from false presumptions, often accompanied by unpredictable effects. The data classification here is based on a supervised machine learning algorithm, which results in better accuracy. We use Random Forest as the training algorithm to train on the heart disease dataset and to predict heart disease. The results showed that the medicinal prescription and designed prediction system is capable of prophesying the heart attack successfully. Machine learning techniques have been used to indicate early mortality by analyzing heart disease patients and their clinical records (Richards, G. et al., 2001). (Sung, S.F. et al., 2015) have brought about two machine learning techniques, a k-nearest neighbor model and existing multi linear regression, to predict the stroke severity index (SSI) of patients. Their study shows that k-nearest neighbor performed better than the multi linear regression model. (Arslan, A. K. et al., 2016) have suggested various machine learning techniques such as support vector machine (SVM) and penalized logistic regression (PLR) to predict heart stroke. Their results show that SVM produced the best performance in prediction when compared to other models. Boshra Brahmi et al. developed different machine learning techniques to evaluate the prediction and diagnosis of heart disease. The main objective is to evaluate different classification techniques such as J48, Decision Tree, KNN and Naive Bayes. After this, performance measures such as accuracy, precision, sensitivity and specificity are evaluated.

Data source

Clinical databases have collected a significant amount of information about patients and their medical conditions. Record sets with medical attributes were obtained from the Cleveland Heart Disease database. With the help of this dataset, the patterns significant to heart attack diagnosis are extracted. The records were split equally into two datasets: a training dataset and a testing dataset. A total of 303 records with 76 medical attributes were obtained. All the attributes are numeric-valued. We are working on a reduced set of attributes, i.e. only 14 attributes.

The following restrictions were announced to shrink the number of designs:

1) The features should appear on a single side of the rule.
2) The rule should separate the various features into different groups.
3) The count of features available from the rule is organized by the medical history of people having heart disease only.

The following table shows the list of attributes on which we are working.

S.No  Attribute   Description
1     Age         age in years
2     Sex         (1 = male; 0 = female)
3     Cp          chest pain type
4     Trestbps    resting blood pressure (in mm Hg on admission to the hospital)
5     Chol        serum cholesterol in mg/dl
6     Fbs         (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7     Restecg     resting electrocardiographic results
8     Thalach     maximum heart rate achieved
9     Exang       exercise induced angina (1 = yes; 0 = no)
10    Oldpeak     ST depression induced by exercise relative to rest
11    Slope       the slope of the peak exercise ST segment
12    Ca          number of major vessels (0-3) colored by fluoroscopy
13    Thal        3 = normal; 6 = fixed defect; 7 = reversible defect
14    Target      1 or 0

Table 2.1: List of attributes

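As a quick illustration of how this reduced attribute set can be loaded and inspected, the sketch below uses pandas; the file name heart.csv matches the coding section in Chapter 6, and the lowercase column names are assumed to follow the standard Cleveland export.

import pandas as pd

# Load the Cleveland heart disease records (file name as used in Chapter 6)
df = pd.read_csv('heart.csv')

# Keep only the 14 attributes listed in Table 2.1
columns = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
           'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']
df = df[columns]

print(df.shape)                     # expected: (303, 14)
print(df['target'].value_counts()) # how many records have heart disease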
CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Clinical decisions are often made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients. There are many ways that a medical misdiagnosis can present itself. Whether a doctor is at fault, or hospital staff, a misdiagnosis of a serious illness can have very extreme and harmful effects. The National Patient Safety Foundation cites that 42% of medical patients feel they have experienced a medical error or missed diagnosis. Patient safety is sometimes negligently given the back seat to other concerns, such as the cost of medical tests, drugs, and operations. Medical misdiagnoses are a serious risk to our healthcare profession. If they continue, then people will fear going to the hospital for treatment. We can put an end to medical misdiagnosis by informing the public and filing claims and suits against the medical practitioners at fault.

Disadvantages:
• Prediction is not possible at early stages.
• In the existing system, practical use of collected data is time consuming.
• Any faults made by the doctor or hospital staff in predicting would lead to fatal incidents.
• A highly expensive and laborious process needs to be performed before treating the patient to find out if he/she has any chance of getting heart disease in future.

3.2 PROPOSED SYSTEM

This section depicts the overview of the proposed system and illustrates all of the components, techniques and tools used for developing the entire system. To develop an intelligent and user-friendly heart disease prediction system, an efficient software tool is needed in order to train on huge datasets and compare multiple machine learning algorithms. After choosing the robust algorithm with the best accuracy and performance measures, it will be implemented in the development of a smartphone-based application for detecting and predicting heart disease risk level. Hardware components like Arduino/Raspberry Pi, different biomedical sensors, display monitor, buzzer etc. are needed to build the continuous patient monitoring system.
3.3 ALGORITHMS

3.3.1 Logistic Regression

A popular statistical technique to predict binomial outcomes (y = 0 or 1) is Logistic Regression. Logistic regression predicts categorical outcomes (binomial / multinomial values of y). The predictions of Logistic Regression (henceforth, LogR) are in the form of probabilities of an event occurring, i.e. the probability of y = 1 given certain values of input variables x. Thus, the results of LogR range between 0 and 1.

LogR models the data points using the standard logistic function, which is an S-shaped curve, also called the sigmoid curve, given by the equation:

    f(x) = 1 / (1 + e^(-x))

Logistic Regression Assumptions:

• Logistic regression requires the dependent variable to be binary.

• For a binary regression, the factor level 1 of the dependent variable should represent the desired outcome.

• Only the meaningful variables should be included.

• The independent variables should be independent of each other.

• Logistic regression requires quite large sample sizes.

• Even though logistic (logit) regression is frequently used for binary variables (2 classes), it can be used for categorical dependent variables with more than 2 classes; in this case it is called Multinomial Logistic Regression.


Fig 3.1: Logistic Regression
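To make the formulation above concrete, here is a minimal scikit-learn sketch (assuming the heart.csv dataset from Chapter 2; the 70/30 split mirrors the coding section in Chapter 6):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.read_csv('heart.csv')      # dataset described in Chapter 2
x = df.drop(columns=['target'])    # input variables x
y = df['target']                   # binary outcome y (0 or 1)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# predict_proba returns P(y = 1 | x), the sigmoid output between 0 and 1
print(logreg.predict_proba(X_test)[:5, 1])
print('accuracy:', logreg.score(X_test, y_test))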

3.3.2 Random Forest

Random forest is a supervised learning algorithm which is used for both classification as well as regression. However, it is mainly used for classification problems. As we know, a forest is made up of trees, and more trees means a more robust forest. Similarly, random forest creates decision trees on data samples, then gets the prediction from each of them, and finally selects the best solution by means of voting. It is an ensemble method which is better than a single decision tree because it reduces over-fitting by averaging the result.
Working of Random Forest with the help of following steps:

• First, start with the selection of random samples from a given dataset.

• Next, the algorithm will construct a decision tree for every sample. Then it will get the prediction result from every decision tree.

• In this step, voting will be performed for every predicted result.

• At last, select the most voted prediction result as the final prediction result.

The following diagram illustrates its working:

Fig 3.2: Random Forest
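A minimal sketch of this voting procedure with scikit-learn (assuming the same X_train/X_test split as in the logistic regression sketch above; n_estimators sets how many trees vote):

from sklearn.ensemble import RandomForestClassifier

# 100 trees, each grown on a random bootstrap sample of the training data
rf = RandomForestClassifier(n_estimators=100, random_state=100)
rf.fit(X_train, y_train)

# Each tree votes; predict() returns the majority class
print('accuracy:', rf.score(X_test, y_test))

# Importance scores support the feature selection discussed in the abstract
for name, importance in zip(X_train.columns, rf.feature_importances_):
    print(name, round(importance, 3))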

3.4 FEASIBILITY STUDY

A feasibility study is a preliminary study undertaken before the real work of a project starts, to ascertain the likelihood of the project's success. It is an analysis of possible alternative solutions to a problem and a recommendation on the best alternative.

3.4.1 Economic Feasibility:

It is defined as the process of assessing the benefits and costs associated with the development of the project. A proposed system, which is both operationally and technically feasible, must be a good investment for the organization. With the proposed system the users are greatly benefited, as they can detect the risk of heart disease early and act on the prediction. This proposed system does not need any additional software or high system configuration. Hence the proposed system is economically feasible.
3.4.2 Technical Feasibility:

The technical feasibility infers whether the proposed system can be developed considering technical issues like availability of the necessary technology, technical capacity, adequate response and extensibility. The project is decided to be built using Python. Jupyter Notebook is designed for use in a distributed environment of the internet, and for the professional programmer it is easy to learn and use effectively. As the developing organisation has all the resources available to build the system, the proposed system is technically feasible.
3.4.3 Operational Feasibility:

Operational feasibility is defined as the process of assessing the degree to which a proposed system solves business problems or takes advantage of business opportunities. The system is self-explanatory and doesn't need any extra sophisticated training. The system has built-in methods and classes which are required to produce the result. The application can be handled very easily by a novice user. The overall time that a user needs to get trained is less than one hour. The software that is used for developing this application is very economical and is readily available in the market. Therefore the proposed system is operationally feasible.

3.5 EFFORT, DURATION AND COST ESTIMATION USING COCOMO MODEL

The COCOMO (Constructive Cost Model) model is the most complete and thoroughly documented model used in effort estimation. The model provides detailed formulas for determining the development time schedule, overall development effort, and effort breakdown by phase and activity, as well as maintenance effort.

COCOMO estimates the effort in person-months of direct labor. The primary effort factor is the number of source lines of code (SLOC) expressed in thousands of delivered source instructions (KDSI). The model is developed in three versions of different levels of detail: basic, intermediate, and detailed. The overall modeling process takes into account three classes of systems.
1. Embedded: This class of system is characterized by tight constraints, a changing environment, and unfamiliar surroundings. Projects of the embedded type are novel to the company and usually exhibit temporal constraints.

2. Organic: This category encompasses all systems that are small relative to project size and team size, and have a stable environment, familiar surroundings and relaxed interfaces. These are simple business systems, data processing systems, and small software libraries.

3. Semidetached: The software systems falling under this category are a mix of those of organic and embedded in nature. Some examples of software of this class are operating systems, database management systems, and inventory management systems.

For basic COCOMO: Effort = a * (KLOC)^b

Time for development = c * (Effort)^d

For Intermediate and Detailed COCOMO: Effort = a * (KLOC)^b * EAF (EAF = product of cost drivers)

Type of Product    a     b      c     d
Organic            2.4   1.02   2.5   0.38
Semidetached       3.0   1.12   2.5   0.35
Embedded           3.6   1.20   2.5   0.32

Table 3.1: Organic, Semidetached and Embedded system values

The intermediate COCOMO model is a refinement of the basic model, which comes in the function of 15 attributes of the product. For each of the attributes, the user of the model has to provide a rating using the following six-point scale:

VL (Very Low)
LO (Low)
NM (Nominal)
HI (High)
VH (Very High)
XH (Extra High)

The list of attributes is composed of several features of the software and includes product, computer, personnel and project attributes as follows.

3.5.1 Product Attributes

• Required reliability (RELY): It is used to express an effect of software faults ranging from slight inconvenience (VL) to loss of life (VH). The nominal value (NM) denotes moderate, recoverable losses.

• Data bytes per DSI (DATA): The lower rating comes with a lower size of database.

• Complexity (CPLX): The attribute expresses code complexity, ranging from straight batch code (VL) to real-time code with multiple resource scheduling (XH).

3.5.2 Computer Attributes

• Execution time (TIME) and memory (STOR) constraints: These attributes identify the percentage of computer resources used by the system. NM states that less than 50% is used; 95% is indicated by XH.

• Virtual machine volatility (VIRT): It is used to indicate the frequency of changes made to the hardware, operating system, and overall software environment. More frequent and significant changes are indicated by higher ratings.

• Development turnaround time (TURN): This is the time from when a job is submitted until output is received. LO indicates a highly interactive environment; VH quantifies a situation when this time is longer than 12 hours.

3.5.3 Personnel Attributes:

• Analyst capability (ACAP) and programmer capability (PCAP): These describe the skills of the developing team. The higher the skills, the higher the rating.

• Application experience (AEXP), language experience (LEXP), and virtual machine experience (VEXP): These are used to quantify the amount of experience in each area by the development team; more experience, higher rating.

3.5.4 Project Attributes:

• Modern development practices (MODP): deals with the amount of use of modern software practices such as structured programming and object-oriented approach.

• Use of software tools (TOOL): is used to measure the level of sophistication of automated tools used in software development and the degree of integration among the tools being used. A higher rating describes higher levels in both aspects.

• Schedule effects (SCED): concerns the amount of schedule compression (HI or VH), or schedule expansion (LO or VL), of the development schedule in comparison to a nominal (NM) schedule.

Attribute   VL     LO     NM     HI     VH     XH

RELY        0.75   0.88   1.00   1.15   1.40
DATA               0.94   1.00   1.08   1.16
CPLX        0.70   0.85   1.00   1.15   1.30   1.65
TIME                      1.00   1.11   1.30   1.66
STOR                      1.00   1.06   1.21   1.56
VIRT               0.87   1.00   1.15   1.30
TURN               0.87   1.00   1.07   1.15
ACAP        1.46   1.19   1.00   0.86   0.71
AEXP        1.29   1.13   1.00   0.91   0.82
PCAP        1.42   1.17   1.00   0.86   0.70
LEXP        1.14   1.07   1.00   0.95
VEXP        1.21   1.10   1.00   0.90
MODP        1.24   1.10   1.00   0.91   0.82
TOOL        1.24   1.10   1.00   0.91   0.83
SCED        1.23   1.08   1.00   1.04   1.10

Table 3.2: Project Attributes
Our project is an organic system; for intermediate COCOMO:

Effort = a * (KLOC)^b * EAF

LOC = 115, i.e. KLOC = 0.115

For an organic system:

a = 2.4
b = 1.02

EAF = product of cost drivers = 1.30

Effort = 2.4 * (0.115)^1.02 * 1.30
       = 1.034 programmer-months

Time for development = c * (Effort)^d
                     = 2.5 * (1.034)^0.38
                     = 2.71 months

Cost of programmer = Effort * cost of programmer per month
                   = 1.034 * 20000
                   = 20680

Project cost = 20000 + 20680
             = 40680
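The formulas above can be checked with a few lines of Python (a minimal sketch; the constants are the organic-mode values from Table 3.1, and the EAF and monthly rate are the ones assumed in the calculation above):

# Intermediate COCOMO for an organic project (constants from Table 3.1)
a, b, c, d = 2.4, 1.02, 2.5, 0.38
kloc = 0.115           # 115 delivered source lines
eaf = 1.30             # product of the cost drivers
rate = 20000           # assumed programmer cost per month

effort = a * (kloc ** b) * eaf    # programmer-months
time = c * (effort ** d)          # development time in months
cost = effort * rate

print(effort, time, cost)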

CHAPTER 4

SOFTWARE REQUIREMENTS SPECIFICATION

4.1 INTRODUCTION TO REQUIREMENT SPECIFICATION

A Software Requirements Specification (SRS) is a description of a particular software product, program or set of programs that performs a set of functions in a target environment (IEEE Std. 830-1993).

a. Purpose
The purpose section of a software requirements specification specifies the intentions and intended audience of the SRS.

b. Scope
The scope of the SRS identifies the software product to be produced, its capabilities, application, relevant objects etc. We propose to implement classification algorithms which take the training and test data sets from the heart disease database.

c. Definitions, Acronyms and Abbreviations
Software Requirements Specification: a description of a particular software product, program or set of programs that performs a set of functions in a target environment.

d. References
IEEE Std. 830-1993, IEEE Recommended Practice for Software Requirements Specifications; Software Engineering by James F. Peters & Witold Pedrycz; Head First Java by Kathy Sierra and Bert Bates.

e. Overview
The SRS contains the details of the process, DFDs, functions of the product, and user characteristics. The non-functional requirements, if any, are also specified.

f. Overall description
The main functions associated with the product are described in this section of the SRS. The characteristics of a user of this product are indicated. The assumptions in this section result from interaction with the project stakeholders.

4.2 REQUIREMENT ANALYSIS

Software Requirement Specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goals of the entire system could not be easily comprehended. Hence the need for the requirement phase arose. The software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirement phase). Under requirement specification, the focus is on specifying what has been found during analysis; representation, specification languages and tools, and checking the specifications are addressed during this activity. The requirement phase terminates with the production of the validated SRS document.

Producing the SRS document is the basic goal of this phase. The purpose of the Software Requirement Specification is to reduce the communication gap between the clients and the developers. The Software Requirement Specification is the medium through which the client and user needs are accurately specified. It forms the basis of software development. A good SRS should satisfy all the parties involved in the system.
4.2.1 Product Perspective:
The application is developed in such a way that any future enhancement can be easily implemented. The project is developed in such a way that it requires minimal maintenance. The software used is open source and easy to install. The application developed should be easy to install and use. This is an independent application which can easily be run on any system which has Python and Jupyter Notebook installed.
4.2.2 Product Features:
The application is developed in such a way that heart disease accuracy is predicted using Random Forest. The dataset is taken from the Cleveland Heart Disease database. We can compare the accuracy of the implemented algorithms.

User characteristics: The application is developed in such a way that it is easy to use, error free, requires minimal or no training, and supports regular patient monitoring.

Assumptions & Dependencies: It is considered that the dataset taken fulfils all the requirements.
4.2.3 Domain Requirements:
This document is the only one that describes the requirements of the system. It is meant for use by the developers, and will also be the basis for validating the final heart disease system. Any changes made to the requirements in the future will have to go through a formal change approval process.

User Requirements: The user can decide on the prediction accuracy to decide which algorithm can be used in real-time predictions.

Non-Functional Requirements:
• The dataset collected should be in CSV format.
• The column values should be numerical values.
• The training set and test set are stored as CSV files.
• Error rates can be calculated for the prediction algorithms.
4.2.4 Requirements:

Efficiency: Less time for predicting the heart disease.

Reliability: Maturity, fault tolerance and recoverability.

Portability: Can the software easily be transferred to another environment, including installability.

Usability: How easy it is to understand, learn and operate the software system.

Organizational Requirements: Do not block the available ports through the Windows firewall. An internet connection should be available.

Implementation Requirements: The dataset collection, and an internet connection to install the related libraries.

Engineering Standard Requirements: User Interfaces: The user interface is developed in Python, which gets inputs such as patient details.
4.2.6 Hardware Interfaces:
Ethernet on the AS/400 supports TCP/IP, Advanced Peer-to-Peer Networking (APPN) and Advanced Program-to-Program Communications (APPC). ISDN: To connect the AS/400 to an Integrated Services Digital Network (ISDN) for faster, more accurate data transmission. An ISDN is a public or private digital communications network that can support data, fax, image, and other services over the same physical interface. We can use other protocols on ISDN, such as IDLC and X.25.

Software Interfaces: Anaconda Navigator and Jupyter Notebook are used.
4.2.7 Operational Requirements:
a) Economic: The developed product is economic as it does not require any hardware interface etc.

Environmental: Statements of fact and assumptions that define the expectations of the system in terms of mission objectives, environment, constraints, and measures of effectiveness and suitability (MOE/MOS). The customers are those that perform the eight primary functions of systems engineering, with special emphasis on the operator as the key customer.

b) Health and Safety: The software may be safety-critical. If so, there are issues associated with its integrity level. The software may not be safety-critical although it forms part of a safety-critical system. For example, software may simply log transactions. If a system must be of a high integrity level and if the software is shown to be of that integrity level, then the hardware must be at least of the same integrity level.

• There is little point in producing 'perfect' code in some language if hardware and system software (in the widest sense) are not reliable. If a computer system is to run software of a high integrity level, then that system should not at the same time accommodate software of a lower integrity level.

• Systems with different requirements for safety levels must be separated. Otherwise, the highest level of integrity required must be applied to all systems in the same environment.
4.3 SYSTEM REQUIREMENTS

4.3.1 Hardware Requirements
Processor      : above 500 MHz
RAM            : 4 GB
Hard Disk      : 4 GB
Input device   : Standard keyboard and mouse
Output device  : VGA and high resolution monitor

4.3.2 Software Requirements
Operating System : Windows 7 or higher
Programming      : Python 3.6 and related libraries
Software         : Anaconda Navigator and Jupyter Notebook

4.4 SOFTWARE DESCRIPTION

4.4.1 Python
Python is an interpreted high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales. Python features a dynamic type system and automatic memory management.

It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library. Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation.
4.4.2 Pandas
Pandas is an open-source Python library providing high-performance data manipulation and analysis tools using its powerful data structures. The name Pandas is derived from the term Panel Data, an econometrics term for multidimensional data.

In 2008, developer Wes McKinney started developing pandas when in need of a high performance, flexible tool for analysis of data. Prior to Pandas, Python was majorly used for data mining and preparation. It had very little contribution towards data analysis. Pandas solved this problem.

Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze. Python with Pandas is used in a wide range of fields, including academic and commercial domains such as finance, economics, statistics and analytics.

Key Features of Pandas:

• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• Group by data for aggregation and transformations.
• High performance merging and joining of data.
• Time Series functionality.
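As a brief illustration of the load-prepare-manipulate-analyze flow described above (a minimal sketch against this project's heart.csv; the derived age_group column is purely illustrative):

import pandas as pd

df = pd.read_csv('heart.csv')            # load
df = df.dropna()                         # prepare: drop rows with missing data
df['age_group'] = df['age'] // 10 * 10   # manipulate: derive a decade column

# analyze: mean serum cholesterol per age group
print(df.groupby('age_group')['chol'].mean())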

4.4.3 NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object, and tools for working with these arrays. It is the fundamental package for scientific computing with Python.

It contains various features, including these important ones:

• A powerful N-dimensional array object.
• Sophisticated (broadcasting) functions.
• Tools for integrating C/C++ and Fortran code.
• Useful linear algebra, Fourier transform, and random number capabilities.
• Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined using NumPy, which allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
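A small sketch of the N-dimensional array object and broadcasting mentioned above (the values are illustrative):

import numpy as np

# A patient record as a 1-D array: age, sex, trestbps, chol, cp
sample = np.array([63.0, 1.0, 145.0, 233.0, 3.0])

# Broadcasting: the scalar operations apply to every element at once
scaled = (sample - sample.mean()) / sample.std()
print(sample.shape, scaled)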
4.4.4 Scikit-Learn
• Simple and efficient tools for data mining and data analysis.
• Accessible to everybody, and reusable in various contexts.
• Built on NumPy, SciPy, and matplotlib.
• Open source, commercially usable (BSD license).

4.4.5 Matplotlib

• Matplotlib is a Python library used to create 2D graphs and plots by using Python scripts.
• It has a module named pyplot which makes things easy for plotting by providing features to control line styles, font properties, formatting axes etc.
• It supports a very wide variety of graphs and plots, namely histograms, bar charts, power spectra, error charts etc.
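For instance, a minimal pyplot sketch drawing one of the chart types mentioned (assuming the heart.csv dataset loaded as in the Pandas example):

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('heart.csv')

# Histogram of patient ages
plt.hist(df['age'], bins=10)
plt.xlabel('age (years)')
plt.ylabel('number of patients')
plt.title('Age distribution in the heart disease dataset')
plt.show()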
4.4.6 Jupyter Notebook
The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects.

• A notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media.
• The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
• Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
• The Notebook has support for over 40 programming languages, including Python, R, Julia, and Scala.
• Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.
• Your code can produce rich, interactive output: HTML, images, videos, LaTeX, and custom MIME types.
• Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore the same data with pandas, scikit-learn, ggplot2 and TensorFlow.
CHAPTER 5

SYSTEM DESIGN

5.1 SYSTEM ARCHITECTURE

The figure below shows the process flow diagram of the proposed work. First we collected the Cleveland Heart Disease Database from the UCI website, then pre-processed the dataset and selected 16 important features.

[Heart disease database -> Data pre-processing -> Feature selection -> Algorithms -> Output]

Fig 5.1: System Architecture

For feature selection we used the Recursive Feature Elimination algorithm with the Chi2 method and obtained the top 16 features. After that, we applied the ANN and Logistic Regression algorithms individually and computed the accuracy. Finally, we used the proposed ensemble voting method and computed the best method for diagnosis of heart disease.

5.2 MODULES

The entire work of this project is divided into 4 modules.

They are:

a. Data Pre-processing
b. Feature Extraction
c. Classification
d. Prediction

a. Data Pre-processing:
This module contains all the pre-processing functions needed to process all input records. First we read the train, test and validation data files, then performed some preprocessing like tokenizing and stemming. Some exploratory data analysis is performed, like response variable distribution, and data quality checks like null or missing values.

b. Feature Extraction:
In this module we have performed feature extraction and selection methods from the scikit-learn Python libraries. For feature selection, we have used methods like simple bag-of-words and n-grams, and then term frequency weighting like tf-idf. We have also used word2vec and POS tagging to extract the features, though POS tagging and word2vec have not been used at this point in the project.
c. Classification:
Here we have built all the classifiers for heart disease detection. The extracted features are fed into different classifiers. We have used Naive Bayes, Logistic Regression, Linear SVM, Stochastic Gradient Descent and Random Forest classifiers from sklearn. Each of the extracted features was used in all of the classifiers. Once the models were fitted, we compared the F1 scores and checked the confusion matrices. After fitting all the classifiers, the 2 best performing models were selected as candidate models for heart disease classification. We performed parameter tuning by implementing GridSearchCV methods on these candidate models and chose the best performing parameters for these classifiers, as sketched below.

The finally selected model was used for heart disease detection with the probability of truth. In addition to this, we also extracted the top 50 features from our term-frequency tf-idf vectorizer to see which features are most important in each of the classes.

We also used precision-recall and learning curves to see how the training and test sets perform when we increase the amount of data in our classifiers.
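A compact sketch of the GridSearchCV tuning step described above (the parameter grid values are illustrative assumptions, not the report's actual settings; X_train and y_train are assumed to be prepared as in the Chapter 3 sketches):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
}

search = GridSearchCV(RandomForestClassifier(random_state=100),
                      param_grid, scoring='f1', cv=5)
search.fit(X_train, y_train)

print(search.best_params_)   # best performing parameters
print(search.best_score_)    # cross-validated F1 score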
d. Prediction:
Our finally selected and best performing classifier was saved to disk with the name heart_model.sav. Once you clone this repository, this model will be copied to the user's machine and will be used by the prediction.py file to classify the heart disease. It takes patient details as input from the user, then the model is used for the final classification, and the output is shown to the user along with the probability of truth.
5.3 DATA FLOW DIAGRAM
The data flow diagram (DFD) is one of the most important tools used by system analysts. Data flow diagrams are made up of a number of symbols, which represent system components. Most data flow modeling methods use four kinds of symbols: processes, data stores, data flows and external entities. These symbols are used to represent four kinds of system components. Circles in a DFD represent processes. A data flow is represented by a thin line in the DFD, each data store has a unique name, and a square or rectangle represents external entities.

Fig 5.3: Data Flow Diagram Level 1
5.4 UML DIAGRAMS

5.4.1 Use Case Diagram
A use case diagram is a diagram that shows a set of use cases and actors and their relationships. A use case diagram is just a special kind of diagram and shares the same common properties as all other diagrams, i.e. a name and graphical contents that are a projection into a model. What distinguishes a use case diagram from all other kinds of diagrams is its particular content.

Fig 5.4: Use Case Diagram

5.4.2 Activity Diagram
An activity diagram shows the flow from activity to activity. An activity is an ongoing non-atomic execution within a state machine. An activity diagram is basically a projection of the elements found in an activity graph, a special case of a state machine in which all or most states are activity states and in which all or most transitions are triggered by completion of activities in the source.

Fig 5.5: Activity Diagram
5.4.3 Sequence Diagram
A sequence diagram is an interaction diagram that emphasizes the time ordering of messages. A sequence diagram shows a set of objects and the messages sent and received by those objects. The objects are typically named or anonymous instances of classes, but may also represent instances of other things, such as collaborations, components, and nodes. We use sequence diagrams to illustrate the dynamic view of a system.

Fig 5.6: Sequence Diagram
5.4.4 Class Diagram

A class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among objects. It provides a basic notation for other structure diagrams prescribed by UML. It is helpful for developers and other team members too.

[Class: User — attributes: Datasets, Attributes of Features; operations: DatasetCollection(), featureExtraction(), applyAlgorithms(), performance()]

Fig 5.7: Class Diagram
CHAPTER 6

IMPLEMENTATION

6.1 STEPS FOR IMPLEMENTATION
1. Install the required packages for building the classifier.

2. Load the libraries into the workspace from the packages.

3. Read the input data set.

4. Normalize the given input dataset.

5. Divide this normalized data into two parts:
   a. Train data
   b. Test data
   (Note: 80% of the normalized data is used as train data; 20% of the normalized data is used as test data.)

6.2 CODING

Sample code:

# importing modules
import pandas as pd
import numpy as np
import seaborn as sns

# reading dataset (heart.csv)
df = pd.read_csv('heart.csv')

# printing starting instances of the data set
df.head()

# visualizing how many persons have heart disease based on gender
sns.countplot(x='sex', hue='target', data=df)

# visualizing heart disease based on type of chest pain
sns.countplot(x='cp', hue='target', data=df)

# preprocessing data i.e. finding null values
df.isna().sum()

# if we have null values we drop them using the following command
df = df.dropna()

# selecting features and target
x = df[['age', 'sex', 'trestbps', 'chol', 'cp']]
y = df['target']
x.shape, y.shape

# splitting data for training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100)

# training the data and predicting accuracy using logistic regression
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(class_weight='balanced')
sd = logreg.fit(X_train, y_train)
sd.score(X_train, y_train)

data = sd.predict(X_test)
d_log = pd.DataFrame(data)
d_log = d_log.rename(index=str, columns={0: "y_log"})
logreg.score(X_test, y_test)

# testing against a new sample (age, sex, trestbps, chol, cp)
b = np.array([63, 1, 145, 233, 3])
b = b.reshape(1, 5)
df_new = pd.DataFrame(b, columns=x.columns)
df_new.shape

# predicting whether a person has heart disease or not against the new sample
sd.predict(df_new)

# creating pickle module
import pickle
pickle.dump(sd, open('heart1.pkl', 'wb'))

# training the data and predicting accuracy using Random Forest
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier().fit(X_train, y_train)
rf.score(X_train, y_train)
CHAPTER 7

SYSTEM TESTING

7.1 WHITE BOX TESTING

White box testing is testing of a software solution's internal structure, design, and coding. In this type of testing, the code is visible to the tester. It focuses primarily on verifying the flow of inputs and outputs through the application, improving design and usability, and strengthening security. White box testing is also known as Clear Box testing, Open Box testing, Structural testing, Transparent Box testing, Code-Based testing, and Glass Box testing. It is usually performed by developers. It is one of two parts of the Box Testing approach to software testing. Its counterpart, black box testing, involves testing from an external or end-user perspective. White box testing, on the other hand, is based on the inner workings of an application and revolves around internal testing.

The term "white box" was used because of the see-through box concept. The clear box or white box name symbolizes the ability to see through the software's outer shell (or "box") into its inner workings. Likewise, the "black box" in "black box testing" symbolizes not being able to see the inner workings of the software, so that only the end-user experience can be tested.

a. What do you verify in White Box Testing?

White box testing involves the testing of the software code for the following:

1. Internal security holes

2. Broken or poorly structured paths in the coding processes

3. The flow of specific inputs through the code

4. Expected output

5. The functionality of conditional loops

6. Testing of each statement, object, and function on an individual basis

The testing can be done at system, integration and unit levels of software development.

One of the basic goals of white box testing is to verify a working flow for an application. It involves testing a series of predefined inputs against expected or desired outputs, so that when a specific input does not result in the expected output, you have encountered a bug.

b. How do you perform White Box Testing?

To give you a simplified explanation of white box testing, we have divided it into two basic steps. This is what testers do when testing an application using the white box testing technique:

Step 1: Understand the source code

The first thing a tester will often do is learn and understand the source code of the application. Since white box testing involves the testing of the inner workings of an application, the tester must be very knowledgeable in the programming languages used in the applications they are testing. Also, the testing person must be highly aware of secure coding practices. Security is often one of the primary objectives of testing software. The tester should be able to find security issues and prevent attacks from hackers and naive users who might inject malicious code into the application either knowingly or unknowingly.

Step 2: Create test cases and execute

The second basic step in white box testing involves testing the application's source code for proper flow and structure. One way is by writing more code to test the application's source code. The tester will develop little tests for each process or series of processes in the application. This method requires that the tester have intimate knowledge of the code, and is often done by the developer. Other methods include manual testing, trial-and-error testing and the use of testing tools, as we will explain further on in this article.

c. White Box Testing Techniques

A major white box testing technique is code coverage analysis. Code coverage analysis eliminates gaps in a test case suite. It identifies areas of a program that are not exercised by a set of test cases. Once gaps are identified, you create test cases to verify untested parts of the code, thereby increasing the quality of the software product. There are automated tools available to perform code coverage analysis. Below are a few coverage analysis techniques:

1. Statement Coverage: This technique requires every possible statement in the code to be tested at least once during the testing process of software engineering.

2. Branch Coverage: This technique checks every possible path (if-else and other conditional loops) of a software application.

Apart from the above, there are numerous coverage types such as Condition Coverage, Multiple Condition Coverage, Path Coverage, Function Coverage etc. Each technique has its own merits and attempts to test (cover) all parts of the software code. Using statement and branch coverage you generally attain 80-90% code coverage, which is sufficient.

d. Types of White Box Testing

White box testing encompasses several testing types used to evaluate the usability of an application, block of code or specific software package. These are listed below:

1. Unit Testing:
It is often the first type of testing done on an application. Unit testing is performed on each unit or block of code as it is developed. Unit testing is essentially done by the programmer. As a software developer, you develop a few lines of code, a single function or an object, and test it to make sure it works before continuing. Unit testing helps identify a majority of bugs early in the software development lifecycle. Bugs identified in this stage are cheaper and easier to fix. A small illustration follows this paragraph.
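As an illustration of such a developer-written unit test (a hypothetical sketch using Python's built-in unittest module; the is_valid_record helper is invented for the example):

import unittest

def is_valid_record(record):
    # A record must have the 5 features used by the model: age, sex, trestbps, chol, cp
    return len(record) == 5 and all(v >= 0 for v in record)

class TestInputValidation(unittest.TestCase):
    def test_valid_record(self):
        self.assertTrue(is_valid_record([63, 1, 145, 233, 3]))

    def test_wrong_length(self):
        self.assertFalse(is_valid_record([63, 1, 145]))

if __name__ == '__main__':
    unittest.main()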

2. Testing for Memory Leaks:

Memory leaks are leading causes of slower running applications. A QA specialist who is experienced at detecting memory leaks is essential in cases where you have a slow running software application. Apart from the above, a few testing types are part of both black box and white box testing. They are listed below.

e. White Box Penetration Testing:

In this testing, the tester/developer has full information of the application's source code, detailed network information, the IP addresses involved and all server information the application runs on. The aim is to attack the code from several angles to expose security threats.

f. White Box Mutation Testing:

Mutation testing is often used to discover the best coding techniques to use for expanding a software solution.
g. White Box Testing Tools
Below is a list of top white box testing tools:

• Parasoft Jtest
• EclEmma
• NUnit
• HtmlUnit

h. Advantages of White Box Testing

• Code optimization by finding hidden errors.

• White box test cases can be easily automated.

• Testing is more thorough as all code paths are usually covered.

• Testing can start early in the SDLC even if the GUI is not available.

i. Disadvantages of White Box Testing

• White box testing can be quite complex and expensive.

• Developers who usually execute white box test cases detest it. White box testing by developers that is not detailed can lead to production errors.

• White box testing requires professional resources, with a detailed understanding of programming and implementation.

• White box testing is time-consuming; bigger programming applications take time to test fully.

Ending Notes:
White box testing can be quite complex. The complexity involved has a lot to do with the application being tested. A small application that performs a single simple operation could be white box tested in a few minutes, while larger programming applications take days, weeks and even longer to fully test. White box testing should be done on a software application as it is being developed, after it is written, and again after each modification.
7.2 BLACK BOX TESTING

a. What is Black Box Testing

Black box testing is defined as a testing technique in which the functionality of the Application Under Test (AUT) is tested without looking at the internal code structure, implementation details and knowledge of internal paths of the software. This type of testing is based entirely on software requirements and specifications. In black box testing we just focus on inputs and outputs of the software system without bothering about internal knowledge of the software program.

The black box can be any software system you want to test. For example, an operating system like Windows, a website like Google, a database like Oracle or even your own custom application. Under black box testing, you can test these applications by just focusing on the inputs and outputs without knowing their internal code implementation.

b. How to do Black Box Testing

Here are the generic steps followed to carry out any type of black box testing:

• Initially, the requirements and specifications of the system are examined.
• The tester chooses valid inputs (positive test scenario) to check whether the SUT processes them correctly. Also, some invalid inputs (negative test scenario) are chosen to verify that the SUT is able to detect them.
• The tester determines expected outputs for all those inputs.
• The software tester constructs test cases with the selected inputs.
• The test cases are executed.
• The software tester compares the actual outputs with the expected outputs.
• Defects, if any, are fixed and re-tested.

c. Types of Black Box Testing

There are many types of black box testing, but the following are the prominent ones:

1. Functional testing - This black box testing type is related to the functional requirements of a system; it is done by software testers.
2. Non-functional testing - This type of black box testing is not related to the testing of specific functionality, but to non-functional requirements such as performance, scalability and usability.

3. Regression testing - Regression testing is done after code fixes, upgrades or any other system maintenance to check that the new code has not affected the existing code.

d. Tools used for Black Box Testing:

The tools used for black box testing largely depend on the type of black box testing you are doing:

• For functional/regression tests you can use QTP or Selenium.

• For non-functional tests you can use LoadRunner or JMeter.

e. Black Box Testing Techniques


The following are the prominent test strategies among the many used in black box testing:

1. Equivalence Testing

2. Boundary Value Testing

3. Decision Table Testing

1. Equivalence Class Testing:

Equivalence class testing is used to minimize the number of possible test cases to an optimum level while maintaining reasonable test coverage: the input domain is divided into partitions that the system should treat identically, and one representative value is tested from each partition. A sketch follows.
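A minimal Python sketch of the idea, assuming a hypothetical validation rule that resting blood pressure must lie between 90 and 200 (both the rule and the representative values are illustrative assumptions):

def is_valid_rest_bp(value):
    # Hypothetical specification: resting blood pressure must be in [90, 200].
    return 90 <= value <= 200

# One representative value per equivalence class is enough to cover that class.
partitions = {
    "invalid, below range": 50,
    "valid, in range": 120,
    "invalid, above range": 250,
}

for name, value in partitions.items():
    print(name, value, "->", is_valid_rest_bp(value))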

2. Boundary Value Testing:


Boundary value testing is focused on the values at boundaries. This technique determines whether a certain range of values is acceptable to the system or not. It is very useful in reducing the number of test cases, and it is most suitable for systems where an input must lie within certain ranges. A sketch follows.
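Continuing the hypothetical blood pressure rule from the previous sketch, a boundary value test probes each boundary and its immediate neighbours:

def is_valid_rest_bp(value):
    # Same assumed specification as above: the valid range is [90, 200].
    return 90 <= value <= 200

# Values at, just below and just above each boundary.
for value in (89, 90, 91, 199, 200, 201):
    print(value, "->", "accepted" if is_valid_rest_bp(value) else "rejected")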

3. Decision Table Testing:

A decision table puts causes (conditions) and their effects (actions) in a matrix; each column represents a unique combination of conditions together with the action the system should take. A sketch follows.
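A small Python sketch of decision table testing for a hypothetical prediction service (the conditions, the actions and the decide function are assumptions made for illustration):

def decide(valid_input, model_loaded):
    # Hypothetical behaviour of the service under test.
    if not valid_input:
        return "reject input"
    return "return prediction" if model_loaded else "report error"

# Each row below corresponds to one column of the decision table:
# a unique combination of conditions and its expected action.
decision_table = [
    (True,  True,  "return prediction"),
    (True,  False, "report error"),
    (False, True,  "reject input"),
    (False, False, "reject input"),
]

for valid_input, model_loaded, expected in decision_table:
    actual = decide(valid_input, model_loaded)
    print("PASS" if actual == expected else "FAIL",
          valid_input, model_loaded, "->", actual)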
f. Black Box Testing and the Software Development Life Cycle (SDLC)

Black box testing has its own life cycle, called the Software Testing Life Cycle (STLC), which runs alongside every stage of the Software Development Life Cycle.

1. Requirement - This is the initial stage of the SDLC, in which requirements are gathered. Software testers also take part in this stage.

2. Test Planning & Analysis - The testing types applicable to the project are determined, and a test plan is created that identifies possible project risks and their mitigation.

3. Test Designing - In this stage, test cases/scripts are created on the basis of the software requirement documents.

4. Test Execution - In this stage, the prepared test cases are executed. Bugs, if any, are fixed and re-tested.

CHAPTER 8

SCREENSHOTS

8.1 Anaconda Prompt:

Fig 8.1: Anaconda Prompt (screenshot)

8.2 Home Screen:

Fig 8.2: Home Screen for Heart Attack Detection (screenshot)


CHAPTER 9

CONCLUSION

In this project, we introduced a heart disease prediction system that uses different classifier techniques for the prediction of heart disease. The techniques are Random Forest and Logistic Regression; we have analyzed that Random Forest has better accuracy compared to Logistic Regression. Our purpose is to improve the performance of the Random Forest by removing unnecessary and irrelevant attributes from the dataset and picking only those that are most informative for the classification task.
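A minimal sketch of this attribute-pruning idea, using scikit-learn's feature importances, is shown below. The file name heart.csv and the target column name are assumptions made for illustration, not this project's actual artifacts.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")                      # hypothetical dataset path
X, y = df.drop(columns=["target"]), df["target"]   # assumed target column name
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a Random Forest, then keep only the attributes whose importance
# is above the median importance.
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
selector = SelectFromModel(rf, threshold="median", prefit=True)
kept = X.columns[selector.get_support()]
print("informative attributes kept:", list(kept))

# Retrain on the reduced attribute set and report held-out accuracy.
rf_small = RandomForestClassifier(n_estimators=100, random_state=42)
rf_small.fit(X_tr[kept], y_tr)
print("accuracy on reduced set:", rf_small.score(X_te[kept], y_te))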

CHAPTER 10

FUTURE SCOPE

As illustrated before, the system can be used as a clinical assistant for clinicians. The disease prediction based on risk factors can be hosted online, so any internet user can access the system through a web browser and understand their risk of heart disease. The proposed model can be implemented for any real-time application, and by using it, other types of heart disease can also be determined: different heart diseases such as rheumatic heart disease, hypertensive heart disease, ischaemic heart disease, cardiovascular disease and inflammatory heart disease can be identified.

Other health care systems can be formulated using this proposed model in order to identify diseases at an early stage. The proposed model requires an efficient processor with a good memory configuration to implement it in real time, and it has a wide area of application, including grid computing, cloud computing, robotic modeling, etc. To increase the performance of our classifier in the future, we will work on ensembling two algorithms, Random Forest and AdaBoost; by ensembling these two algorithms we aim to achieve high performance. A sketch of the idea follows.
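One plausible way to realise this ensembling, sketched here with scikit-learn's soft-voting combiner; the synthetic data stands in for the heart dataset, and none of this is the project's actual code:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with 13 features, mirroring the typical heart dataset shape.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("ada", AdaBoostClassifier(n_estimators=100, random_state=42)),
    ],
    voting="soft",  # average the predicted class probabilities of the two models
)

print("cross-validated accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())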

CHAPTER 11

REFERENCES
[1] P. K. Anooj, "Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules", Journal of King Saud University - Computer and Information Sciences (2012) 24, 27-40.

[2] Nidhi Bhatla, Kiran Jyoti, "An Analysis of Heart Disease Prediction using Different Data Mining Techniques", International Journal of Engineering Research & Technology.

[3] Jyoti Soni, Ujma Ansari, Dipesh Sharma, Sunita Soni, "Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction".

[4] Chaitrali S. Dangare, Sulabha S. Apte, "Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques", International Journal of Computer Applications (0975 - 8887).

[5] Dane Bertram, Amy Voida, Saul Greenberg, Robert Walker, "Communication, Collaboration, and Bugs: The Social Nature of Issue Tracking in Small, Collocated Teams".

[6] M. Anbarasi, E. Anupriya, N. Ch. S. N. Iyengar, "Enhanced Prediction of Heart Disease with Feature Subset Selection using Genetic Algorithm", International Journal of Engineering Science and Technology, Vol. 2(10), 2010.

[7] Ankita Dewan, Meghna Sharma, "Prediction of Heart Disease Using a Hybrid Technique in Data Mining Classification", 2nd International Conference on Computing for Sustainable Global Development, IEEE, 2015, pp. 704-706.

[8] R. Alizadehsani, J. Habibi, B. Bahadorian, H. Mashayekhi, A. Ghandeharioun, R. Boghrati, et al., "Diagnosis of coronary arteries stenosis using data mining", J Med Signals Sens, vol. 2, pp. 153-159, Jul 2012.

[9] M. Akhil Jabbar, B. L. Deekshatulu, Priti Chandra, "Heart disease classification using nearest neighbor classifier with feature subset selection", Anale. Seria Informatica, 11, 2013.

[10] Shadab Adam Pattekari and Asma Parveen, "Prediction System for Heart Disease Using Naive Bayes", International Journal of Advanced Computer and Mathematical Sciences, ISSN 2230-9624, Vol. 3, Issue 3, 2012, pp. 290-294.

[11] C. Kalaiselvi, "Diagnosis of Heart Disease Using K-Nearest Neighbor Algorithm of Data Mining", IEEE, 2016.

[12] Keerthana T. K., "Heart Disease Prediction System using Data Mining Method", International Journal of Engineering Trends and Technology, May 2017.

[13] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Elsevier.

[14] Animesh Hazra, Arkomita Mukherjee, Amit Gupta, "Heart Disease Prediction Using Machine Learning and Data Mining", July 2017, pp. 2137-2159.

