Rucha Shinde et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 6 (1), 2015, 637-639

An Intelligent Heart Disease Prediction System Using K-Means Clustering and Naïve Bayes Algorithm
Rucha Shinde (1), Sandhya Arjun (2), Priyanka Patil (3), Prof. Jaishree Waghmare (4)
Trinity College of Engineering & Research, Pune

Abstract—Nowadays people work on computers for hours on end and do not have time to take care of themselves. Hectic schedules and the consumption of junk food affect people's health, mainly the heart. We are therefore implementing a heart disease prediction system using two data mining techniques, Naïve Bayes and k-means clustering. The system combines both algorithms, and this paper gives an overview of it. It predicts heart disease from various attributes and presents the output as a prediction. It uses the k-means algorithm for grouping the attributes and the Naïve Bayes algorithm for prediction.

Index Terms—Data mining, comma separated files, Naïve Bayes, k-means algorithm, heart disease.

I. INTRODUCTION
Data mining is the practice of examining large preexisting databases in order to generate new information. It converts raw data into useful information and analyzes the data for relationships that have not previously been discovered. [1]
The steps of data mining are: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation.
Medical data mining is a domain with a lot of imprecision and uncertainty. Clinical decisions are usually based on the doctor's intuition, which may lead to disastrous consequences: clinical decisions contain many errors and result in excessive medical costs. [1]
Serialization is also used in this system. It converts the data objects into streams of bytes and stores them in the database.

II. LITERATURE REVIEW
[1] Intelligent heart disease prediction system using data mining techniques:
In this paper heart disease prediction is done using data mining techniques such as decision trees, neural networks and Naïve Bayes. The system answers "what if" queries. It is implemented on the .NET platform.
[2] An empirical study on applying data mining techniques for analysis and prediction of heart disease:
The paper finds that the health-care environment is poor at extracting knowledge, so it applies data mining techniques to this domain.
[3] Prediction system for heart disease using Naïve Bayes mining:
This is a web-based classification system. It retrieves hidden data from a database and compares the values with a trained dataset. The paper mentions that the system reduces treatment costs.
[4] Decision support in heart disease prediction system using Naïve Bayes:
This system was developed using data mining techniques, mainly Naïve Bayes. It takes the patient's attributes as input. It helps trained nurses and medical students to treat patients.
[5] Intelligent and effective heart attack prediction system using data mining:
In this paper k-means clustering is used. The system is capable of predicting heart disease.

III. BLOCK DIAGRAM
The following block diagram represents the step-by-step implementation of the heart disease prediction system. The block diagram consists of two parts: the first is training and the second is prediction. In training, the input, i.e. the patient's attributes, is taken first and a dataset is formed. The dataset is then given labels according to the names of the attributes. Next the dataset is transformed: the attributes are separated by commas, i.e. stored as comma separated values (CSV) files. The K-Means clustering algorithm is then applied to this dataset; it groups the attributes, and the attributes are added according to their groups. After this the model is ready for the prediction algorithm to be applied to it.
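The training steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the records and their values are made up, the helper names `to_csv` and `kmeans_1d` are hypothetical, only the age attribute is clustered, and the initial centers are picked deterministically (rather than at random) so that the run is reproducible.

```python
import csv
import io

# Hypothetical patient records; the values are made up for illustration.
FIELDS = ["age", "gender"]
RECORDS = [
    {"age": 22, "gender": "M"}, {"age": 25, "gender": "F"},
    {"age": 28, "gender": "M"}, {"age": 45, "gender": "F"},
    {"age": 50, "gender": "M"}, {"age": 55, "gender": "F"},
    {"age": 70, "gender": "M"}, {"age": 75, "gender": "F"},
    {"age": 80, "gender": "M"},
]


def to_csv(records, fields):
    """Label the columns and write the records as comma separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def kmeans_1d(values, k, max_iter=100):
    """Group numbers into k clusters; returns (centers, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start from k evenly spread data points instead of random ones.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move every center to the mean of the values assigned to it.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:  # centers stopped moving
            break
        centers = updated
    return centers, labels


csv_text = to_csv(RECORDS, FIELDS)                   # dataset with labels, as CSV
rows = list(csv.DictReader(io.StringIO(csv_text)))   # read the CSV back
ages = [int(row["age"]) for row in rows]
centers, age_groups = kmeans_1d(ages, k=3)
for row, group in zip(rows, age_groups):
    row["age_group"] = group  # each age is tagged with its group number
print(csv_text.splitlines()[0])
print(age_groups)
```

In this sketch the grouped attribute is stored alongside the original value, so the later prediction step can work with a small set of group numbers instead of raw ages.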
In the prediction stage, we use the Naïve Bayes algorithm. Naïve Bayes basically applies the concept of probability. It takes each and every attribute into account and gives the result. The output is a prediction of whether the person has heart disease or is likely to have heart disease. The system helps the person take preventive measures against the disease.

Figure 1: Data mining process
Figure 2: Block diagram

IV. COMMA SEPARATED VALUES (CSV)
CSV stands for Comma Separated Values. The format was originally named CSL, i.e. Comma Separated List. [2][3]
A CSV file stores tabular data in plain-text form. Plain text consists of characters, with no data encoded as binary numbers. A CSV file contains records which are separated by some character or string, mainly a comma or a tab. CSV refers to a large family of formats.
It is supported by consumer and business applications. It is mainly used when a user needs to transfer information from a database program to a spreadsheet that uses a completely different format. [2][3]
In CSV files records are divided into fields separated by delimiters. They work with both Unicode and ASCII. CSV files usually survive translation from one character set to another.
CSV files cannot represent object-oriented databases because CSV records are expected to have the same structure. CSV files are also called flat files. [2][3]
In this system we take various attributes such as age, obesity, gender, cholesterol, smoking, blood pressure, chest pain, blood sugar, ECG results, etc. As we take these inputs one by one, they are stored in CSV files: the inputs are converted into a tabular format and separated by commas.
Because of CSV files the data appear in a well-organized and well-represented manner.

V. K-MEANS CLUSTERING ALGORITHM
Clustering is the process of grouping data objects so that the objects within a cluster are similar to one another, while dissimilar objects are placed in other clusters. It is also called data segmentation in some applications because it divides a large data set into groups according to their similarities. [4]
Requirements of clustering in data mining:
1) Deals with different types of attributes.
2) Deals with noisy data.
3) Requires minimum knowledge to determine input parameters.
4) Usability
5) High dimensionality

K-MEANS CLUSTERING
K-means is one of the simplest learning algorithms for solving clustering problems. The procedure is simple and easy: it classifies a given data set into a certain number of clusters.
It defines k centroids, one for each cluster. They should be placed as far away from each other as possible. Each point belonging to the given data set is then taken and associated with the nearest centroid. When no point is pending, an early grouping is done. We then re-calculate k new centroids for the clusters resulting from the previous step. Once we have these k new centroids, a new binding has to be done between the same data points and the nearest new centroid. A loop has thus been generated; as a result of this loop the k centroids change their location step by step until no more changes are made. [4]
The advantages of the k-means clustering algorithm are simplicity and speed.
Algorithm:
1) Select k centers from the data (at random).
2) Divide the data into k clusters by grouping each point with its nearest center.
3) Calculate the mean of each of the k clusters to find the new centers.
4) Repeat steps 2 and 3 until the centers do not change.

In this system we mainly use clustering for grouping the attributes. We take about 10 attributes such as age, obesity, gender, cholesterol, smoking, blood pressure, chest pain, blood sugar, ECG results, etc. These attributes are grouped using the K-Means clustering algorithm.
E.g.: Suppose we take an attribute such as age and consider the age of a person to be between 0 and 100. Applying the K-means algorithm to the age data finds the centroids, calculates the means and divides the ages into groups. Here, age is divided into 3 groups, such as 0-30, 31-60 and 61-100.
It gives them values such as
0-30 = 0
31-60 = 1
61-100 = 2
For the gender attribute it divides the values into groups such as
Male = 0
Female = 1
K-means is applied to each and every attribute mentioned above.
After that the attributes and their values are added to a dataset accordingly. The model is then ready for prediction.

VI. NAÏVE BAYES ALGORITHM
The Naïve Bayes classifier is based on Bayes' theorem. It makes a strong independence assumption. It is also known as the independent feature model.
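The independence assumption can be illustrated with a small sketch: the likelihood of a patient's attribute values given a class is taken as the product of the per-attribute likelihoods, which is multiplied by the class prior and normalized. Everything here is an assumption made for illustration: the training rows are made up, the attribute encodings (age group 0-2, gender 0-1) follow the grouping example in the k-means section, the names `train` and `posterior` are hypothetical, and the Laplace smoothing is an addition that the paper does not describe.

```python
from collections import Counter, defaultdict

# Toy training rows, made up for illustration: ((age_group, gender), label).
TRAIN = [
    ((0, 0), "no"), ((0, 1), "no"), ((1, 0), "no"), ((0, 0), "no"),
    ((1, 1), "yes"), ((2, 0), "yes"), ((2, 1), "yes"), ((2, 0), "yes"),
]


def train(rows):
    """Count classes and, per class, how often each attribute value occurs."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def posterior(features, model):
    """Return the normalized probability of each class for one patient."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior probability of the class
        for i, value in enumerate(features):
            # Independence assumption: multiply per-attribute likelihoods.
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = value_counts[(i, label)][value]
            score *= (count + 1) / (n + len(values_seen[i]))
        scores[label] = score
    evidence = sum(scores.values())  # normalizing constant
    return {label: score / evidence for label, score in scores.items()}


MODEL = train(TRAIN)
RESULT = posterior((2, 0), MODEL)  # age group 2, gender 0
print(RESULT)
```

Because each attribute contributes one independent factor, adding a further attribute only adds one more count table and one more multiplication per class.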


It assumes that the presence or absence of a particular feature of a class is unrelated to the presence or absence of any other feature in the given class.
A Naïve Bayes classifier can be trained in a supervised learning setting. It uses the method of maximum likelihood. It has worked well in complex real-world situations. It requires only a small amount of training data to estimate the parameters for classification. Only the variances of the variables need to be determined for each class, not the entire covariance matrix. [5][6]
Naïve Bayes is mainly used when the dimensionality of the inputs is high. It gives the output in a more sophisticated form. The probability of each input attribute is shown from the predictable state. Naïve Bayes classification is widely used in machine learning and data mining.

Bayes theorem:

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}$$

where
$P(H \mid X)$ is the posterior probability of H conditioned on X
$P(X \mid H)$ is the probability of X conditioned on H (the likelihood)
$P(H)$ is the prior probability of H
$P(X)$ is the prior probability of X

Naïve Bayes basically predicts whether or not the patient is likely to get heart disease.
The values of the model dataset obtained after applying the K-Means algorithm are compared with a trained dataset. Bayes' theorem is then applied to obtain the probability that the patient has heart disease. [5][6]

VII. INPUT ATTRIBUTES
1) Age
2) Gender
3) Obesity
4) Smoking
5) Electrocardiographic (ECG) result
6) Heart rate
7) Chest pain
8) Cholesterol
9) Blood pressure
10) Blood sugar

VIII. CONCLUSION
In this paper we propose a heart disease prediction system using Naïve Bayes and k-means clustering. We use k-means clustering to increase the efficiency of the output. This makes it an effective model for predicting patients with heart disease. The model can answer complex queries, each with its own strength with respect to ease of model interpretation, access to detailed information and accuracy.

REFERENCES
[1] Sellappan Palaniappan, Rafiah Awang, "Intelligent Heart Disease Prediction System Using Data Mining Techniques", Department of Information Technology, Malaysia University of Science and Technology, Block C, Kelana Square, Jalan SS7/26 Kelana Jaya, 47301 Petaling Jaya, Selangor, Malaysia.
[2] "CSV File Reading and Writing" (http://docs.python.org/library/csv.html). Retrieved July 24, 2011. "is no "CSV standard""
[3] Y. Shafranovich, "Common Format and MIME Type for Comma-Separated Values (CSV) Files" (http://tools.ietf.org/html/rfc4180). Retrieved September 12, 2011.
[4] "A tutorial on clustering algorithms", home.deib.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
[5] Shadab Adam Pattekari and Asma Parveen, "Prediction System For Heart Disease Using Naïve Bayes", International Journal of Advanced Computer and Mathematical Sciences, ISSN 2230-9624, Vol 3, Issue 3, 2012, pp 290-294.
[6] Mrs. G. Subbalakshmi (M.Tech), Mr. K. Ramesh M.Tech, Asst. Professor, Mr. M. Chinna Rao M.Tech, (Ph.D.) Asst. Professor, "Decision Support in Heart Disease Prediction System using Naive Bayes", G. Subbalakshmi et al. / Indian Journal of Computer Science and Engineering (IJCSE), 2011.
[7] Jesmin Nahar, Tasadduq Imama, Kevin S. Tickle, Yi-Ping Phoebe Chen, "Association rule mining to detect factors which contribute to heart disease in males and females", Expert Systems with Applications 40 (2013) 1086-1093.
[8] Oleg Yu. Atkov (MD, PhD), Svetlana G. Gorokhova (MD, PhD), Alexandr G. Sboev (PhD), Eduard V. Generozov (PhD), Elena V. Muraseyeva (MD, PhD), Svetlana Y. Moroshkina, Nadezhda N. Cherniy, "Coronary heart disease diagnosis by artificial neural networks including genetic polymorphisms and clinical parameters", Journal of Cardiology (2012) 59, 190-194.
[9] Shantakumar B. Patil, Y. S. Kumaraswamy, "Intelligent and Effective Heart Attack Prediction System Using Data Mining and Artificial Neural Network", European Journal of Scientific Research, ISSN 1450-216X, Vol.31 No.4 (2009), pp.642-656.
[10] Sivagowry, Dr. Durairaj. M2 and Persia, "An Empirical Study on applying Data Mining Techniques for the Analysis and Prediction of Heart Disease", 2013.
