NB 7
NB 7
www.ijcsit.com 637
Rucha Shinde et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 6 (1) , 2015, 637-639
concept. It integrates over each and every attribute and gives the result. The output is a prediction of whether the person has heart disease or is likely to develop it. This system will help the person take preventive measures against the disease.

Fig. 2: Block diagram

IV. COMMA SEPARATED VALUES (CSV)

CSV stands for Comma Separated Values. CSV was formerly known as CSL, i.e. Comma Separated List.[2][3]
A CSV file stores tabular data in plain-text form. Plain text means that the file is a sequence of characters, with no data that has to be interpreted as binary numbers. A CSV file contains records that are separated by some character or string, mainly a comma or a tab. CSV refers to a large family of formats.
It is supported by consumer and business applications. It is mainly used when a user needs to transfer information from a database program to a spreadsheet that uses a completely different format.[2][3]
In CSV files, records are divided into fields separated by delimiters. They work with both Unicode and ASCII. CSV files can generally be translated from one character set to another.
CSV files cannot represent object-oriented databases, because all CSV records are expected to have the same structure. CSV files are also called flat files.[2][3]
In this system we take various attributes such as age, obesity, gender, cholesterol, smoker, blood pressure, chest pain, blood sugar, ECG results, etc. As we take these inputs one by one, they are stored in CSV files: the inputs are converted into a tabular format and separated by commas.
Because of CSV files, the data appear in an organized and well-represented manner.

V. K-MEANS CLUSTERING ALGORITHM

Clustering is the process of grouping data objects that are similar to one another into the same cluster; dissimilar objects are placed in other clusters. It is also called data segmentation in some applications, because it divides a large data set into groups according to their similarities.[4]

Requirements of clustering in data mining:-
1) Deals with different types of attributes.
2) Deals with noisy data.
3) Requires minimal knowledge to determine input parameters.
4) Usability.
5) High dimensionality.

K-MEANS CLUSTERING
K-means is one of the simplest learning algorithms for solving clustering problems. The process is simple and easy: it classifies a given data set into a certain number of clusters.
It defines k centroids, one for each cluster. They should be placed as far away from each other as possible. Each point of the data set is then associated with the nearest centroid. When no point is pending, a first grouping is done. We then re-calculate k new centroids for the clusters resulting from the previous step. Once we have the k new centroids, a new binding is made between the same data points and the nearest new centroid. A loop is thus generated; as a result of this loop, the k centroids change their location step by step until no more changes occur.[4]
The advantages of the k-means clustering algorithm are simplicity and speed.

Algorithm:-
1) Select k centers from the problem (at random).
2) Divide the data into k clusters by grouping points.
3) Calculate the mean of each of the k clusters to find the new centers.
4) Repeat steps 2 and 3 until the centers do not change.

In this system we mainly used clustering for grouping the attributes. We take about 10 attributes, such as age, obesity, gender, cholesterol, smoker, blood pressure, chest pain, blood sugar, ECG results, etc. These attributes are grouped using the K-means clustering algorithm.
Eg:- Suppose we take an attribute such as age and consider the age of a person to lie between 0 and 100. After applying the K-means algorithm to this age data set, it finds the centroids, calculates the means and divides the data into groups. Here, age is divided into 3 groups: 0-30, 31-60 and 61-100.
It assigns them values such as
0-30=0
31-60=1
61-100=2
The gender attribute is divided into groups such as
Male=0
Female=1
K-means is applied to each and every attribute mentioned above.
After that, the attributes and their values are added to a dataset accordingly. The model is then ready for prediction.

VI. NAÏVE BAYES ALGORITHM

The Naïve Bayes classifier is based on Bayes' theorem. It makes a strong independence assumption, and is therefore also known as the independent feature model.
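
The K-means steps described above (select k centers, group the points by nearest center, recompute the means, repeat until the centers stop changing) can be illustrated with a minimal Python sketch applied to the age attribute. This is only an illustration under stated assumptions, not the implementation used in the system: the function names kmeans_1d and encode_records, the column names age and gender, and the sample records are all hypothetical. The groups that K-means finds depend on the data and on the random starting centers, so they need not coincide with the ranges 0-30, 31-60 and 61-100 given in the example.

```python
import csv
import io
import random

# Hypothetical sample records, invented for illustration only.
SAMPLE_CSV = """age,gender
25,Male
47,Female
63,Female
18,Male
71,Female
39,Male
"""

# Gender codes as given in the text: Male=0, Female=1.
GENDER_CODES = {"Male": 0, "Female": 1}


def kmeans_1d(values, k, seed=0, max_iter=100):
    """Cluster numbers into k groups; return (sorted centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct starting centers at random from the data.
    centers = rng.sample(sorted(set(values)), k)
    for _ in range(max_iter):
        # Step 2: assign every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Step 3: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Sort the centers so that code 0 is the youngest group.
    centers = sorted(centers)
    labels = [min(range(k), key=lambda i: abs(v - centers[i]))
              for v in values]
    return centers, labels


def encode_records(csv_text, k=3):
    """Read CSV records and return one (age_code, gender_code) pair per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ages = [int(row["age"]) for row in rows]
    _, age_codes = kmeans_1d(ages, k)
    return [(code, GENDER_CODES[row["gender"]])
            for code, row in zip(age_codes, rows)]


if __name__ == "__main__":
    for pair in encode_records(SAMPLE_CSV):
        print(pair)
```

Each output pair holds the cluster code for age followed by the code for gender; in the system described here, the remaining attributes would be encoded in the same way before the dataset is passed to the classifier.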