A PROJECT REPORT
On
“ERROR PREDICTION USING W.E.K.A”
Submitted to
KIIT Deemed to be University
In Partial Fulfilment of the Requirement for the Award of
BACHELOR’S DEGREE
IN COMPUTER SCIENCE AND ENGINEERING
BY
UNDER THE GUIDANCE OF
Dr. Ajay Kumar Jena
Acknowledgement
We would like to thank our mentor Dr. A. K. Jena for providing constant support and
giving us a broader picture of the whole scenario, without which the completion of the
project would not have been possible. His constant endeavor and guidance have helped us
complete the project within the stipulated time. Last but not least, we would like to
thank our fellow project mates for helping us out with all the problems that we faced and
for making our experience an enjoyable one.
CERTIFICATE
This is to certify that the project entitled
“ERROR PREDICTION USING W.E.K.A.”
Submitted by
is a record of bonafide work carried out by them, in partial fulfilment of the requirements for
the award of the Degree of Bachelor of Engineering (Computer Science & Engineering) at KIIT
Deemed to be University, Bhubaneswar. This work was done during the year 2018-2019, under our
guidance.
Date: 02/03/2019
Declaration
We declare that this written submission represents our ideas in our own words and, where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for disciplinary
action by the Institute and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken when needed.
Date: 02 April, 2019
ABSTRACT
In this project we have attempted to analyze different medical datasets and the Pre-Poll Survey
of the 2012 U.S. Presidential Election by applying different classification and clustering
algorithms, in order to better understand the nature of the data and the relationships within it
and to determine the optimal model for providing the most accurate predictions. We have used the
WEKA workbench to perform all the operations on the datasets. We have mainly used the graphical
user interface called the Explorer for classifying, clustering and visualizing the datasets.
CHAPTER - 1
INTRODUCTION
1.1 OVERVIEW
Traditional analytics tools are not suited for capturing the full value of big data.
The volume of big data is too large for comprehensive analysis, and the range of potential
correlations and relationships between disparate data sources, from back-end customer
databases to live web-based clickstreams, is too great for any analyst to test all hypotheses
and derive all the value buried in the data.
Basic analytical methods used in business intelligence and enterprise reporting tools reduce to
reporting sums, counts and simple averages and running SQL queries. Online analytical processing
is merely a systematized extension of these basic analytics that still relies on a human to direct
activities and specify what should be calculated.
Machine learning is ideal for exploiting the opportunities hidden in big data. It delivers on the
promise of extracting value from big and disparate data sources with far less reliance on human
direction. It is data driven and runs at machine scale. Machine Learning is well suited to the
complexity of dealing with disparate data sources and the huge variety of variables and amounts
of data involved. And unlike traditional analysis, machine learning thrives on growing datasets.
The more data that is fed into a machine learning system, the more it can learn, and the higher
the quality of the insights it can produce.
Freed from the limitations of human scale thinking and analysis, machine learning is able to
discover and display the patterns buried in the data.
The team at Waikato has incorporated several standard Machine Learning techniques into a software
workbench called WEKA (Waikato Environment for Knowledge Analysis). With the use of WEKA, a
specialist in a particular field is able to apply ML and derive useful knowledge from databases
that are far too large to be analyzed by hand.
1.2 OBJECTIVE
The main purpose of this descriptive analysis is to better understand large and complex datasets,
find errors in them, and identify the most efficient classification and clustering algorithms for
future use in machine learning, thereby building knowledge of machine learning that can later be
applied to Artificial Neural Networks (ANN).
CHAPTER – 2
REVIEW OF LITERATURE
Similar research and projects have been done on Machine Learning using the WEKA tool, some of
which are summarized below.
[1] Ramamohan Y. et al. (2012): Data mining tools are used for making predictions and fruitful
decisions in business. This paper gives a brief overview of different data mining tools such as
Weka, Tanagra, Rapid Miner, DBMiner, Witness Miner, and Orange.
[2] Bhaise R. B. et al. (2013): Educational Data Mining, aimed at improving students' education,
is the main theme of this paper. The authors applied the K-Means clustering technique to sample
data. This technique is used to analyze the data from different dimensions and to categorize it;
clusters were formed according to the students' performance in the examination. The information
generated after applying the mining technique is very useful for teachers as well as students.
[3] Borkar S. and Rajeswari K. (2013): Association rule mining is useful for evaluating student
performance. In this paper, the Weka tool is used for data analysis. The main goal of the paper is
to predict student performance in the university exam on the basis of criteria such as internal
exams, assignments and attendance. The paper concludes that the university results of weaker
students can be improved by putting extra effort into their unit tests, attendance and assignments.
[4] Chaurasia V. and Pal S. (2013): This paper surveys different data mining techniques that
support decision making by medical practitioners; using them, doctors can predict the presence of
heart disease. The paper applies Naive Bayes, J48 Decision Tree and Bagging techniques to heart
disease diagnosis. As a result, the Bagging algorithm performs better than the others because it
gives human-readable classification rules.
CHAPTER – 3
MACHINE LEARNING
Over the past two decades Machine Learning has become one of the main-stays of information
technology and with that, a rather central, albeit usually hidden, part of our life. With the ever
increasing amounts of data becoming available there is good reason to believe that smart data
analysis will become even more pervasive as a necessary ingredient for technological progress.
The purpose of this chapter is to provide the reader with an overview of the vast range of
applications which have at their heart a machine learning problem and to bring some degree of
order to the zoo of problems. After that, we will discuss some basic tools from statistics and
probability theory, since they form the language in which many machine learning problems must
be phrased to become amenable to solving.
Machine learning can appear in many guises. We now discuss a number of applications, the
types of data they deal with, and finally, we formalize the problems in a somewhat more stylized
fashion. The latter is key if we want to avoid reinventing the wheel for every new application.
Instead, much of the art of machine learning is to reduce a range of fairly disparate problems to a
set of fairly narrow prototypes. Much of the science of machine learning is then to solve those
problems and provide good guarantees for the solutions.
3.1.1 Applications
Most readers will be familiar with the concept of web page ranking. That is, the process of
submitting a query to a search engine, which then finds webpages relevant to the query and
which returns them in their order of relevance. A rather related application is collaborative
filtering. Internet bookstores such as Amazon, or video rental sites such as Netflix use this
information extensively to entice users to purchase additional goods (or rent more movies). The
problem is quite similar to the one of web page ranking. As before, we want to obtain a sorted
list (in this case of articles). The key difference is that an explicit query is missing and instead we
can only use past purchase and viewing decisions of the user to predict future viewing and
purchase habits. The key side information here are the decisions made by similar users, hence the
collaborative nature of the process.
Automatic translation of documents is another application. Trying to fully understand a text
before translating it, using a set of hand-crafted linguistic rules, is a rather arduous task, in
particular given that text is not always grammatically correct, nor is the document understanding
part itself a trivial one. Instead, we could simply use examples of
translated documents, such as the proceedings of the Canadian parliament or other multilingual
entities (United Nations, European Union, Switzerland) to learn how to translate between the two
languages. In other words, we could use examples of translations to learn how to translate. This
machine learning approach proved quite successful.
Many security applications, e.g. for access control, use face recognition as one of their
components. That is, given the photo (or video recording) of a person, recognize who this person
is. In other words, the system needs to classify the faces into one of many categories (Alice, Bob,
Charlie...) or decide that it is an unknown face. A similar, yet conceptually quite different
problem is that of verification. Here the goal is to verify whether the person in question is who
he claims to be. Note that differently to before, this is now a yes/no question. To deal with
different lighting conditions, facial expressions, whether a person is wearing glasses, hairstyle,
etc., it is desirable to have a system which learns which features are relevant for identifying a
person.
Other applications which take advantage of learning are speech recognition (annotate an audio
sequence with text, such as the system shipping with Microsoft Vista), the recognition of
handwriting (annotate a sequence of strokes with text, a feature common to many PDAs), track
pads of computers (e.g. Synaptics, a major manufacturer of such pads, derives its name from the
synapses of a neural network), the detection of failure in jet engines, avatar behavior in computer
games (e.g. Black and White), direct marketing (companies use past purchase behavior to
guesstimate whether you might be willing to purchase even more) and cleaning robots (such as
iRobot's Roomba). The overarching theme of learning problems is that there exists a nontrivial
dependence between some observations, which we will commonly refer to as x and a desired
response, which we refer to as y, for which a simple set of deterministic rules is not known. By
using learning we can infer such a dependency between x and y in a systematic fashion.
3.2 Data
It is useful to characterize learning problems according to the type of data they use. This is a
great help when encountering new challenges, since quite often problems on similar data types
can be solved with very similar techniques. For instance natural language processing and
bioinformatics use very similar tools for strings of natural language text and for DNA sequences.
Vectors constitute the most basic entity we might encounter in our work. For instance, a life
insurance company might be interested in obtaining the vector of variables (blood pressure,
heart rate, height, weight, cholesterol level, smoker, gender) to infer the life expectancy of a
potential customer. A farmer might be interested in determining the ripeness of fruit based on
(size, weight, and spectral data). An engineer might want to find dependencies in (voltage,
current) pairs. Likewise one might want to represent documents by a vector of counts which
describe the occurrence of words. The latter is commonly referred to as bag of words features.
One of the challenges in dealing with vectors is that the scales and units of different coordinates
may vary widely. For instance, we could measure weight in kilograms, pounds, grams, tons or
stones, all of which would amount to multiplicative changes of scale. Likewise, when representing
temperatures, the choice between Celsius, Kelvin and Fahrenheit amounts to an affine transformation
of the values.
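To make the vector representation above concrete, the following is a minimal Java sketch (the
sample document and the height values are made-up illustrations, not data from this project) that
builds a bag-of-words count vector and standardizes a numeric feature so that coordinates measured
in different units become comparable.

import java.util.LinkedHashMap;
import java.util.Map;

public class VectorExamples {
    // Bag of words: represent a document by the count of each word it contains.
    static Map<String, Integer> bagOfWords(String document) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : document.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Z-score standardization: removes the effect of the unit/scale of a coordinate.
    static double[] standardize(double[] values) {
        double mean = 0, var = 0;
        for (double v : values) mean += v;
        mean /= values.length;
        for (double v : values) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / values.length);
        double[] z = new double[values.length];
        for (int i = 0; i < values.length; i++) z[i] = (values[i] - mean) / std;
        return z;
    }

    public static void main(String[] args) {
        System.out.println(bagOfWords("the quick brown fox jumps over the lazy dog"));
        // Hypothetical heights in centimetres: the unit of measurement disappears after standardization.
        double[] heightsCm = {160, 170, 180, 190};
        for (double z : standardize(heightsCm)) System.out.printf("%.2f ", z);
    }
}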
Lists: In some cases the vectors we obtain may contain a variable number of features. For
instance, a physician might not necessarily decide to perform a full battery of diagnostic tests if
the patient appears to be healthy.
Sets may appear in learning problems whenever there is a large number of potential causes of an
effect, which are not well determined. For instance, it is relatively easy to obtain data concerning
the toxicity of mushrooms. It would be desirable to use such data to infer the toxicity of a new
mushroom given information about its chemical compounds. However, mushrooms contain a
cocktail of compounds out of which one or more may be toxic. Consequently we need to infer
the properties of an object given a set of features, whose composition and number may vary
considerably.
Images could be thought of as two-dimensional arrays of numbers, that is, matrices. This
representation is very crude, though, since images exhibit spatial coherence (lines, shapes) and
(in the case of natural images) a multiresolution structure. That is, downsampling an image leads
to an object which has very similar statistics to the original image. Computer vision and
psycho-optics have created a raft of tools for describing these phenomena.
Video adds a temporal dimension to images. Again, we could represent them as a three
dimensional array. Good algorithms, however, take the temporal coherence of the image
sequence into account.
Trees and Graphs are often used to describe relations between collections of objects. For
instance the ontology of webpages of the DMOZ project has the form of a tree with topics
becoming increasingly refined as we traverse from the root to one of the leaves. In the case of
gene ontology the relationships form a directed acyclic graph, also referred to as the
GO-DAG.
Some basic machine learning techniques include:
1. Naïve Bayes.
2. Nearest Neighbor Estimators.
3. A Simple Classifier.
4. Perceptron.
5. K-Means.
6. J48.
CHAPTER - 4
CLASSIFICATION
4.1 INTRODUCTION
Classification is the task of learning a target function f that maps each attribute set x to one of
the predefined class labels y. The target function is also known informally as a classification
model.
Predictive Modeling - A classification model can also be used to predict the class label of
unknown records. A classification model can be treated as a black box that automatically assigns
a class label when presented with the attribute set of an unknown record.
Classification techniques are most suited for predicting or describing data sets with binary or
nominal categories. They are less effective for ordinal categories (e.g., to classify a person as a
member of high-, medium-, or low income group) because they do not consider the implicit
order among the categories. Other forms of relationships, such as the subclass–superclass
relationships among categories (e.g., humans and apes are primates, which in turn, is a subclass
of mammals) are also ignored. The remainder of this chapter focuses only on binary or nominal
class labels.
First, a training set consisting of records whose class labels are known must be provided. The
training set is used to build a classification model, which is subsequently applied to the test set,
which consists of records with unknown class labels.
Evaluation of the performance of a classification model is based on the counts of test records
correctly and incorrectly predicted by the model. These counts are tabulated in a table known as
a confusion matrix. Table 3.1 depicts the confusion matrix for a binary classification problem.
Each entry fij in this table denotes the number of records from class i predicted to be of class j.
For instance, f01 is the number of records from class 0 incorrectly predicted as class 1. Based on
the entries in the confusion matrix, the total number of correct predictions made by the model is
(f11 + f00) and the total number of incorrect predictions is (f10 + f01).
Table 3.1: Confusion matrix for a binary classification problem

                     Predicted Class = 1    Predicted Class = 0
Actual Class = 1            f11                    f10
Actual Class = 0            f01                    f00
Although a confusion matrix provides the information needed to determine how well a
classification model performs, summarizing this information with a single number would make it
more convenient to compare the performance of different models. This can be done using a
performance metric such as accuracy, which is defined as follows:

    Accuracy = (f11 + f00) / (f11 + f10 + f01 + f00)

Equivalently, the performance of a model can be expressed in terms of its error rate, which is
given by the following equation:

    Error rate = (f10 + f01) / (f11 + f10 + f01 + f00)
Most classification algorithms seek models that attain the highest accuracy, or equivalently, the
lowest error rate when applied to the test set.
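As a small worked illustration of these two measures, the sketch below computes accuracy and error
rate directly from the four confusion-matrix entries defined above; the counts in main are
hypothetical values used only for demonstration.

public class ConfusionMetrics {
    // f11, f10, f01, f00 follow the notation above: fij = records of class i predicted as class j.
    static double accuracy(int f11, int f10, int f01, int f00) {
        return (double) (f11 + f00) / (f11 + f10 + f01 + f00);
    }

    static double errorRate(int f11, int f10, int f01, int f00) {
        return (double) (f10 + f01) / (f11 + f10 + f01 + f00);
    }

    public static void main(String[] args) {
        // Hypothetical counts, not results from this project.
        int f11 = 40, f10 = 10, f01 = 5, f00 = 45;
        System.out.println("Accuracy   = " + accuracy(f11, f10, f01, f00));  // 0.85
        System.out.println("Error rate = " + errorRate(f11, f10, f01, f00)); // 0.15
    }
}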
To illustrate how classification with a decision tree works, consider a simpler version of the
vertebrate classification problem. Instead of classifying the vertebrates into five distinct groups
of species, we assign them to two categories: mammals and non-mammals. Suppose a new
species is discovered by scientists. How can we tell whether it is a mammal or a non-mammal?
One approach is to pose a series of questions about the characteristics of the species. The first
question we may ask is whether the species is cold- or warm-blooded. If it is cold-blooded, then
it is definitely not a mammal. Otherwise, it is either a bird or a mammal. In the latter case, we
need to ask a follow-up question: Do the females of the species give birth to their young? Those
that do give birth are definitely mammals, while those that do not are likely to be non-mammals
(with the exception of egg-laying mammals such as the platypus and spiny anteater).
The previous example illustrates how we can solve a classification problem by asking a series of
carefully crafted questions about the attributes of the test record. Each time we receive an
answer, a follow-up question is asked until we reach a conclusion about the class label of the
record. The series of questions and their possible answers can be organized in the form of a
decision tree, which is a hierarchical structure consisting of nodes and directed edges. The tree
has three types of nodes:
• A root node, which has no incoming edges and zero or more outgoing edges.
• Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
• Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
In a decision tree, each leaf node is assigned a class label. The nonterminal nodes, which
include the root and other internal nodes, contain attribute test conditions to separate records that
have different characteristics. Classifying a test record is straightforward once a decision tree has
been constructed. Starting from the root node, we apply the test condition to the record and
follow the appropriate branch based on the outcome of the test. This will lead us either to another
internal node, for which a new test condition is applied, or to a leaf node. The class label
associated with the leaf node is then assigned to the record.
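The two-question vertebrate example above can be written directly as code. The following minimal
sketch hard-codes that small tree and classifies a record by following one branch per answer; the
boolean attribute names are assumptions made for illustration.

public class VertebrateTree {
    // Each record is described by the two boolean attributes used in the example above.
    static String classify(boolean warmBlooded, boolean givesBirth) {
        if (!warmBlooded) {          // root node test: body temperature
            return "Non-mammal";     // cold-blooded, so definitely not a mammal
        }
        // internal node test: do the females of the species give birth to their young?
        return givesBirth ? "Mammal" : "Non-mammal";
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true));   // e.g. a dog   -> Mammal
        System.out.println(classify(true, false));  // e.g. a bird  -> Non-mammal
        System.out.println(classify(false, false)); // e.g. a snake -> Non-mammal
    }
}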
A learning algorithm for inducing decision trees must address the following two issues.
1. How should the training records be split? Each recursive step of the tree-growing
process must select an attribute test condition to divide the records into smaller subsets.
To implement this step, the algorithm must provide a method for specifying the test
condition for different attribute types as well as an objective measure for evaluating the
goodness of each test condition.
2. How should the splitting procedure stop? A stopping condition is needed to terminate
the tree-growing process. A possible strategy is to continue expanding a node until either
all the records belong to the same class or all the records have identical attribute values.
Although both conditions are sufficient to stop any decision tree induction algorithm,
other criteria can be imposed to allow the tree-growing procedure to terminate earlier.
Decision tree induction algorithms must provide a method for expressing an attribute test
condition and its corresponding outcomes for different attribute types.
Binary Attributes The test condition for a binary attribute generates two potential outcomes.
Nominal Attributes Since a nominal attribute can have many values, its test condition can be
expressed in two ways. For a multi way split, the number of outcomes depends on the number of
distinct values for the corresponding attribute. For example, if an attribute such as marital status
has three distinct values—single, married, or divorced—its test condition will produce a three-
way split. On the other hand, some decision tree algorithms, such as CART, produce only binary
splits by considering all 2^(k−1) − 1 ways of creating a binary partition of k attribute values.
Ordinal Attributes Ordinal attributes can also produce binary or multi way splits. Ordinal
attribute values can be grouped as long as the grouping does not violate the order property of the
attribute values.
Continuous Attributes For continuous attributes, the test condition can be expressed as a
comparison test (A < v) or (A ≥ v) with binary outcomes, or a range query with outcomes of the
form v_i ≤ A < v_(i+1), for i = 1, . . . , k. For the binary case, the decision tree algorithm must
consider all possible split positions v, and it selects the one that produces the best partition. For
the multi way split, the algorithm must consider all possible ranges of continuous values. After
discretization, a new ordinal value will be assigned to each discretized interval. Adjacent
intervals can also be aggregated into wider ranges as long as the order property is preserved.
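As a concrete illustration of the 2^(k−1) − 1 count mentioned above, the following sketch
enumerates every binary partition of a small set of nominal values such as marital status; for
k = 3 it prints the 3 possible two-way splits. It is an illustrative enumeration, not part of any
particular decision tree package.

import java.util.ArrayList;
import java.util.List;

public class BinaryPartitions {
    // Enumerate all 2^(k-1) - 1 ways of splitting k nominal values into two non-empty groups.
    static void enumerate(String[] values) {
        int k = values.length;
        // Each partition is encoded by a bitmask over the first k-1 values. Because the last
        // value's bit is never set, it always stays in the "left" group, which avoids counting
        // mirrored partitions twice.
        for (int mask = 1; mask < (1 << (k - 1)); mask++) {
            List<String> left = new ArrayList<>(), right = new ArrayList<>();
            for (int i = 0; i < k; i++) {
                if (((mask >> i) & 1) == 0) left.add(values[i]); else right.add(values[i]);
            }
            System.out.println(left + " vs " + right);
        }
    }

    public static void main(String[] args) {
        enumerate(new String[] {"single", "married", "divorced"}); // prints 3 partitions
    }
}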
CHAPTER - 5
CLUSTERING
Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If
meaningful groups are the goal, then the clusters should capture the natural structure of the data.
In some cases, however, cluster analysis is only a useful starting point for other purposes, such as
data summarization. Whether for understanding or utility, cluster analysis has long played an
important role in a wide variety of fields: psychology and other social sciences, biology,
statistics, pattern recognition, information retrieval, machine learning, and data mining.
There have been many applications of cluster analysis to practical problems. We provide some
specific examples, organized by whether the purpose of the clustering is understanding or utility.
Clustering for Understanding Classes, or conceptually meaningful groups of objects that share
common characteristics, play an important role in how people analyze and describe the world.
Indeed, human beings are skilled at dividing objects into groups (clustering) and assigning
particular objects to these groups (classification). For example, even relatively young children
can quickly label the objects in a photograph as buildings, vehicles, people, animals, plants, etc.
In the context of understanding data, clusters are potential classes and cluster analysis is the
study of techniques for automatically finding classes. The following are some examples:
• Biology. Biologists have spent many years creating a taxonomy (hierarchical classification) of
all living things: kingdom, phylum, class, order, family, genus, and species. Thus, it is perhaps
not surprising that much of the early work in cluster analysis sought to create a discipline of
mathematical taxonomy that could automatically find such classification structures. More
recently, biologists have applied clustering to analyze the large amounts of genetic information
that are now available. For example, clustering has been used to find groups of genes that have
similar functions.
• Information Retrieval. The World Wide Web consists of billions of Web pages, and the
results of a query to a search engine can return thousands of pages. Clustering can be used to
group these search results into a small number of clusters, each of which captures a particular
aspect of the query. For instance, a query of “movie” might return Web pages grouped into
categories such as reviews, trailers, stars, and theaters. Each category (cluster) can be broken into
subcategories (sub clusters), producing a hierarchical structure that further assists a user’s
exploration of the query results.
• Climate. Understanding the Earth’s climate requires finding patterns in the atmosphere and
ocean. To that end, cluster analysis has been applied to find patterns in the atmospheric pressure
of Polar Regions and areas of the ocean that have a significant impact on land climate.
• Psychology and Medicine. An illness or condition frequently has a number of variations, and
cluster analysis can be used to identify these different subcategories. For example, clustering has
been used to identify different types of depression. Cluster analysis can also be used to detect
patterns in the spatial or temporal distribution of a disease.
• Business. Businesses collect large amounts of information on current and potential customers.
Clustering can be used to segment customers into a small number of groups for additional
analysis and marketing activities.
Clustering for Utility Cluster analysis provides an abstraction from individual data objects to
the clusters in which those data objects reside. Additionally, some clustering techniques
characterize each cluster in terms of a cluster prototype; i.e., a data object that is representative
of the other objects in the cluster. These cluster prototypes can be used as the basis for a number
of data analysis or data processing techniques. Therefore, in the context of utility, cluster
analysis is the study of techniques for finding the most representative cluster prototypes.
• Summarization. Many data analysis techniques, such as regression or PCA, have a time or
space complexity of O(m^2) or higher (where m is the number of objects), and thus, are not
practical for large data sets. However, instead of applying the algorithm to the entire data set, it
can be applied to a reduced data set consisting only of cluster prototypes. Depending on the type
of analysis, the number of prototypes, and the accuracy with which the prototypes represent the
data, the results can be comparable to those that would have been obtained if all the data could
have been used.
• Compression. Cluster prototypes can also be used for data compression. In particular, a table is
created that consists of the prototypes for each cluster; i.e., each prototype is assigned an integer
value that is its position (index) in the table. Each object is represented by the index of the
prototype associated with its cluster. This type of compression is known as vector quantization
and is often applied to image, sound, and video data, where (1) many of the data objects are
highly similar to one another, (2) some loss of information is acceptable, and (3) a substantial
reduction in the data size is desired.
• Efficiently Finding Nearest Neighbors. Finding nearest neighbors can require computing the
pairwise distance between all points. Often clusters and their cluster prototypes can be found
much more efficiently. If objects are relatively close to the prototype of their cluster, then we can
use the prototypes to reduce the number of distance computations that are necessary to find the
nearest neighbors of an object. Intuitively, if two cluster prototypes are far apart, then the objects
in the corresponding clusters cannot be nearest neighbors of each other. Consequently, to find an
object’s nearest neighbors it is only necessary to compute the distance to objects in nearby
clusters, where the nearness of two clusters is measured by the distance between their prototypes.
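A minimal sketch of the prototype-based ideas above: the encode method performs vector
quantization by replacing each object with the index of its nearest prototype, and the same
nearest-prototype computation is what allows a nearest-neighbor search to skip clusters whose
prototypes are far away. The prototypes and points below are made-up two-dimensional values.

public class VectorQuantization {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }

    // Return the index of the prototype closest to the point.
    static int nearestPrototype(double[] point, double[][] prototypes) {
        int best = 0;
        for (int j = 1; j < prototypes.length; j++) {
            if (squaredDistance(point, prototypes[j]) < squaredDistance(point, prototypes[best])) {
                best = j;
            }
        }
        return best;
    }

    // Vector quantization: compress a data set to one prototype index per object.
    static int[] encode(double[][] points, double[][] prototypes) {
        int[] codes = new int[points.length];
        for (int i = 0; i < points.length; i++) codes[i] = nearestPrototype(points[i], prototypes);
        return codes;
    }

    public static void main(String[] args) {
        double[][] prototypes = {{0, 0}, {10, 10}};          // hypothetical cluster prototypes
        double[][] points = {{1, 1}, {9, 11}, {0.5, -0.2}};  // hypothetical data objects
        for (int code : encode(points, prototypes)) System.out.print(code + " "); // 0 1 0
    }
}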
Also, while the terms segmentation and partitioning are sometimes used as synonyms for
clustering, these terms are frequently used for approaches outside the traditional bounds of
cluster analysis. For example, the term partitioning is often used in connection with techniques
that divide graphs into sub graphs and that are not strongly connected to clustering.
Segmentation often refers to the division of data into groups using simple techniques; e.g., an
image can be split into segments based only on pixel intensity and color, or people can be
divided into groups based on their income. Nonetheless, some work in graph partitioning and in
image and market segmentation is related to cluster analysis.
Hierarchical versus Partitional: The most commonly discussed distinction among different
types of clustering is whether the set of clusters is nested or unnested, or in more traditional
terminology, hierarchical or partitional. A partitional clustering is simply a division of the set
of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one
subset.
If we permit clusters to have sub clusters, then we obtain a hierarchical clustering, which is a
set of nested clusters that are organized as a tree. Each node (cluster) in the tree (except for the
leaf nodes) is the union of its children (sub clusters), and the root of the tree is the cluster
containing all the objects. Often, but not always, the leaves of the tree are singleton clusters of
individual data objects.
Exclusive versus Overlapping versus Fuzzy: There are many situations in which a point could
reasonably be placed in more than one cluster, and these situations are better addressed by non-
exclusive clustering. In the most general sense, an overlapping or non-exclusive clustering is
used to reflect the fact that an object can simultaneously belong to more than one group (class).
For instance, a person at a university can be both an enrolled student and an employee of the
university. A non-exclusive clustering is also often used when, for example, an object is
“between” two or more clusters and could reasonably be assigned to any of these clusters.
In a fuzzy clustering, every object belongs to every cluster with a membership weight that is
between 0 (absolutely doesn’t belong) and 1 (absolutely belongs). In other words, clusters are
treated as fuzzy sets. (Mathematically, a fuzzy set is one in which an object belongs to any set
with a weight that is between 0 and 1. In fuzzy clustering, we often impose the additional
constraint that the sum of the weights for each object must equal 1.) Similarly, probabilistic
clustering techniques compute the probability with which each point belongs to each cluster, and
these probabilities must also sum to 1. Because the membership weights or probabilities for any
object sum to 1, a fuzzy or probabilistic clustering does not address true multiclass situations,
such as the case of a student employee, where an object belongs to multiple classes.
Instead, these approaches are most appropriate for avoiding the arbitrariness of assigning an
object to only one cluster when it may be close to several. In practice, a fuzzy or probabilistic
clustering is often converted to an exclusive clustering by assigning each object to the cluster in
which its membership weight or probability is highest.
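The conversion described above, from a fuzzy or probabilistic clustering to an exclusive one,
simply assigns each object to the cluster with its largest membership weight. A minimal sketch
with made-up membership weights (each row sums to 1):

public class FuzzyToExclusive {
    // For each object, pick the cluster with the highest membership weight.
    static int[] harden(double[][] memberships) {
        int[] assignment = new int[memberships.length];
        for (int i = 0; i < memberships.length; i++) {
            int best = 0;
            for (int j = 1; j < memberships[i].length; j++) {
                if (memberships[i][j] > memberships[i][best]) best = j;
            }
            assignment[i] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical membership weights for 3 objects over 2 clusters; each row sums to 1.
        double[][] w = {{0.9, 0.1}, {0.4, 0.6}, {0.55, 0.45}};
        for (int c : harden(w)) System.out.print(c + " "); // 0 1 0
    }
}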
Complete versus Partial: A complete clustering assigns every object to a cluster, whereas a
partial clustering does not. The motivation for a partial clustering is that some objects in a data
set may not belong to well-defined groups. Many times objects in the data set may represent
noise, outliers, or “uninteresting background.” For example, some newspaper stories may share a
common theme, such as global warming, while other stories are more generic or one-of-a-kind.
Thus, to find the important topics in last month’s stories, we may want to search only for clusters
of documents that are tightly related by a common theme. In other cases, a complete clustering of
the objects is desired. For example, an application that uses clustering to organize documents for
browsing needs to guarantee that all documents can be browsed.
In density-based clustering, for example, points in low-density regions are classified as noise
and omitted; thus, DBSCAN does not produce a complete clustering.
5.4.1 K-means
Prototype-based clustering techniques create a one-level partitioning of the data objects. There
are a number of such techniques, but two of the most prominent are K-means and K-medoid. K-
means defines a prototype in terms of a centroid, which is usually the mean of a group of points,
and is typically applied to objects in a continuous n-dimensional space.
K-medoid defines a prototype in terms of a medoid, which is the most representative point for a
group of points, and can be applied to a wide range of data since it requires only a proximity
measure for a pair of objects. While a centroid almost never corresponds to an actual data point,
a medoid, by its definition, must be an actual data point. In this section, we will focus solely on
K-means, which is one of the oldest and most widely used clustering algorithms.
K-means is formally described by the basic algorithm below. In the figures displaying K-means
clustering, each subfigure shows (1) the centroids at the start of the iteration and (2) the
assignment of the points to those centroids. The centroids are indicated by the “+” symbol; all
points belonging to the same cluster have the same marker shape.

Basic K-means algorithm:
1: Select K points as initial centroids.
2: repeat
3:    Form K clusters by assigning each point to its closest centroid.
4:    Recompute the centroid of each cluster.
5: until the centroids do not change.
The points are assigned to the initial centroids, which are all in the larger group of points. For
this example, we use the mean as the centroid. After points are assigned to a centroid, the
centroid is then updated. Again, the figure for each step shows the centroid at the beginning of
the step and the assignment of points to those centroids. In the second step, points are assigned to
the updated centroids, and the centroids are updated again. When the K-means algorithm terminates,
because no more changes occur, the centroids have identified the natural groupings of points.
For some combinations of proximity functions and types of centroids, K-means always
converges to a solution; i.e., K-means reaches a state in which no points are shifting from one
cluster to another, and hence, the centroids don’t change. Because most of the convergence
occurs in the early steps, however, the condition on line 5 of the algorithm above is often replaced by a
weaker condition, e.g., repeat until only 1% of the points change clusters.
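The following is a compact, from-scratch Java sketch of the basic K-means loop described above
(choose initial centroids, assign each point to its nearest centroid, recompute centroids, and
stop when assignments no longer change). It is a simplified illustration for small
two-dimensional data, not the implementation used inside WEKA.

import java.util.Arrays;
import java.util.Random;

public class SimpleKMeansSketch {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    static int[] kMeans(double[][] points, int k, long seed) {
        Random rnd = new Random(seed);
        int dim = points[0].length;
        // 1. Select K points as the initial centroids (simple random choice; may pick duplicates).
        double[][] centroids = new double[k][];
        for (int j = 0; j < k; j++) centroids[j] = points[rnd.nextInt(points.length)].clone();

        int[] assignment = new int[points.length];
        boolean changed = true;
        while (changed) {                      // repeat ... until assignments stop changing
            changed = false;
            // 2. Form K clusters by assigning each point to its closest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (dist2(points[i], centroids[j]) < dist2(points[i], centroids[best])) best = j;
                if (best != assignment[i]) { assignment[i] = best; changed = true; }
            }
            // 3. Recompute the centroid (mean) of each cluster.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int j = 0; j < k; j++)
                if (counts[j] > 0)
                    for (int d = 0; d < dim; d++) centroids[j][d] = sums[j][d] / counts[j];
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 1}, {1.5, 2}, {1, 0.6}, {8, 8}, {9, 11}, {8, 9}}; // toy data
        System.out.println(Arrays.toString(kMeans(data, 2, 1)));
    }
}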
CHAPTER - 6
WEKA
WEKA provides graphical user interfaces through which all of the processing work is done. The three
graphical user interfaces are:
“The Explorer” (exploratory data analysis)
“The Experimenter” (experimental environment)
“The Knowledge Flow” (new process model inspired interface)
6.3 Visualization
This panel helps us understand the relationships between the different attributes and their values
across the data instances by providing a visual representation in the form of graphs plotted with
different attributes on the X and Y axes.
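Everything that the Explorer does interactively can also be scripted against WEKA's Java API. The
following is a minimal sketch, assuming WEKA is on the classpath and using a hypothetical ARFF
file name, that loads a dataset and builds a J48 decision tree.

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ExplorerEquivalent {
    public static void main(String[] args) throws Exception {
        // Load the dataset (hypothetical file name); WEKA also reads CSV files this way.
        Instances data = DataSource.read("vertebral_column_2C.arff");
        data.setClassIndex(data.numAttributes() - 1);   // last attribute is the class

        J48 tree = new J48();                           // the C4.5-based decision tree learner
        tree.buildClassifier(data);
        System.out.println(tree);                       // prints the induced tree
    }
}

The same pattern works for any other classifier or clusterer bundled with WEKA; only the class
that is instantiated changes.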
CHAPTER – 7
RESULTS & DISCUSSIONS
7.1 OUTCOME
Confusion matrices obtained by applying different classifiers to the Vertebral Column 2C Data Set:

   a    b    <-- classified as
 210    0    a = Abnormal
 100    0    b = Normal

   a    b    <-- classified as
 181   29    a = Abnormal
  32   68    b = Normal

   a    b    <-- classified as
 189   21    a = Abnormal
  36   64    b = Normal

   a    b    <-- classified as
 161   49    a = Abnormal
  32   68    b = Normal

Figure 7.5 Applying Naïve Bayes classification on the Vertebral Column 2C Data Set

   a    b    <-- classified as
 154   56    a = Abnormal
  13   87    b = Normal
Here it is evident that the J48 classifier produces the most accurate result. Hence J48 can be
used in future machine learning work for better and more precise results.
This is another dataset that we are using: the Pre-Poll Survey of the 2012 U.S. Presidential
Election. Here is a snapshot of the Pre-Poll Survey dataset that we have used. It is in
comma-separated values (CSV) format.
Figure 7.8 Attributes and instances of the dataset as shown by the WEKA tool.
Figure 7.8 displays the attributes and the instances that the WEKA tool shows when the dataset is
given as input.
Figure 7.9 The two main classes of the dataset, Democrat and Republican.
In Figure 7.9 the blue column represents the Democrats and the red column represents the Republicans.
Figure 7.10 The water-project-cost-sharing attribute, which shows an equal number of yes and no responses.
Figure 7.11 The adoption-of-the-budget-resolution attribute, which shows that the Democrats won.
Figure 7.12 The anti-satellite-test-ban attribute, which shows that the Democrats won.
Figure 7.13 The education-spending attribute, which shows that the Republicans won.
Figure 7.14 The crime attribute, which shows that the Republicans won.
Figure 7.15 The duty-free-exports attribute, which shows that the Democrats won.
Here, using the Naïve Bayes classifier, we got an accuracy of 90.1149%.
By comparing the various classification algorithms that we applied to the Pre-Poll dataset, we can
see that the J48 classifier gives the most accurate results, with an error of only 3.6782%.
From the confusion matrix we can see that 259 instances are correctly classified as Democrat and 8
are incorrect, while 160 instances are correctly classified as Republican and 8 are incorrect.
While using the various classification algorithms, we used 10-fold cross-validation. Cross-validation
is a standard evaluation technique which divides the dataset into 10 pieces, or folds. It then holds
out each fold in turn for testing while training on the remaining 9 folds. This process gives 10
results, which are then averaged and presented as a single result.
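The 10-fold cross-validation described above can be reproduced programmatically with WEKA's
Evaluation class. The following is a minimal sketch, again assuming a hypothetical ARFF file name
and WEKA on the classpath.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("prepoll_survey.arff");   // hypothetical file name
        data.setClassIndex(data.numAttributes() - 1);

        Evaluation eval = new Evaluation(data);
        // Hold out each of the 10 folds in turn, train on the other 9, and average the results.
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));

        System.out.println(eval.toSummaryString());   // accuracy, error rate, kappa, ...
        System.out.println(eval.toMatrixString());    // the confusion matrix
    }
}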
Some other measures that we get from the classifier output are:
1. TP Rate - the proportion of instances of a class that are correctly classified as that class.
2. FP Rate - the proportion of instances not belonging to a class that are incorrectly classified
   as that class.
3. Precision - the proportion of the instances classified as a class that actually belong to that class.
4. Recall - the proportion of the instances actually belonging to a class that are classified as
   that class. Recall is equal to the TP Rate.
5. F-Measure - the combined measure of precision and recall, calculated as
   2 * Precision * Recall / (Precision + Recall).
6. Kappa Statistic - a chance-corrected measure of the agreement between the predicted and the true classes.
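These measures can also be computed by hand from the true-positive, false-positive and
false-negative counts of a class. The sketch below uses purely hypothetical counts, not the output
of this project, to show the formulas listed above.

public class ClassMetrics {
    public static void main(String[] args) {
        // Hypothetical counts for one class (not taken from this project's output).
        double tp = 90, fp = 10, fn = 15;

        double precision = tp / (tp + fp);   // of everything classified as the class, how much really is
        double recall = tp / (tp + fn);      // of the actual class members, how many were found (= TP Rate)
        double fMeasure = 2 * precision * recall / (precision + recall);

        System.out.printf("Precision = %.4f%n", precision);
        System.out.printf("Recall    = %.4f%n", recall);
        System.out.printf("F-Measure = %.4f%n", fMeasure);
    }
}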
Cluster 0 has 214 instances, which is 49% of the total available instances.
Cluster 1 has 221 instances, which is 51% of the total available instances.
In the real voting conducted in 2012, the Democrats got 51.1% of the total votes, and our model
likewise assigned 51% of the total instances to one cluster. Our analysis of the Pre-Poll Survey
therefore differed from the actual outcome by only about 0.1%, which is very high accuracy for
such a dataset.
In this graph the blue crosses represent the Democrats and the red crosses represent the Republicans.
Here “y” represents “yes” and “n” represents “no”.
Fig 7.23 Graph with the number of instances on the X-axis and the water-project-cost-sharing attribute on the Y-axis.
We can see that there are almost equal numbers of red and blue crosses in both the “y” and “n”
parts of the Y-axis; therefore the Democrats and the Republicans got an almost equal number of votes.
Fig 7.24 Graph with the number of instances on the X-axis and the adoption-of-the-budget-resolution attribute on the Y-axis.
We can see that there are more blue crosses in the “y” part of the Y-axis; therefore the Democrats got more votes.
Fig 7.25 Graph with the number of instances on the X-axis and the anti-satellite-test-ban attribute on the Y-axis.
We can see that there are more blue crosses in the “y” part of the Y-axis; therefore the Democrats got more votes.
Fig 7.26 Graph with the number of instances on the X-axis and the education-spending attribute on the Y-axis.
We can see that there are more red crosses in the “y” part of the Y-axis; therefore the Republicans got more votes.
Fig 7.27 Graph with the number of instances on the X-axis and the crime attribute on the Y-axis.
We can see that there are more red crosses in the “y” part of the Y-axis; therefore the Republicans got more votes.
Fig 7.28 Graph with the number of instances on the X-axis and the duty-free-exports attribute on the Y-axis.
We can see that there are more blue crosses in the “y” part of the Y-axis; therefore the Democrats got more votes.
7.2 FUTURE WORK
● Application of more efficient Machine Learning algorithms (such as Support Vector Machines).
● Better analysis of the data and visualization of the analyzed results in the form of more graphs.
● Use of the experience and knowledge gained to apply machine learning in Artificial Neural
  Networks (ANN) for better predictive analysis.
7.3 SUMMARY
In this project we studied Machine Learning and learned to apply various methods and to understand
their results and accuracy. For supervised machine learning we applied classification, and for
unsupervised learning we applied clustering.
We also learned about the software WEKA and its different components which were used to
effectively study, analyze, visualize and fully understand different datasets. We mainly used the
Explorer component of WEKA to apply different classification and clustering algorithms.
For our project we have used medical datasets. All the datasets used in this project were taken
from the UCI repository, and they include:
Vertebral Column 2C Data Set.
Vertebral Column 3C Data Set.
Mammographic-Masses Data Set.
Cryotherapy Data Set.
Haberman’s Survival Data Set.
2012 U.S. Presidential Election Pre-Poll Survey Data Set.
Similar work has been done on 15 more datasets from the same repository.
For classification we applied different models such as J48, REPTree, ZeroR, OneR and Naïve Bayes.
We observed that out of all classification models the J48 decision tree had the best prediction
accuracy. For clustering we used the simple K-means algorithm on the datasets.
7.4 CONCLUSION
After comparing the results of the different models, we can conclude that for the medical datasets
used, the J48 model produced the most accurate classification results, and for clustering, simple
K-means produced the most optimal result.
WEKA provided us with a feasible means to effectively study and understand large datasets without
any explicit programming or extensive assistance, which allowed us to understand the essential
principles and working of Machine Learning. Applying the knowledge and experience that we gained
by doing this project requires further study and research, which can be done by using Artificial
Neural Networks.
7.5 REFERENCES