ML Unit-4 Final 2024-25

This document covers clustering and ensemble methods in machine learning, detailing various clustering techniques such as K-means, hierarchical, and density-based clustering, along with their applications and advantages. It also introduces ensemble methods like bagging and boosting, explaining how they combine multiple models to improve predictive performance. Key algorithms discussed include Random Forest and AdaBoost, highlighting their roles in enhancing model accuracy and efficiency.

UNIT-4

Syllabus: CLUSTERING AND ENSEMBLE METHODS: Introduction to clustering: K-means clustering, Bisecting K-Means clustering, Ensemble Methods: bagging and boosting, Random forest and AdaBoost algorithms and Bayesian learning algorithm.

Introduction to clustering:

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. In simple words, the aim is to segregate data points with similar traits and assign them to clusters.
 Clustering or cluster analysis is a machine learning technique that groups an unlabelled dataset.
 It does this by finding similar patterns in the unlabelled dataset, such as shape, size, colour, behaviour, etc., and divides the data points according to the presence or absence of those patterns.
 It is an unsupervised learning method, so no supervision is provided to the algorithm, and it works with unlabelled data.
 After applying the clustering technique, each cluster or group is given a cluster ID. An ML system can use this ID to simplify the processing of large and complex datasets.
The clustering technique is commonly used for statistical data analysis.
Example:

Here, the clustering technique has partitioned the data points into two clusters. The data points within a cluster are similar to each other but different from those in other clusters. For example, when we visit a shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another; similarly, in the fruit section, apples, bananas, mangoes, etc., are kept in separate areas so that we can easily find things. The clustering technique works in the same way.

Why Clustering?
Clustering allows us to find hidden relationships between the data points in a dataset.
Examples:
1. In marketing, customers are segmented according to similarities to carry out targeted marketing.
2. Given a collection of text documents, we can organize them according to content similarity to create a topic hierarchy.
3. Detecting distinct kinds of patterns in image data (image processing); it is also effective in biology research for identifying underlying patterns.
There are many more examples that make clustering so important.

Types of Clustering Methods:

Partitioning or Flat algorithms: These algorithms try to divide the dataset of interest into a predefined number of groups/clusters. All the groups/clusters are independent of each other. For example: K-means, Fuzzy C-means.

K-Means Clustering algorithm: In this type, the dataset is divided into a set of K groups, where K defines the number of pre-defined groups. Each cluster centre is created in such a way that the distance between the data points and their own cluster centroid is smaller than the distance to any other cluster centroid. In the figure below, the K value is three, so the given dataset is divided into three clusters.

Fuzzy C-means clustering algorithm: Fuzzy clustering is a type of partitioning method similar to the K-means algorithm, but with the difference that a data point may belong to more than one group or cluster. Each data point has a set of membership coefficients, which depend on its degree of membership in each cluster. It is sometimes also known as the Fuzzy K-means algorithm. In the figure below, the given dataset is divided into two clusters since the C value is two, but some data points belong to both clusters.

Hierarchical Clustering :
Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram. The desired number of clusters can be obtained by cutting the tree at the appropriate level. The most common examples of this method are the Agglomerative Hierarchical algorithm and the Divisive method.

Agglomerative Hierarchical algorithm:


 Bottom-up approach
 Begin with each element as a separate cluster and merge them into successively larger clusters
Example:

Divisive Method:
 Top-down approach
 Begin with the whole set and proceed to divide it into successively smaller clusters
Example:
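As an illustration (not part of the original notes), here is a minimal agglomerative clustering sketch using SciPy, with a small assumed toy array X; "cutting the tree" at a chosen level yields the clusters described above:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

X = np.array([[1, 2], [1, 4], [5, 8], [6, 8], [9, 11]])  # toy data points (assumed)

Z = linkage(X, method="ward")  # bottom-up: each point starts as its own cluster and clusters are merged
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree to obtain 2 clusters
print(labels)
# dendrogram(Z) draws the tree-like structure (the dendrogram) described above
```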

Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped clusters can be formed as long as the dense regions can be connected. The algorithm does this by identifying regions of high density in the dataset and connecting them into clusters; the dense areas in data space are separated from each other by sparser areas. These algorithms can face difficulty in clustering the data points if the dataset has varying densities or high dimensionality. Popular examples of density models are DBSCAN and OPTICS.
• Discovers clusters of arbitrary shapes
• Handles noise
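A brief sketch with scikit-learn's DBSCAN (the eps and min_samples values below are illustrative assumptions):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: dense, arbitrarily shaped regions separated by sparser areas
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: neighbourhood radius; min_samples: how many neighbours make a region "dense"
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))  # cluster labels; -1 marks noise points
```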

Distribution Model-Based Clustering: In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution. The grouping is done by assuming some distribution, commonly the Gaussian distribution. An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).
These clustering models are based on the notion of how probable it is that all data points in a cluster belong to the same distribution.
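A short scikit-learn sketch of this idea, fitting a Gaussian Mixture Model by Expectation-Maximization; the toy blobs and the choice of 3 components are assumptions for illustration:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # toy data from 3 Gaussian-like blobs

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)  # EM fits 3 Gaussian components
print(gmm.predict_proba(X[:3]).round(2))  # probability of each point belonging to each distribution
```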

K Means Clustering:
It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group, and the data points within a group have similar properties.
 K-Means Clustering is an Unsupervised Learning algorithm which groups the unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process; if K=2, there will be two clusters, for K=3, there will be three clusters, and so on.
 It allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in the unlabeled dataset on its own, without the need for any training.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of
this algorithm is to minimize the sum of distances between the data point and their corresponding
clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and repeats the process until it finds the best clusters. The value of K should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
 Determines the best values for the K center points or centroids by an iterative process.
 Assigns each data point to its closest k-center. The data points that are nearest to a particular k-center form a cluster.

The diagram below explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?


1. Select the number K to decide the number of clusters.
2. Select K random initial points as centroids (cluster centers).
3. Assign each observation to its closest cluster center.
4. Revise the cluster centers as the mean of the assigned observations.
5. Repeat steps 3 & 4 until convergence.

Let's understand the above steps by considering the example:
Initialisation: Here, consider the K value to be two. So, first, you need to randomly initialise two points called the cluster centroids. Make sure that the number of cluster centroids, depicted by the orange and blue crosses in the image, is less than the number of training data points, depicted by the navy blue dots. The K-means clustering algorithm is an iterative algorithm, and it follows the next two steps iteratively. Once you are done with the initialisation, let's move on to the next step.

Cluster Assignment: In this step, the algorithm goes through all the navy blue data points and computes the distance between each data point and the cluster centroids initialised in the previous step. Each data point is then assigned to the group whose centroid (orange or blue) is nearest to it. So, the data points are divided into two groups, one represented in orange and the other in blue, as shown in the graph. Since these cluster formations are not yet the optimised clusters, let's move ahead and see how to get the final clusters.

Move Centroid: Now, you take the two cluster centroids and reposition them. You take all the blue dots, compute their average, and move the current blue cluster centroid to this new location. Similarly, you move the orange cluster centroid to the average of the orange data points. The new cluster centroids will then look as shown in the graph. Moving forward, let us see how we can optimise the clusters further.

Optimization: You need to repeat the above two steps iteratively until the cluster centroids stop changing their positions and become static. Once the clusters become static, the K-means clustering algorithm is said to have converged.
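A minimal NumPy sketch of the steps above (random initialisation, cluster assignment, centroid update, repeat until convergence); the data array X and the value of k are assumed inputs, and the empty-cluster edge case is not handled in this sketch:

```python
import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Minimal K-means: X is an (n_samples, n_features) array, k is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every observation to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each centroid to the mean of the observations assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centroids no longer change (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```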

K Means Numerical Example with Illustration:
Given data

Advantages of k-means algorithm:
1. Ease of implementation
2. Speed
3. Availability
4. If the number of variables is huge, K-Means is most of the time computationally faster than hierarchical clustering, provided we keep K small.
5. K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.

Disadvantages of k-means algorithm:


1. Selection of the optimal number of clusters is difficult.
2. Selection of the initial centroids is random.
3. Makes hard assignments of points to clusters.
4. Not a probabilistic model.
5. K-means often does not work well when clusters are not circular in shape, overlap, or are of unequal sizes.

Applications of K-Means Clustering:


Academic performance: Based on their scores, students are categorized into grades like A, B, or C.
Diagnostic systems: The medical profession uses K-means in creating smarter medical decision support systems, especially in the treatment of liver ailments.
Search engines: Clustering forms a backbone of search engines. When a search is performed, the
search results need to be grouped, and the search engines very often use clustering to do this.
Wireless sensor networks: The clustering algorithm plays the role of finding the cluster heads, each of which collects the data in its respective cluster.
Document classification: Cluster documents into multiple categories based on tags, topics, and the content of the document. This is a very standard classification problem, and K-means is a highly suitable algorithm for this purpose. Initial processing of the documents is needed to represent each document as a vector, using term frequency to identify commonly used terms that help characterise the document. The document vectors are then clustered to help identify similarity within document groups.
Identifying crime localities: With data on crimes available for specific localities in a city, the category of crime, the area of the crime, and the association between the two can give quality insight into crime-prone areas within a city or locality.
Customer segmentation: Clustering helps marketers improve their customer base, work on target areas, and segment customers based on purchase history, interests, or activity monitoring. For example, telecom providers can cluster pre-paid customers to identify patterns in terms of money spent on recharging, sending SMS, and browsing the internet; this segmentation helps the company target specific clusters of customers with specific campaigns.
Insurance fraud detection: Machine learning has a critical role to play in fraud detection and has numerous applications in automobile, healthcare, and insurance fraud detection. Utilizing historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that indicate fraudulent patterns. Since insurance fraud can potentially have a multi-million dollar impact on a company, the ability to detect fraud is crucial.
Cyber-profiling criminals: Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiling, which provides information to the investigation division to classify the types of criminals present at a crime scene.
Delivery store optimization: Optimize the process of goods delivery using trucks and drones by using a combination of K-means to find the optimal number of launch locations and a genetic algorithm to solve the truck route as a travelling salesman problem.

Bisecting K-means
Bisecting K-means starts with all points in a single cluster and repeatedly splits one cluster (typically the one with the largest SSE) into two using basic 2-means, until K clusters are obtained.
– It has less trouble with initialization because it performs several trial bisections and takes the one with the lowest SSE, and because there are only two centroids at each step.

The data consists of two pairs of clusters, where the clusters in each (top-bottom) pair are closer to each other than to the clusters in the other pair. The figure shows that if we start with two initial centroids per pair of clusters, the centroids will redistribute themselves so that the "true" clusters are found, even when both initial centroids of a pair fall in a single cluster.
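A rough sketch of the bisecting strategy described above, using scikit-learn's KMeans for the two-way splits; the number of trial bisections (n_trials) and the "largest SSE" splitting rule follow the description, but the details are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_k_means(X, k, n_trials=5, seed=0):
    """Repeatedly split the cluster with the largest SSE using 2-means until k clusters remain."""
    clusters = [X]                      # start with all points in one cluster
    while len(clusters) < k:
        # pick the cluster with the largest SSE (sum of squared distances to its mean)
        sses = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(sses)))
        # try several trial bisections and keep the split with the lowest total SSE
        best_split, best_sse = None, np.inf
        for trial in range(n_trials):
            km = KMeans(n_clusters=2, n_init=1, random_state=seed + trial).fit(target)
            if km.inertia_ < best_sse:
                best_sse = km.inertia_
                best_split = [target[km.labels_ == 0], target[km.labels_ == 1]]
        clusters.extend(best_split)
    return clusters
```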

Ensemble Methods:
Ensemble learning is a machine learning technique that combines several base models in order to produce one optimal predictive model.
 Ensemble learning is used to enhance the performance of a Machine Learning model by combining several learners. Compared to a single model, this type of learning builds models with improved efficiency and accuracy.
 The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner, thus increasing the accuracy of the model.
 Ensemble methods can decrease variance using the bagging approach, decrease bias using the boosting approach, or improve predictions using the stacking approach.
 Ensemble training methods can either be homogeneous or heterogeneous in nature. Most ensemble learning methods are homogeneous, meaning that they use a single type of base learning model/algorithm. In contrast, heterogeneous ensembles make use of different learning algorithms, diversifying and varying the learners to ensure that accuracy is as high as possible.

Working:

An ensemble model can be built by taking several base classifiers or learning models, say model 1, model 2, ..., model N, training all of them on the same training dataset, collecting their predictions, and combining all the predictions with a voting classifier to produce the final prediction of the ensemble.
Here, a Voting Classifier is a machine-learning model that trains on an ensemble of numerous models and predicts an output (class) based on the class chosen by the majority of the models (or the highest combined class probability).
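As a concrete illustration (with arbitrary base models chosen just for the sketch), a hard-voting ensemble in scikit-learn looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# toy data standing in for "the same training dataset" mentioned above
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# model 1, model 2, ..., model N trained on the same data and combined by a voting classifier
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # majority vote over the predicted classes
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```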

Types of ensemble methods:

In general, an ensemble model falls into one of two categories:

Parallel ensemble techniques
Example: Bagging algorithms
– Random forest
– Bagging meta-estimator

Sequential ensemble techniques
Example: Boosting algorithms
– AdaBoost
– GBM
– XGBM

Parallel ensemble techniques:

In parallel ensemble techniques, base learners are generated in parallel. Parallel methods exploit the parallel generation of base learners to encourage independence between the base learners. The independence of the base learners significantly reduces the error when the predictions of the individual learners are averaged.
Example: Bagging.

BAGGING:
 Bagging is an ensemble construction technique for classification and regression that aims to reduce the variance of estimates by averaging multiple estimates together. Bagging creates subsets of the main dataset on which the learners are trained.
 It is also known as Bootstrap Aggregation, because bagging combines the bootstrapping and aggregation methods to form one ensemble model.
 Example: the random forest algorithm.

Bootstrapping: The bootstrap method refers to creating multiple small subsets of data from the entire dataset. These subsets are randomly sampled with replacement (also known as resampling or row sampling), so there is a chance that the same data point is selected more than once. Each data point has an equal probability of being selected. Each bootstrap sample therefore has a slightly different mean and standard deviation from the original dataset, which adds diversity and makes the ensemble more robust. The base learners and classifiers in the ensemble method are trained on these subsets.

Aggregation:
Aggregation means combining (for example, averaging or majority-voting) the predictions of all the individual models to get the final model prediction.
Working process:
Now let's see the working process of bagging. It works in three steps:

1. Bootstrap the data: first, create subsets of the training dataset with random samples using the bootstrapping method.
2. Build multiple classifiers or models: after bootstrapping the data, build a model or classifier for each subset of bootstrapped samples and obtain a prediction from each model.
3. Aggregation or model fit: combine all the predictions of the base models using the aggregation method or a voting classifier to get the final prediction of the ensemble.
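A minimal scikit-learn sketch of these three steps, using the library's default decision-tree base learner and a toy dataset; the parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: bootstrap=True draws random subsets (with replacement) of the training data.
# Step 2: one base model (a decision tree by default) is built per bootstrap sample.
# Step 3: the individual predictions are aggregated by majority vote.
bagging = BaggingClassifier(n_estimators=50, max_samples=0.8, bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)
print("test accuracy:", bagging.score(X_test, y_test))
```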

Advantages:
 Bagging is a completely data-specific algorithm. The bagging technique reduces model over-
fitting.
 It also performs well on high-dimensional data. Moreover, the missing values in the dataset
do not affect the performance of the algorithm.

Disadvantages:
 One limitation is that the final prediction is based on the mean (or majority) of the predictions from the subset models, rather than the precise output of any single classification or regression model.
 Bagging is not helpful in the case of bias or under-fitting in the data.
 Bagging averages over the individual results, so extreme values (the highest and lowest results), which may differ widely, are smoothed into an average result.

Random Forest Algorithm:


 Random Forest is a popular machine-learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML. It is
based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
 As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of the predictions, predicts the final output.
 A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:

Assumptions for Random Forest


Since the random forest combines multiple trees to predict the class of the dataset, it is possible that
some decision trees will predict the correct output, while others may not. However, together, all the
trees predict the correct output. Therefore, below are two assumptions for a better Random forest
classifier:
 There should be some actual values in the feature variable of the dataset so that the classifier
can predict accurate results rather than a guessed result.
 The predictions from each tree must have very low correlations.

Why use Random Forest?


Below are some points that explain why we should use the Random Forest algorithm:
 It takes less training time as compared to other algorithms.
 It predicts output with high accuracy, and it runs efficiently even for large datasets.
 It can also maintain accuracy when a large proportion of data is missing.

How does Random Forest algorithm work?


Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make a prediction with each tree created in the first phase and combine them.
The working process can be explained in the below steps and diagram:

Step-1: Select random K data points from the training set.


Step-2: Build the decision trees associated with the selected data points (Subsets).
Step-3: Choose the number N for decision trees that you want to build.
Step-4: Repeat Step 1 & 2.

Step-5: For new data points, find the prediction of each decision tree, and assign the new data point to the category that wins the majority vote.

The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset is given to
the Random forest classifier. The dataset is divided into subsets and given to each decision tree.
During the training phase, each decision tree produces a prediction result, and when a new data
point occurs, then based on the majority of results, the Random Forest classifier predicts the final
decision.

Consider the below image:

Applications of Random Forest


There are mainly four sectors where Random Forest is mostly used:
 Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
 Medicine: With the help of this algorithm, disease trends and risks of the disease can be
identified.
 Land Use: We can identify the areas of similar land use by this algorithm.
 Marketing: Marketing trends can be identified using this algorithm.

Example of Random Forest

Dataset:

Row sampling and Feature sampling:

Decision tree construction using subsets of data:

Prediction of unseen records using Majority voting:
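The same workflow in code, assuming scikit-learn's RandomForestClassifier and the Iris data as a stand-in for the dataset illustrated above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# N decision trees, each grown on a bootstrap sample (row sampling) and a random
# subset of features at each split (feature sampling); predictions are combined
# by majority voting.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=1)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```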

Advantages of Random Forest


1. Random Forest is based on the bagging algorithm and uses the Ensemble Learning technique. It creates many trees on subsets of the data and combines the output of all the trees. In this way, it reduces the overfitting problem in decision trees, reduces the variance, and therefore improves the accuracy.
2. Random Forest can be used to solve both classification and regression problems.
3. Random Forest works well with both categorical and continuous variables.
4. Random Forest can automatically handle missing values.
5. No feature scaling required: no feature scaling (standardization or normalization) is required for Random Forest, as it uses a rule-based approach instead of distance calculations.
6. Handles non-linear parameters efficiently: non-linear parameters do not affect the performance of a Random Forest, unlike curve-based algorithms. So, if there is high non-linearity between the independent variables, Random Forest may outperform other curve-based algorithms.
7. Random Forest is usually robust to outliers and can handle them automatically.
8. The Random Forest algorithm is very stable. Even if a new data point is introduced in the dataset, the overall algorithm is not affected much, since the new data may impact one tree, but it is very hard for it to impact all the trees.
9. Random Forest is comparatively less impacted by noise.

Disadvantages of Random Forest


1. Complexity: Random Forest creates a lot of trees (unlike a single tree in the case of a decision tree) and combines their outputs. By default, it creates 100 trees in the Python sklearn library. To do so, this algorithm requires much more computational power and resources. A decision tree, on the other hand, is simple and does not require so many computational resources.
2. Longer training period: Random Forest requires much more time to train than a decision tree, as it generates a lot of trees (instead of one tree in the case of a decision tree) and makes its decision by majority vote.

Sequential ensemble techniques:


Sequential ensemble techniques generate base learners in a sequence. The sequential generation of base learners promotes dependence between the base learners. The performance of the model is then improved by assigning higher weights to previously misclassified examples.
Example: Boosting.

BOOSTING:
• Boosting is a sequential ensemble model.
• Boosting is a family of algorithms that convert weak learners into strong learners.
• Boosting is an ensemble technique that learns from previous predictors' mistakes to make better predictions in the future.
• The technique combines several weak base learners to form one strong learner, thus significantly improving the predictability of models. Boosting works by arranging weak learners in a sequence, such that each weak learner learns from the mistakes of the learner before it, to create better predictive models.
• Boosting is a sequential process in which more weightage is given to misclassified instances after every iteration.
• E.g.: AdaBoost, Gradient Boost, LightGBM, XGBM algorithms

How Boosting Algorithm Works?
1. The base learner takes the full training distribution and assigns equal weight (attention) to each observation.
2. If the first base learning algorithm makes prediction errors, we pay higher attention to the observations with prediction errors. Then, we apply the next base learning algorithm.
3. Repeat step 2 until the limit on the number of base learning algorithms is reached.
4. Finally, it combines the outputs of the weak learners to create a strong learner, which eventually improves the prediction power of the model.
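A minimal sketch of the re-weighting in step 2, using the AdaBoost-style update as one concrete choice (alpha is the learner's weight, derived from its weighted error); the function name and interface are illustrative, and it assumes 0 < error < 1:

```python
import numpy as np

def reweight(weights, y_true, y_pred):
    """Increase the weight of misclassified observations before training the next learner."""
    err = np.sum(weights * (y_true != y_pred)) / np.sum(weights)   # weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / err)                          # learner's "amount of say"
    # multiply by exp(+alpha) where wrong, exp(-alpha) where right, then renormalise
    weights = weights * np.exp(alpha * np.where(y_true != y_pred, 1.0, -1.0))
    return weights / weights.sum(), alpha
```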

Advantages of Boosting
 It is one of the most successful techniques in solving the two-class classification problems.
 The boosting technique takes care of the weightage of higher-accuracy and lower-accuracy samples and then gives the combined result.
 The net error is evaluated at each learning step. It works well with interactions.
 The boosting technique helps when we are dealing with bias or under-fitting in the dataset.
 Multiple boosting techniques are available. For example: AdaBoost, LPBoost, XGBoost,
Gradient Boost, Brown Boost.
 It is good at handling the missing data.
Disadvantages of Boosting
 Boosting is hard to implement in real-time due to the increased complexity of the algorithm.
 The high flexibility of these techniques results in a large number of parameters that have a direct effect on the behaviour of the model.
 Boosting technique often ignores overfitting or variance issues in the data set.
 It increases the complexity of the classification.
 Time and computation can be a bit expensive.

Applications of Bagging and Boosting:


There are multiple areas where the bagging and boosting techniques are used to improve accuracy.
• Banking Sector
• Medical Data Predictions
• High-dimensional data
• Land cover mapping
• Fraud detection
• Network Intrusion Detection Systems
• Medical fields like neuroscience, prosthetics, etc.

• Credit risks
• Recommender system for Netflix
• Malware detection
• Wildlife conservation, and so on.

Comparing Bagging and Boosting :

AdaBoost (Adaptive Boosting):


AdaBoost was the first successful boosting algorithm developed for the purpose of binary
classification. AdaBoost is short for Adaptive Boosting and is a very popular boosting technique that
combines multiple “weak classifiers” into a single “strong classifier”.
• AdaBoost, or Adaptive Boosting, is a widely used machine learning classification algorithm. It is an ensemble algorithm that combines many weak learners (decision trees) and turns them into one strong learner.
 The weak learners in AdaBoost each take into account a single input feature and draw out a single-split decision tree called a decision stump. Each observation is weighted equally while drawing out the first decision stump.
 The results from the first decision stump are analysed, and if any observations are wrongly classified, they are assigned higher weights.
 After this, a new decision stump is drawn by considering the observations with higher weights as more significant.
 Again, if any observations are misclassified, they are given higher weight, and this process continues until all the observations are classified correctly (or a chosen number of stumps has been built).
 AdaBoost can be used for both classification and regression problems; however, it is more commonly used for classification.
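A short scikit-learn sketch of AdaBoost with decision stumps (the library's default weak learner is a depth-1 tree); the dataset and parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Each weak learner is a decision stump (a one-split tree). Misclassified observations are
# re-weighted before the next stump is fitted, and each stump's vote is weighted by its accuracy.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=7)
ada.fit(X_train, y_train)
print("test accuracy:", ada.score(X_test, y_test))
```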

An Example of How AdaBoost Works


 AdaBoost is best used to boost the performance of decision trees on binary classification
problems.
 Three ideas behind AdaBoost:
o AdaBoost combines a lot of "weak learners" (stumps) to make classifications.
o Some stumps get more say (weight) in the classification than others.
o Each stump is made by taking the previous stump's mistakes into account.

AdaBoost (Adaptive Boosting) Example:

Dataset:

Bayesian learning algorithm:
Bayesian learning is a powerful method for building models that predict future outcomes by
updating beliefs based on new evidence.

Bayes' Theorem is a fundamental concept in probability theory and statistics, used to update the
probability of a hypothesis based on new evidence. It describes the relationship between conditional
probabilities and is widely applied in machine learning, data analysis, and Bayesian inference.

Formula:

P(A|B) = [P(B|A) × P(A)] / P(B)

P(A|B): Probability of event A occurring given that event B has occurred (posterior probability).

P(B|A): Probability of event B occurring given that event A has occurred (likelihood).

P(A): Prior probability of event A (prior).

P(B): Prior probability of event B (evidence).
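As a quick numeric illustration (not from the original notes; the numbers are made up), the snippet below applies the formula to a hypothetical test-and-condition scenario:

```python
# Hypothetical numbers for illustration only.
p_A = 0.01              # P(A): prior probability of the condition
p_B_given_A = 0.95      # P(B|A): likelihood of a positive test given the condition
p_B_given_not_A = 0.05  # P(B|not A): false-positive rate

# P(B): total probability of a positive test (the evidence)
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Bayes' Theorem: posterior P(A|B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # ≈ 0.161
```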

Advantages of Bayesian Learning

 Allows the use of prior beliefs or domain knowledge


 Produces meaningful probabilities that can be interpreted as the degree of belief.

Disadvantages of Bayesian Learning

 The results can be sensitive to the choice of prior, especially when the data is limited
 In some models like Naive Bayes, the assumption of feature independence may not give
good accuracy

Naive Bayes:
 Naive Bayes is a probabilistic algorithm based on Bayes' Theorem. It assumes that all
features are independent of each other when predicting a class.
 Supervised Learning algorithm.
 Primarily used for classification tasks but can also be used for regression tasks.

Bayes' Theorem provides the foundation for Naive Bayes.

Bayes' Theorem and Independence Assumption

Independent features: x1, x2, x3, ..., xn and dependent feature (class): y

Applying Bayes' Theorem:

P(y | x1, x2, ..., xn) = [P(x1, x2, ..., xn | y) × P(y)] / P(x1, x2, ..., xn)

Independence Assumption (features are conditionally independent given the class):

P(x1, x2, ..., xn | y) = P(x1|y) × P(x2|y) × ... × P(xn|y)

Naive Bayes Algorithm


1. Calculate Prior Probabilities:

Compute the probability of each class occurring in the dataset.

2. Calculate Likelihoods:

For each feature, compute the probability of it occurring given the class.

3. Calculate Posterior Probability:

For each class, calculate the posterior probability given the features.

Formula:

P(y | x1, ..., xn) ∝ P(y) × P(x1 | y) × P(x2 | y) × ... × P(xn | y)

4. Normalization of Values:

Divide each class's unnormalised posterior by the sum over all classes so that the values sum to 1.

5. Comparing Probabilities and Class Selection:

Choose the class with the highest posterior probability as the prediction.
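The five steps can be sketched in a few lines of Python for categorical features; the training data X (a list of feature tuples) and labels y are assumed inputs, and no smoothing is applied in this minimal version:

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Steps 1-2: estimate priors P(class) and likelihoods P(feature value | class) by counting."""
    priors = {c: n / len(y) for c, n in Counter(y).items()}
    counts = defaultdict(Counter)   # (feature index, class) -> Counter of feature values
    for features, label in zip(X, y):
        for i, value in enumerate(features):
            counts[(i, label)][value] += 1
    class_totals = Counter(y)
    likelihoods = {
        key: {v: n / class_totals[key[1]] for v, n in counter.items()}
        for key, counter in counts.items()
    }
    return priors, likelihoods

def predict_naive_bayes(priors, likelihoods, features):
    """Steps 3-5: unnormalised posteriors, normalisation, and picking the highest class."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(features):
            score *= likelihoods.get((i, c), {}).get(value, 0.0)  # P(xi | c), 0 if unseen
        scores[c] = score
    total = sum(scores.values()) or 1.0
    posteriors = {c: s / total for c, s in scores.items()}        # Step 4: normalise
    return max(posteriors, key=posteriors.get), posteriors        # Step 5: pick the best class
```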

Naive Bayes Example

Unseen Test Sample :

Step 1 : Calculate Prior Probabilities

Step 2: (a) Calculate Likelihoods for Play = Yes:

(b) Calculate Likelihoods for Play = No:

Step 3: (a) Calculate Posterior Probabilities for Play = Yes:

Since P(Features) is the same for all classes, we can ignore its computation during classification.

(b) Calculate Posterior Probabilities for Play = No:

Again, since P(Features) is the same for all classes, we can ignore its computation.

Step 4 : Normalize Values

Step 5 : Compare probabilities

Conclusion: The model predicts Play = No for the given conditions.

Using Naive Bayes, the prediction for Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong is: NO.
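The actual counts were shown as figures in the original. Assuming the standard 14-example play-tennis dataset (9 Yes, 5 No), which matches the test sample and the NO conclusion above, the computation can be reproduced as follows; the counts below are that assumption, not values taken from this document:

```python
# Assumed counts from the standard 14-example play-tennis dataset (9 Yes, 5 No).
p_yes, p_no = 9 / 14, 5 / 14                            # Step 1: priors

# Step 2: likelihoods for the test sample (Sunny, Cool, High, Strong)
like_yes = (2 / 9) * (3 / 9) * (3 / 9) * (3 / 9)        # P(Sunny|Yes)·P(Cool|Yes)·P(High|Yes)·P(Strong|Yes)
like_no = (3 / 5) * (1 / 5) * (4 / 5) * (3 / 5)         # P(Sunny|No)·P(Cool|No)·P(High|No)·P(Strong|No)

# Step 3: unnormalised posteriors (P(Features) cancels out)
score_yes, score_no = p_yes * like_yes, p_no * like_no

# Step 4: normalise; Step 5: compare
total = score_yes + score_no
print(round(score_yes / total, 3), round(score_no / total, 3))  # ≈ 0.205 vs 0.795 -> predict No
```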

Advantages of Naïve Bayes Classifier:


 Simple and easy to implement.
 Efficient with small datasets.

Disadvantages of Naïve Bayes Classifier:


 Assumes independence among features, which may not hold in real-world data.
 Struggles with datasets having highly correlated features.

Applications of Naïve Bayes Classifier:


 Spam filtering.
 Sentiment analysis.
 Medical diagnosis.
 Recommendation systems

