
Probabilistic Learning

Unit-4 MACHINE LEARNING


Dr. John Babu

Probabilistic Learning
Probabilistic learning methods offer a more transparent approach than traditional neural networks such as the MLP. Neural networks often provide limited interpretability: examining neuron activations and weights may not give clear insight into how a decision was reached. With probabilistic methods, however, we can directly observe the probabilities involved in the decision-making process.
In this section, we explore classification using probabilities derived from the frequency of
examples in the training data. Probabilistic methods help to handle classification tasks more
explicitly. We also introduce unsupervised learning methods for situations where training labels
are not available. In cases where data is drawn from known probability distributions, we can use
the Expectation-Maximization (EM) algorithm to solve the problem effectively.

Classification using Frequency


In classification problems, we calculate the frequency of each class in the training data. This gives
us an estimate of the probability of that class. We can use these probabilities to assign labels to
new examples.
For example, consider a dataset of fruits with features like size and color. Based on the observed
features, we can estimate the probability of a fruit being an apple or an orange. By choosing the
class with the highest probability, we can classify new data points effectively.
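As a rough illustration, here is a minimal Python sketch that estimates class probabilities from label frequencies and picks the most probable class; the fruit labels are made up for the example:

```python
from collections import Counter

# Hypothetical labelled fruit data (labels only, for estimating class priors).
labels = ["apple", "apple", "orange", "apple", "orange", "apple"]

counts = Counter(labels)
total = len(labels)
priors = {cls: count / total for cls, count in counts.items()}

print(priors)                          # e.g. {'apple': 0.667, 'orange': 0.333}
print(max(priors, key=priors.get))     # class with the highest estimated probability
```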

Unsupervised Learning
Unsupervised learning is used when labels are not available for the training examples. Instead of
using labeled data, we focus on identifying patterns or clusters in the data. The EM algorithm is
a widely-used technique in this context.
The EM algorithm works in two main steps:

• Expectation (E-step): Estimate the probability that each data point belongs to each cluster.

• Maximization (M-step): Recalculate the parameters of the clusters (e.g., means and variances) to maximize the likelihood of the data.

This iterative process continues until the algorithm converges, resulting in well-defined clusters
in the data.
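As an aside, if scikit-learn is available, its GaussianMixture class fits such a mixture by running EM internally; the snippet below is only a sketch on synthetic two-cluster data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated blobs of 2-D points (synthetic data for illustration).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[4, 4], scale=0.5, size=(100, 2)),
])

# Fit a 2-component Gaussian mixture; the fit runs the EM iterations internally.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.means_)                 # estimated cluster means
print(gmm.predict_proba(X[:3]))   # soft (E-step style) responsibilities
print(gmm.predict(X[:3]))         # hard cluster assignments
```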

Nearest Neighbor Methods


Another way to use probabilistic learning is through nearest neighbor methods. Instead of using a pre-defined model, these methods look for the closest examples from the training set to make predictions. The k-nearest neighbor algorithm (k-NN) is a popular approach, where we find the k closest examples in the dataset and use their labels to predict the label of a new data point.
For example, if we want to classify a fruit based on its size and color, we would look for the k
most similar fruits in the dataset. By checking their labels (e.g., apple, orange), we can determine
the label for the new fruit.
Probabilistic learning offers a more interpretable way to handle classification problems, com-
pared to neural networks. With methods like classification based on frequency, unsupervised
learning through the EM algorithm, and nearest neighbor approaches, we can build models that
provide clearer insights into how decisions are made.

Gaussian Mixture Models


For the Bayes’ classifier discussed previously, we utilized target labels for supervised learning.
However, when we have data without target labels, we require unsupervised learning methods. In
this section, we explore a special case where different classes originate from their own Gaussian
distributions, known as multi-modal data.
If we know the number of classes, we can estimate the parameters for that many Gaussians
simultaneously. If the number of classes is unknown, we can experiment with various counts to
find the best fit.

Gaussian Mixture Model Equation


The output for any particular data point input into the algorithm is given by:

f(x) = \sum_{m=1}^{M} \alpha_m \, \phi(x; \mu_m, \Sigma_m)

where \phi(x; \mu_m, \Sigma_m) is the Gaussian function with mean \mu_m and covariance matrix \Sigma_m, and the \alpha_m are weights with the constraint:

\sum_{m=1}^{M} \alpha_m = 1.

This equation describes how the overall probability distribution is a mixture of several Gaussian distributions.

Estimating Class Probabilities


The probability that an input x_i belongs to class m can be estimated as:

p(x_i \in c_m) = \frac{\hat{\alpha}_m \, \phi(x_i; \hat{\mu}_m, \hat{\Sigma}_m)}{\sum_{k=1}^{M} \hat{\alpha}_k \, \phi(x_i; \hat{\mu}_k, \hat{\Sigma}_k)}.
The challenge lies in selecting the weights αm . The standard approach is to seek a maximum
likelihood solution, where we maximize the likelihood of the data given the model.
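The following is a minimal NumPy/SciPy sketch of these two formulas, the mixture density f(x) and the posterior p(x_i ∈ c_m); the parameter values are purely illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, alphas, means, covs):
    """f(x) = sum_m alpha_m * phi(x; mu_m, Sigma_m)."""
    return sum(a * multivariate_normal.pdf(x, mean=mu, cov=S)
               for a, mu, S in zip(alphas, means, covs))

def responsibilities(x, alphas, means, covs):
    """Posterior p(x in c_m) for every component m."""
    weighted = np.array([a * multivariate_normal.pdf(x, mean=mu, cov=S)
                         for a, mu, S in zip(alphas, means, covs)])
    return weighted / weighted.sum()

# Hypothetical two-component model in 2-D.
alphas = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]

x = np.array([2.5, 2.8])
print(mixture_density(x, alphas, means, covs))
print(responsibilities(x, alphas, means, covs))   # the entries sum to 1
```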

Expectation-Maximization (EM) Algorithm


The EM algorithm is a powerful statistical technique used for finding maximum likelihood estimates of parameters in probabilistic models, especially when the data has missing or hidden variables (often called latent variables). The central idea is to introduce these latent variables to simplify the optimization process.

Gaussian Mixture Model Example


To illustrate the EM algorithm, consider a simple case of a Gaussian mixture model with two
components. Here’s how it works:

Model Definition
Assume there are two Gaussian distributions:

G_1 = N(\mu_1, \sigma_1^2)
G_2 = N(\mu_2, \sigma_2^2)

The overall data distribution is represented as:

y = \pi G_1 + (1 - \pi) G_2

The probability density function can be expressed as:

P(y) = \pi \, \phi(y; \mu_1, \sigma_1) + (1 - \pi) \, \phi(y; \mu_2, \sigma_2)

where π is the mixing coefficient.

Challenge
The goal is to compute the maximum likelihood solution. However, directly differentiating the log-
likelihood function is complex due to the hidden variable f , which indicates from which Gaussian
the data point was generated.

Introducing Latent Variables


Introduce a latent variable f :

• f = 0 indicates the data came from Gaussian G1

• f = 1 indicates the data came from Gaussian G2

Expectation Step (E-step)

Compute the expected value of the latent variable given the current estimates of the parameters. Since f = 0 corresponds to G1, the responsibility \gamma_i is the probability that data point y_i came from G1:

\gamma_i(\hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}_1, \hat{\sigma}_2, \hat{\pi}) = E(1 - f_i \mid \hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}_1, \hat{\sigma}_2, \hat{\pi}, D)

Specifically, this expectation computes:

\gamma_i = \frac{\hat{\pi} \, \phi(y_i; \hat{\mu}_1, \hat{\sigma}_1)}{\hat{\pi} \, \phi(y_i; \hat{\mu}_1, \hat{\sigma}_1) + (1 - \hat{\pi}) \, \phi(y_i; \hat{\mu}_2, \hat{\sigma}_2)}



Maximization Step (M-step)
Maximize the expected log-likelihood with respect to the model parameters. Update the parameters based on the computed expectations:

• M-step 1: Update \hat{\mu}_1:

\hat{\mu}_1 = \frac{\sum_{i=1}^{N} \hat{\gamma}_i y_i}{\sum_{i=1}^{N} \hat{\gamma}_i}

• M-step 2: Update \hat{\mu}_2:

\hat{\mu}_2 = \frac{\sum_{i=1}^{N} (1 - \hat{\gamma}_i) y_i}{\sum_{i=1}^{N} (1 - \hat{\gamma}_i)}

• M-step 3: Update \hat{\sigma}_1^2:

\hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N} \hat{\gamma}_i (y_i - \hat{\mu}_1)^2}{\sum_{i=1}^{N} \hat{\gamma}_i}

• M-step 4: Update \hat{\sigma}_2^2:

\hat{\sigma}_2^2 = \frac{\sum_{i=1}^{N} (1 - \hat{\gamma}_i)(y_i - \hat{\mu}_2)^2}{\sum_{i=1}^{N} (1 - \hat{\gamma}_i)}

• M-step 5: Update \hat{\pi}:

\hat{\pi} = \frac{\sum_{i=1}^{N} \hat{\gamma}_i}{N}

Iteration
Repeat the E-step and M-step until convergence is achieved. The EM algorithm is guaranteed to
converge to a local maximum of the likelihood function.
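A minimal NumPy/SciPy sketch of this iteration, implementing the E-step and M-step updates above for the two-component one-dimensional case, might look like the following (the initialisation and the synthetic data are illustrative choices, not part of the derivation):

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(y, n_iter=100):
    """Sketch of EM for a two-component 1-D Gaussian mixture."""
    # Crude initialisation from the data (an illustrative choice).
    mu1, mu2 = y.min(), y.max()
    s1 = s2 = y.std()
    pi = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        p1 = pi * norm.pdf(y, mu1, s1)
        p2 = (1 - pi) * norm.pdf(y, mu2, s2)
        gamma = p1 / (p1 + p2)
        # M-step: weighted parameter updates, as in M-steps 1-5 above.
        mu1 = np.sum(gamma * y) / np.sum(gamma)
        mu2 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        s1 = np.sqrt(np.sum(gamma * (y - mu1) ** 2) / np.sum(gamma))
        s2 = np.sqrt(np.sum((1 - gamma) * (y - mu2) ** 2) / np.sum(1 - gamma))
        pi = gamma.mean()
    return mu1, mu2, s1, s2, pi

# Synthetic data drawn from two Gaussians, purely for demonstration.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 300)])
print(em_two_gaussians(y))   # estimates should land near (0, 5, 1, 1, 0.4)
```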

Conclusion
The EM algorithm effectively handles missing or hidden data by iteratively estimating the expec-
tations of the latent variables and optimizing the model parameters. This framework is widely
used in various applications, particularly in clustering and density estimation tasks.
In many practical learning scenarios, only a subset of relevant instance features is observ-
able. For instance, when training or utilizing a Bayesian belief network, certain variables may
be observed while others remain hidden. To effectively learn in the presence of these unobserved
variables, the EM (Expectation-Maximization) algorithm provides a systematic approach. It can
be applied even when the values of certain variables are never directly observed, given that the
form of the probability distribution governing these variables is known.

Applications of the EM Algorithm


• Bayesian Belief Networks: EM is used to train networks where some variables are unobserved.

• Radial Basis Function Networks: The algorithm can also be utilized here.

• Unsupervised Clustering: Many clustering algorithms are based on the EM algorithm.

• Partially Observable Markov Models: The Baum-Welch algorithm, which employs EM, is widely used in this context.

Physical Significance of the EM Steps


• The E-step calculates the probability (responsibility) that each data point was generated by each Gaussian component.

• The M-step updates the parameters of the Gaussian components based on these responsibilities.

• This process iterates until the parameters converge, refining the model to better explain the data.

The EM algorithm serves as a powerful tool for parameter estimation in models involving
unobserved variables. By leveraging current hypotheses to estimate hidden data and iteratively
refining those hypotheses, EM approaches a maximum likelihood solution. Its versatility across
various applications makes it a fundamental method in both machine learning and statistical
inference.

Comparison between EM Algorithm and K-Means Algorithm

EM Algorithm | K-Means Algorithm
EM is a probabilistic model-based algorithm that estimates parameters. | K-Means is a centroid-based algorithm that partitions data into clusters.
EM can accommodate different types of distributions, including Gaussian. | K-Means assumes that clusters are spherical and equally sized.
EM involves an iterative process of expectation and maximization steps. | K-Means involves iteratively assigning points to the nearest centroid.
EM can produce soft assignments, providing probabilities for cluster membership (soft clustering). | K-Means gives hard assignments, where each point belongs to exactly one cluster (hard clustering).
EM can handle missing data more effectively by estimating hidden variables. | K-Means does not handle missing data and requires complete datasets.
EM generally converges reliably to a local maximum of the likelihood function. | K-Means can converge to different solutions based on initial centroid placement.

Table 1: Differences between EM Algorithm and K-Means Algorithm

Nearest Neighbour Methods


Concept
Imagine we want to know the marks percentage of a particular student but we do not have his marks. We could then look at the marks of his close friends and conclude that his marks would be in the vicinity of the average of those friends' marks. This is similar to how nearest neighbour methods work: when we don't have a model for our data, we look at nearby data points to make a decision.

K-Nearest Neighbors (KNN) Algorithm Example


We will demonstrate the K-Nearest Neighbors (KNN) algorithm using a simple numerical example
with 10 data points. Our goal is to classify a new data point based on its closest neighbors.

Data Points
Consider the following data points in a 2-dimensional space:

Data points: {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)}
We also have labels for these data points:

Labels: {A, A, A, A, B, B, B, B, B, B}
Our new point is (6.5, 6.5), and we want to classify it using the KNN algorithm with k = 3.

Step-by-Step KNN Algorithm


1. Calculate Distances: First, we compute the distance between the new point (6.5, 6.5) and
each data point using Euclidean distance:
Distance = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
For example, the distance between (6.5, 6.5) and (6, 6) is:

Distance = \sqrt{(6.5 - 6)^2 + (6.5 - 6)^2} = \sqrt{0.5^2 + 0.5^2} = \sqrt{0.25 + 0.25} = \sqrt{0.5} \approx 0.71

Similarly, we calculate the distances for all other points.


2. Sort by Distance: After calculating the distances, we sort the points by their distances
to the new point:

Sorted Distances: {(6, 6), (7, 7), (5, 5), (8, 8), (4, 4), (9, 9), (3, 3), (10, 10), (2, 2), (1, 1)}

3. Select k Nearest Neighbors: We now select the 3 nearest neighbors (as k = 3):

Nearest Neighbors: {(6, 6), (7, 7), (5, 5)}


4. Determine Majority Class: The labels for these 3 nearest neighbors are:

Labels: {B, B, B}
Since the majority of the neighbors belong to class B, we classify the new point (6.5, 6.5) as
class B.
By using the KNN algorithm with k = 3, the new point (6.5, 6.5) is classified as class B.
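The worked example can be reproduced with a few lines of NumPy; note that (5, 5) and (8, 8) are tied for third place, but both carry label B, so the classification is unaffected:

```python
import numpy as np
from collections import Counter

# The ten training points and labels from the example above.
points = np.array([(i, i) for i in range(1, 11)], dtype=float)
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
new_point = np.array([6.5, 6.5])
k = 3

# Step 1: Euclidean distances to every training point.
dists = np.sqrt(((points - new_point) ** 2).sum(axis=1))

# Steps 2-3: indices of the k nearest neighbours (the third place is a tie).
nearest = np.argsort(dists)[:k]
print([tuple(points[i]) for i in nearest])   # e.g. (6,6), (7,7), then (5,5) or (8,8)

# Step 4: majority vote among their labels.
vote = Counter(labels[i] for i in nearest).most_common(1)[0][0]
print(vote)   # 'B'
```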



How It Works
1. Finding Neighbours: We have data points in a space, and we need to determine which ones are close to a new point. To do this, we compute the distance from the new point to each data point. If there are N points in d dimensions, this requires O(N · d) operations for each new point. We can ignore the square root in the distance formula because we only need to know which points are closest, not the exact distances.

2. Choosing Neighbours: After calculating the distances, we identify the k nearest neighbours to the new point. The class of the new point is assigned based on the most common class among these neighbours.

Choosing k
The choice of k is crucial: if k is too small, the method can be sensitive to noise; if k is too large, it may include points that are not relevant, reducing accuracy.

Curse of Dimensionality
As we increase the number of dimensions (d), the distance calculations become more complex. Although methods like KD-Trees can help with this, the following issues arise: as dimensions increase, points tend to spread out, making it harder to find truly "nearby" points; and distances become less meaningful, as many points are far apart in some dimensions but close in others.

Bias-Variance Trade-off
For the k-nearest neighbours (KNN) algorithm, we can analyze the bias-variance trade-off. At a query point x with true function f, noise variance \sigma^2, and k nearest neighbours x_1, \ldots, x_k, the expected prediction error decomposes as:

E\left[(y - \hat{y})^2\right] = \sigma^2 + \left( f(x) - \frac{1}{k} \sum_{i=1}^{k} f(x_i) \right)^2 + \frac{\sigma^2}{k},

where the middle term is the squared bias and the final term is the variance of the estimate, which shrinks as k grows.

The bias-variance tradeoff is crucial in understanding the performance of the KNN algorithm.

Bias
• Bias refers to the error introduced by approximating a real-world problem using a simplified model.

• In KNN, when k is small (e.g., k = 1), the model is flexible and fits the training data closely, resulting in low bias.

• However, this flexibility can lead to high variance, as the model captures noise in the training data.

Variance
• Variance refers to the error introduced by the model's sensitivity to fluctuations in the training data.

• A high variance model, like KNN with small k, performs well on training data but poorly on unseen data due to overfitting.



KNN and the Tradeoff
• Small k (e.g., k = 1):

  – Low Bias: The model fits training data closely.
  – High Variance: The model is sensitive to noise, leading to overfitting.

• Large k (e.g., k = 10):

  – High Bias: The model generalizes too much, potentially missing important patterns.
  – Low Variance: The model is stable, averaging over more neighbors, reducing noise impact.

Optimal k
Finding a balance between bias and variance is key:

• Too small k leads to overfitting (high variance, low bias).

• Too large k leads to underfitting (high bias, low variance).

• Cross-validation can help determine the optimal k by evaluating model performance on unseen data.

In KNN, understanding the bias-variance tradeoff is essential for tuning the algorithm effec-
tively, with the goal of minimizing overall error.
The nearest neighbour methods rely on the idea of learning from similar data points. By
carefully choosing k and considering dimensionality, we can effectively classify new data based on
existing patterns.
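Returning to the cross-validation suggestion above: assuming scikit-learn is available, one could compare several values of k by cross-validation on synthetic data, as in the sketch below (the data and the candidate values of k are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic classification data purely for illustration.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Score a range of k values with 5-fold cross-validation and compare means.
for k in [1, 3, 5, 10, 20]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())
```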

Efficient Distance Computations: the KD-Tree


Computing distances between all pairs of points can be very expensive. To solve this, we can
use a data structure called the KD-Tree, which reduces the cost of finding a nearest neighbor to
O(log N ) for O(N ) storage. The construction of the tree takes O(N log N ), mainly due to finding
the median.
The KD-tree is built by creating a binary tree that splits one dimension at a time using the
median of the point coordinates. Consider the following seven two-dimensional points:

(5, 4), (1, 6), (6, 1), (7, 5), (2, 7), (2, 2), (5, 8)

Steps for Splitting


1. Initial Split (First Dimension: x): Sort the points by their x-coordinates:

(1, 6), (2, 2), (2, 7), (5, 4), (5, 8), (6, 1), (7, 5)

The median point (the 4th of the seven, index 3 when 0-indexed) is (5, 4). This creates a split at x = 5.
2. Left Subtree (Points with x < 5): The remaining points are (1, 6), (2, 2), (2, 7). Sorted by y-coordinate:

(2, 2), (1, 6), (2, 7)

The median point is (1, 6), which creates a split at y = 6.

3. Right Subtree (Points with x ≥ 5): The remaining points are (5, 4), (5, 8), (6, 1), (7, 5). Sorted by y-coordinate:

(6, 1), (5, 4), (7, 5), (5, 8)

Taking the upper median gives (7, 5), which creates a split at y = 5.

Visualization
At this point, the KD-Tree has the following structure:

- Root node: (5, 4), split at x = 5
  - Left child: (1, 6), split at y = 6
    - Left: (2, 2)
    - Right: (2, 7)
  - Right child: (7, 5), split at y = 5
    - Left: (5, 4), with child (6, 1)
    - Right: (5, 8)

Searching the Tree


To find the nearest neighbor, we start at the root and compare dimensions one at a time. For example, if we introduce a test point, say (3, 5):

1. Start at the root (5, 4): the split is at x = 5 and the test point has x = 3 < 5, so go left.

2. Move to (1, 6): the split is at y = 6 and the test point has y = 5 < 6, so go left, reaching the leaf (2, 2).

3. The leaf found, (2, 2), is only a first candidate. Check distances while backtracking up the tree to see if there is a closer point.

This process continues until all potential points are checked, making the KD-Tree efficient in
finding nearest neighbors with significantly reduced computational costs.
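For a quick check of the example, SciPy's KDTree can be queried for the nearest neighbour of (3, 5). Note that SciPy builds its own tree internally, so its splits need not match the hand-built tree above, and several points happen to be tied at distance √5:

```python
from scipy.spatial import KDTree

# The seven example points from the construction above.
points = [(5, 4), (1, 6), (6, 1), (7, 5), (2, 7), (2, 2), (5, 8)]
tree = KDTree(points)

dist, idx = tree.query((3, 5))   # single nearest neighbour of the test point
print(points[idx], dist)         # one of the tied points at distance sqrt(5) ≈ 2.24
```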



Figure 1: Example KD-Tree Structure

Distance Measures
Distance measures are essential in data analysis, particularly in clustering and classification tasks.
They help quantify how similar or different two data points are. Below, we explore several impor-
tant distance metrics along with their real-time applications.



Euclidean Distance
The Euclidean distance is the most commonly used distance measure. It calculates the straight-
line distance between two points in Euclidean space. Given two points (x1 , y1 ) and (x2 , y2 ), the
Euclidean distance dE is defined as:
d_E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}    (1)
This distance is derived from the Pythagorean theorem, where the distance represents the
hypotenuse of a right triangle formed by the differences in the coordinates. In three dimensions,
the formula extends to:
d_E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}    (2)
In a recommendation system for online shopping, Euclidean distance can be used to measure
the similarity between users based on their purchasing behavior. For instance, if two users have
similar purchase histories, the system can recommend products based on what similar users have
bought.

Manhattan Distance
In contrast to the Euclidean distance, the Manhattan distance (also known as city-block distance)
measures the distance between two points based on a grid-like path. It adds the absolute differences
in each dimension. For two points (x1 , y1 ) and (x2 , y2 ), the Manhattan distance dC is given by:

d_C = |x_1 - x_2| + |y_1 - y_2|    (3)


This metric is particularly useful in urban planning, where one must navigate through streets
and blocks rather than a straight line. In higher dimensions, it generalizes to:
d_C = \sum_{i=1}^{n} |x_i - y_i|    (4)

Manhattan distance is often used in robotics, particularly for pathfinding algorithms. For
example, a robot navigating through a city grid will calculate its distance to a destination using
Manhattan distance to find the most efficient path while avoiding obstacles like buildings.

Minkowski Distance
The Minkowski distance is a generalization of both the Euclidean and Manhattan distances. It is
defined for two points x and y in an n-dimensional space, with the parameter k controlling the
distance measure. It is expressed as:

L_k(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^k \right)^{1/k}    (5)

- For k = 1, this reduces to the Manhattan distance:


L_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|    (6)

- For k = 2, it becomes the Euclidean distance:


L_2(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \ldots + (x_n - y_n)^2}    (7)



In machine learning, Minkowski distance is commonly used in k-nearest neighbors (KNN)
algorithms, allowing flexibility in choosing the distance metric. For instance, when classifying
images, k can be adjusted to emphasize certain features based on their importance in distinguishing
between classes.
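The three metrics discussed above can be written as small NumPy functions; this is only a sketch, with the test points chosen arbitrarily:

```python
import numpy as np

def euclidean(x, y):
    """Straight-line (L2) distance."""
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def manhattan(x, y):
    """City-block (L1) distance."""
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)))

def minkowski(x, y, k):
    """Minkowski distance; k=1 gives Manhattan, k=2 gives Euclidean."""
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** k) ** (1.0 / k)

a, b = (1, 2), (4, 6)
print(euclidean(a, b))      # 5.0
print(manhattan(a, b))      # 7
print(minkowski(a, b, 1))   # 7.0
print(minkowski(a, b, 2))   # 5.0
```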

Choosing the Right Distance Metric


The choice of distance metric can significantly impact the results of data analysis. Factors such as
the dimensionality of the data, the presence of outliers, and the underlying data distribution should
influence the selection of the appropriate distance measure. In some cases, more sophisticated
metrics, such as the Mahalanobis distance or invariant metrics like the tangent distance, may be
preferable, particularly for applications such as image recognition.

Figure 2: Distance Metrics

By understanding these distance metrics, we can better analyze and classify data effectively.



Lazy vs Eager learning
Lazy Learning
Lazy learning refers to a type of machine learning where the model does not explicitly generalize
from the training data during the training phase. Instead, it stores the training instances and
delays the learning process until a query is made.
K-Nearest Neighbors (KNN) is considered lazy learning because it does not build a model
ahead of time. Instead, it relies on the entire dataset at the time of prediction, calculating the
distance to find the nearest neighbors to make decisions, which can lead to high memory usage
and slow predictions.

Eager Learning
Eager learning is a machine learning approach where the model is built during the training phase,
resulting in a general representation of the data. This model is then used for making predictions,
leading to faster response times during inference since no additional computation is needed. Eager
learning is termed so because the model eagerly captures patterns and relationships from the
training data upfront.
Examples of eager learning algorithms include:

• Decision Trees: Create a tree-like model based on feature splits.

• Neural Networks: Learn representations through layered architectures.

• Support Vector Machines: Find the optimal hyperplane for classification.

Lazy Learning | Eager Learning
Learns from training data at the time of query. | Learns from training data during the training phase.
Examples: K-Nearest Neighbors (KNN). | Examples: Decision Trees, Neural Networks, Support Vector Machines (SVM).
No explicit model is built; the training data is stored. | An explicit model is built during training.
Fast training time, slow prediction time. | Slow training time, fast prediction time.
High memory usage due to storing all instances. | Lower memory usage, as only the model parameters are stored.
Adapts quickly to new data; instant updates. | Requires retraining to adapt to new data.

Table 2: Differences between Lazy Learning and Eager Learning
