MLLecture-1

This document discusses unsupervised learning techniques, focusing on clustering methods for image segmentation, such as semantic and instance segmentation. It elaborates on the DBSCAN algorithm for density-based clustering and Gaussian Mixture Models (GMM) for probabilistic clustering, including the Expectation-Maximization algorithm for fitting GMMs. Additionally, it addresses the challenges of high-dimensional data and dimensionality reduction techniques like PCA, highlighting their advantages and disadvantages.

UNSUPERVISED LEARNING TECHNIQUES
UNIT 4
CLUSTERING FOR IMAGE SEGMENTATION
• Image segmentation is the task of partitioning an image into multiple
segments.
• In semantic segmentation, all pixels that are part of the same object
type get assigned to the same segment.
• For example, in a self-driving car’s vision system, all pixels that are
part of a pedestrian’s image might be assigned to the “pedestrian”
segment (there would just be one segment containing all the
pedestrians).
• In instance segmentation, all pixels that are part of the same
individual object are assigned to the same segment.
• In this case there would be a different segment for each pedestrian.
• In some applications, a simpler approach called color segmentation, which assigns pixels with similar colors to the same segment, may be sufficient. For example, if you want to analyze satellite images to measure how much total forest area there is in a region, color segmentation may be just fine.
Clustering for Preprocessing
• Clustering can be an efficient approach to dimensionality reduction, in particular
as a preprocessing step before a supervised learning algorithm.
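As an illustration, here is a minimal sketch (assuming scikit-learn and its digits dataset; the choice of 50 clusters and a logistic regression classifier is arbitrary) in which k-means is placed in a Pipeline so that each instance is replaced by its distances to the cluster centroids before the classifier is trained:

from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical example dataset: handwritten digits.
X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

pipeline = Pipeline([
    # KMeans.transform() replaces each instance by its distances to the 50 centroids.
    ("kmeans", KMeans(n_clusters=50, n_init=10, random_state=42)),
    ("log_reg", LogisticRegression(max_iter=10_000)),
])
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))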
Using Clustering for Semi-Supervised Learning
• Another use case for clustering is in semi-supervised learning, when we
have plenty of unlabeled instances and very few labeled instances.
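One common recipe, sketched below under the assumption that we can afford to hand-label only k representative instances (the digits dataset and k = 50 are illustrative choices), is to cluster the data and label only the instance closest to each centroid:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Pretend X_train is unlabeled and only k instances can be hand-labeled.
X_train, y_train = load_digits(return_X_y=True)

k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
X_dist = kmeans.fit_transform(X_train)          # distance of every instance to every centroid

representative_idx = np.argmin(X_dist, axis=0)  # the instance closest to each centroid
X_representative = X_train[representative_idx]
# Hand-label only these k representative instances (here we peek at y_train to simulate that);
# a classifier can then be trained on them, or each label propagated to the rest of its cluster.
y_representative = y_train[representative_idx]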
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• This algorithm defines clusters as continuous regions of high density.
• Groups together closely packed data points. Marks the outlier points
as low-density regions.
• The algorithm can figure out any arbitrary shaped clusters.
• The algorithm works on two parameters :
• Epsilon (ε): The maximum distance between two samples for one
data point to be considered in the neighborhood of the other data
point.
• Minimum points (minPts): The minimum number of points required
to form a dense region.
Algorithm
• For each instance, the algorithm counts how many instances are
located within a small distance ε (epsilon) from it. This region is called
the instance’s ε neighborhood.
• If an instance has at least min_samples instances in its ε-
neighborhood (including itself), then it is considered a core instance.
In other words, core instances are those that are located in dense
regions.
• All instances in the neighborhood of a core instance belong to the
same cluster. This may include other core instances, therefore a long
sequence of neighboring core instances forms a single cluster.
• Any instance that is not a core instance and does not have one in its neighborhood is considered an anomaly.
Example:
• Consider the dataset :
Point F1 F2
P1 4.5 8
P2 5 7
P3 6 6.5
P4 7 5
P5 9 4
P6 7 3
P7 8 3.5
P8 9 5
P9 4 4
P10 3 7.5
P11 4 6
P12 3.5 5
• ε = 1.9
• min_pts = 4
The pairwise Euclidean distances between the points are:
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
P1 0 1.12 2.12 3.91 6.02 5.59 5.70 5.41 4.03 1.58 2.06 3.16
P2 1.12 0 1.12 2.83 5.0 4.47 4.61 4.47 3.16 2.06 1.41 2.5
P3 2.12 1.12 0 1.80 3.91 3.64 3.61 3.35 3.20 3.16 2.06 2.92
P4 3.91 2.83 1.80 0 2.24 2.0 1.80 2.0 3.16 4.72 3.16 3.50
P5 6.02 5.0 3.91 2.24 0 2.24 1.12 1.0 5.0 6.95 5.39 5.59
P6 5.59 4.47 3.64 2.0 2.24 0 1.12 2.83 3.16 6.02 4.24 4.03
P7 5.70 4.61 3.61 1.80 1.12 1.12 0 1.80 4.03 6.40 4.72 4.74
P8 5.41 4.47 3.35 2.0 1.0 2.83 1.80 0 5.10 6.50 5.10 5.50
P9 4.03 3.16 3.20 3.16 5.0 3.16 4.03 5.10 0 3.64 2.00 1.12
P10 1.58 2.06 3.16 4.72 6.95 6.02 6.40 6.50 3.64 0 1.80 2.55
P11 2.06 1.41 2.06 3.16 5.39 4.24 4.72 5.10 2.00 1.80 0 1.12
P12 3.16 2.5 2.92 3.50 5.59 4.03 4.74 5.50 1.12 2.55 1.12 0
Point identification (ε = 1.9, min_pts = 4)
Point  Core point?  Final label
P1     No           BORDER
P2     Yes          CLUSTER (core)
P3     No           BORDER
P4     No           BORDER
P5     No           BORDER
P6     No           BORDER
P7     Yes          CLUSTER (core)
P8     No           BORDER
P9     No           NOISE (outlier)
P10    No           BORDER
P11    Yes          CLUSTER (core)
P12    No           BORDER
Only P2, P7 and P11 have at least four points (including themselves) within distance ε, so they are the core points. Every other point that lies within ε of a core point becomes a border point of that core point's cluster, giving two clusters: {P1, P2, P3, P10, P11, P12} and {P4, P5, P6, P7, P8}. P9's only ε-neighbor is P12, which is not a core point, so P9 is labeled as noise.
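A minimal sketch (assuming scikit-learn; the lecture works the example by hand) that reproduces these labels with sklearn.cluster.DBSCAN:

import numpy as np
from sklearn.cluster import DBSCAN

# The twelve points P1..P12 from the table above.
X = np.array([
    [4.5, 8], [5, 7], [6, 6.5], [7, 5], [9, 4], [7, 3],
    [8, 3.5], [9, 5], [4, 4], [3, 7.5], [4, 6], [3.5, 5],
])

db = DBSCAN(eps=1.9, min_samples=4).fit(X)
print(db.core_sample_indices_)  # indices 1, 6, 10 -> the core points P2, P7, P11
print(db.labels_)               # cluster label of each point; -1 marks noise (here P9)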
• Advantages:
• Is great at separating clusters of high density from clusters of low density within a given dataset.
• Handles outliers within the dataset well.
• Disadvantages:
• Does not work well when dealing with clusters of varying densities.
• Struggles to separate neighboring clusters of similar density.
• Struggles with high-dimensional data, where the notion of a dense ε-neighborhood becomes less meaningful.
Gaussian Mixtures
• A Gaussian mixture model (GMM) is a probabilistic model that
assumes that the instances were generated from a mixture of several
Gaussian distributions whose parameters are unknown.
• All the instances generated from a single Gaussian distribution form a
cluster that typically looks like an ellipsoid.
• Each cluster can have a different ellipsoidal shape, size, density and
orientation.
• K-means is a clustering algorithm that assigns each data point to one
cluster based on the closest centroid. It’s a hard clustering method,
meaning each point belongs to only one cluster with no uncertainty.
• On the other hand, Gaussian Mixture Models (GMM) use soft
clustering, where data points can belong to multiple clusters with a
certain probability.
• The Gaussian distributions in a mixture differ in their mean (μ) and variance (σ²). Remember that the higher the σ value, the greater the spread.
1. Multiple Gaussians (Clusters): Each cluster is represented by a Gaussian distribution, and the data points are assigned probabilities of belonging to different clusters based on their distance from each Gaussian.
2. Parameters of a Gaussian: The core of GMM is made up of three main parameters for each Gaussian:
   o Mean (μ): The center of the Gaussian distribution.
   o Covariance (Σ): Describes the spread or shape of the cluster.
   o Mixing Probability (π): Determines how dominant or likely each cluster is in the data.
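Written out for reference, each component k is a d-dimensional Gaussian density with its own mean and covariance:

\mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x-\mu_k)^{\mathsf T}\Sigma_k^{-1}(x-\mu_k)\Big)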
• The Gaussian mixture model assigns to each data point x_n a probability of belonging to each cluster. The probability that a data point x_n comes from Gaussian cluster k is expressed as

\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)

• Next, we need to calculate the overall likelihood of observing a data point x_n under all Gaussians. This is achieved by summing over all possible clusters (Gaussians) for each point:

p(x_n) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
The Expectation-Maximization (EM) Algorithm
To fit a Gaussian Mixture Model to the data, we use the Expectation-
Maximization (EM) algorithm, which is an iterative method that optimizes
the parameters of the Gaussian distributions (mean, covariance, and
mixing coefficients). It works in two main steps:
1. Expectation step (E-step): The algorithm calculates the probability that each data point belongs to each cluster, based on the current parameter estimates (means, covariances, mixing coefficients).
2. Maximization step (M-step): After estimating these probabilities, the algorithm updates the parameters (means, covariances, and mixing coefficients) to better fit the data.

• These two steps are repeated until the model converges, meaning the
parameters no longer change significantly between iterations.
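Written out, the usual E-step and M-step updates (a standard sketch in the notation above, with N data points and K components) are:

E-step (responsibilities):
\gamma_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

M-step (parameter updates), with N_k = \sum_{n=1}^{N} \gamma_{nk}:
\mu_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk}\, x_n, \qquad
\Sigma_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk}\,(x_n-\mu_k)(x_n-\mu_k)^{\mathsf T}, \qquad
\pi_k = \frac{N_k}{N}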
GMM Algorithm
1. Initialization: Start with initial guesses for the means, covariances, and mixing coefficients of each Gaussian distribution.
2. E-step: For each data point, calculate the probability of it belonging to each Gaussian distribution (cluster).
3. M-step: Update the parameters (means, covariances, mixing coefficients) using the probabilities calculated in the E-step.
4. Repeat: Continue alternating between the E-step and M-step until the log-likelihood of the data (a measure of how well the model fits the data) converges.
• The E-step computes the probabilities that each data point
belongs to each Gaussian, while the M-step updates the
parameters μk, Σk , and πk based on these probabilities.
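A minimal sketch with scikit-learn's GaussianMixture, which fits a GMM with EM (the blob-shaped toy data and the choice of three components are illustrative assumptions):

from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical toy data: three blob-shaped clusters.
X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

gm = GaussianMixture(n_components=3, n_init=10, random_state=42)
gm.fit(X)                        # runs EM until the log-likelihood converges

print(gm.weights_)               # mixing coefficients pi_k
print(gm.means_)                 # component means mu_k
print(gm.covariances_)           # component covariances Sigma_k
print(gm.predict(X)[:5])         # hard cluster assignments
print(gm.predict_proba(X)[:5])   # soft (probabilistic) assignments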
Dimensionality Reduction
• The Curse of Dimensionality:
• Curse of Dimensionality refers to a set of problems that arise when
working with high-dimensional data.
• The dimension of a dataset corresponds to the number of
attributes/features that exist in a dataset.
• A dataset with a large number of attributes, generally of the order of
a hundred or more, is referred to as high dimensional data.
• Some of the difficulties that come with high dimensional data
manifest during analyzing or visualizing the data to identify patterns,
and some manifest while training machine learning models.
• The difficulties related to training machine learning models on high-dimensional data are referred to as the ‘Curse of Dimensionality’.
Solutions to Curse of Dimensionality:
• One of the ways to reduce the impact of high dimensions is to use a different measure of distance in the vector space.
• One could explore the use of cosine similarity to replace Euclidean distance.
• Cosine similarity is less affected by high dimensionality than Euclidean distance. However, whether such a method is appropriate depends on the problem being solved.
• Other methods:
• Other methods could involve the use of reduction in dimensions. Some of
the techniques that can be used are:
1. Forward feature selection: This method involves picking the most useful subset of features from all the given features (see the sketch after this list).
2. PCA/t-SNE: Though these methods help reduce the number of features, they do not necessarily preserve class-related structure, which can make the interpretation of results a tough task.
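A minimal sketch of forward feature selection, assuming scikit-learn's SequentialFeatureSelector (the iris dataset, the k-nearest-neighbors estimator and the target of 2 features are arbitrary choices):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Greedily add one feature at a time, keeping the one that helps the estimator most.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",
)
sfs.fit(X, y)
print(sfs.get_support())      # boolean mask of the selected features
X_reduced = sfs.transform(X)  # dataset restricted to the selected features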
Main Approaches for Dimensionality Reduction
• The two main approaches to reducing dimensionality:
• Projection: project the training instances onto a lower-dimensional subspace of the high-dimensional space. However, projection is not always the best approach to dimensionality reduction.
• Manifold Learning:
• Manifold learning is a type of non-linear dimensionality reduction
process.
• It relies on the manifold assumption, also called the manifold hypothesis,
which holds that most real-world high-dimensional datasets lie close to a
much lower-dimensional manifold.
Principal Components Analysis
• This method was introduced by Karl Pearson. It works on the condition that when the data in a higher-dimensional space is mapped to data in a lower-dimensional space, the variance of the data in the lower-dimensional space should be maximized.
• It involves the following steps:
• Construct the covariance matrix of the data.
• Compute the eigenvectors of this matrix.
• Eigenvectors corresponding to the largest eigenvalues are used to
reconstruct a large fraction of variance of the original data.
• Hence, we are left with a lesser number of eigenvectors, and there
might have been some data loss in the process. But, the most
important variances should be retained by the remaining
eigenvectors.
• Preserving the Variance:
• Before you can project the training set onto a lower-dimensional hyperplane, you first need to choose the right hyperplane.
• It seems reasonable to select the axis that preserves the maximum
amount of variance, as it will most likely lose less information than
the other projections. Another way to justify this choice is that it is
the axis that minimizes the mean squared distance between the
original dataset and its projection onto that axis.
Principal Components
• The unit vector that defines the ith axis is called the ith principal component (PC); the 1st PC is c1 and the 2nd PC is c2.
• Luckily, there is a standard matrix factorization technique called Singular Value Decomposition (SVD) that can decompose the training set matrix X into the matrix multiplication of three matrices, X = U Σ V^T, where V contains all the principal components.
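A minimal NumPy sketch of this factorization (the data matrix X here is a hypothetical random array, used only so the code runs; centring is done manually because np.linalg.svd does not do it for you):

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))    # hypothetical data, one instance per row

X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)  # rows of Vt are the principal components

c1 = Vt[0]                    # 1st principal component
c2 = Vt[1]                    # 2nd principal component

W2 = Vt[:2].T                 # projection matrix made of the first two PCs
X2D = X_centered @ W2         # project the data onto the first two PCs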
• Advantages of Dimensionality Reduction
• It helps in data compression, and hence reduced storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
• Disadvantages of Dimensionality Reduction
• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is
sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to
define datasets.
• We may not know how many principal components to keep; in practice, some rules of thumb are applied.
Steps in PCA:
• Step 1: Calculate the mean of each feature and centre the data by subtracting it.
• Step 2: Compute the covariance matrix S of the centred data.
• Step 3: PCA provides a mechanism to recognize the geometric structure of the data through algebraic means. The covariance matrix S is a symmetric matrix, and according to the Spectral Theorem (spectral decomposition) it can be written as

A \vec{v}_i = \lambda_i \vec{v}_i, \qquad A = \sum_i \lambda_i \, \vec{v}_i \vec{v}_i^{\mathsf T}

Here we call \vec{v}_i an eigenvector, λ_i the corresponding eigenvalue, and A the covariance matrix.
• Step 4: Inferring the principal components from the eigenvalues of the covariance matrix. From the Spectral Theorem we infer:
  o The most significant principal component is the eigenvector corresponding to the largest eigenvalue.
• Step 5: Projecting the data using the principal components.
  o The projection matrix is obtained from the selected eigenvectors (k < d of them). The original dataset is transformed via the projection matrix to obtain a reduced k-dimensional subspace of the original dataset (see the sketch below).
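A minimal NumPy sketch of these five steps (the small 2-D array is purely hypothetical data, included only so the code runs end to end):

import numpy as np

X = np.array([[4.0, 11.0], [8.0, 4.0], [13.0, 5.0], [7.0, 14.0]])  # hypothetical samples, one per row

# Step 1: centre the data (subtract the mean of each feature).
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix S of the centred data.
S = np.cov(X_centered, rowvar=False)

# Step 3: spectral decomposition of the symmetric matrix S.
eigvals, eigvecs = np.linalg.eigh(S)

# Step 4: sort by decreasing eigenvalue; the first eigenvector is the most
# significant principal component.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: project onto the top k eigenvectors (k < d).
k = 1
W = eigvecs[:, :k]            # projection matrix
X_reduced = X_centered @ W    # k-dimensional representation
print(X_reduced)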
Step 1: Calculate Mean
• The figure shows the scatter plot of the given data points.
• Calculate the mean of X1 and X2 as shown below.
Step 2: Calculation of the covariance matrix.
The covariances are calculated as follows:
• The covariance matrix is,
Step 3: Eigenvalues of the covariance matrix
• The eigenvalues are obtained by solving the characteristic equation of the covariance matrix, det(S − λI) = 0.
• Solving the characteristic equation gives the eigenvalues λ1 and λ2.
Step 4: Computation of the eigenvectors
To find the first principal component, we need only compute the eigenvector corresponding to the largest eigenvalue. In the present example, the largest eigenvalue is λ1, and so we compute the eigenvector corresponding to λ1.
• The eigenvector corresponding to λ = λ1 is a vector e1 satisfying the equation (S − λ1 I) e1 = 0.
• Using the theory of systems of linear equations, we note that these equations are not independent, so the solutions are determined only up to a scale factor; the unit-length solution is taken as e1.
• Step 5: Computation of the first principal components
• Let Xk be the kth sample in the above table (dataset). The first principal component of this sample is given by e1^T (Xk − X̄), where X̄ is the mean vector (here "T" denotes the transpose).
• For example, the first principal component corresponding to the first sample is obtained by substituting X1 into this expression.
• Step 6: Geometrical meaning of first principal components
• First, we shift the origin to the “center” and then change the
directions of coordinate axes to the directions of the eigenvectors
e1 and e2.
• Next, we drop perpendiculars from the given data points to the e1-
axis (see below Figure).
• The first principal components are the e1-coordinates of the feet of the perpendiculars, that is, the projections on the e1-axis. The projections of the data points on the e1-axis may be taken as approximations of the given data points; hence we may replace the given dataset with these points.
• Now, each of these approximations can be unambiguously specified
by a single number, namely, the e1-coordinate of approximation. Thus
the two-dimensional data set can be represented approximately by
the following one-dimensional data set.
USING SCIKIT-LEARN
• Scikit-Learn’s PCA class implements PCA using SVD decomposition
just like we did before. The following code applies PCA to reduce the
dimensionality of the dataset down to two dimensions (note that it
automatically takes care of centering the data):
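A minimal sketch of that call (the random 3-D array stands in for whatever dataset is being reduced):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(42).normal(size=(60, 3))  # hypothetical 3-D dataset

pca = PCA(n_components=2)
X2D = pca.fit_transform(X)   # centering is handled internally by PCA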
RANDOMIZED PCA
• If you set the svd_solver hyperparameter to "randomized", Scikit-Learn uses a stochastic algorithm called Randomized PCA that quickly finds an approximation of the first d principal components. Its computational complexity is O(m × d²) + O(d³), instead of O(m × n²) + O(n³) for the full SVD approach, so it is dramatically faster than full SVD when d is much smaller than n.
• By default, svd_solver is actually set to "auto": Scikit-Learn automatically uses the randomized PCA algorithm if m or n is greater than 500 and d is less than 80% of m or n, or else it uses the full SVD approach. If you want to force Scikit-Learn to use full SVD, you can set the svd_solver hyperparameter to "full".
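A minimal sketch of both settings (the number of components and the random dataset are illustrative placeholders):

import numpy as np
from sklearn.decomposition import PCA

X_train = np.random.default_rng(42).normal(size=(1000, 200))  # hypothetical high-dimensional data

rnd_pca = PCA(n_components=20, svd_solver="randomized", random_state=42)
X_reduced = rnd_pca.fit_transform(X_train)

full_pca = PCA(n_components=20, svd_solver="full")  # force full SVD instead of the "auto" heuristic
X_reduced_full = full_pca.fit_transform(X_train)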

• Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower-dimensional space. The input data is centered but not scaled for each feature before applying the SVD.
KERNEL PCA
• Kernel PCA builds on the kernel trick, a mathematical technique that implicitly maps instances into a very high-dimensional space (called the feature space), enabling nonlinear classification and regression with Support Vector Machines.
• A linear decision boundary in the high-dimensional feature space
corresponds to a complex nonlinear decision boundary in the original
space. It turns out that the same trick can be applied to PCA, making
it possible to perform complex nonlinear projections for
dimensionality reduction. This is called Kernel PCA (kPCA).
• It is often good at preserving clusters of instances after projection, or
sometimes even unrolling datasets that lie close to a twisted
manifold.
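A minimal sketch with scikit-learn's KernelPCA (the RBF kernel, the gamma value and the Swiss-roll toy dataset are illustrative choices):

from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)  # a twisted 3-D manifold

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)   # nonlinear 2-D projection that can "unroll" the manifold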
