K-MEANS-FINAL
Example: Consider a dataset of house prices. Each house is represented by features such as the number of
bedrooms, square footage, and location. The price of the house is the output (or label). In supervised learning,
the algorithm is trained on this data to predict the price of a new house based on its features.
Common supervised learning algorithms include decision trees and neural networks.
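As a minimal, hedged illustration of this supervised setting (the feature values, prices, and the choice of a linear model below are assumptions made purely for this sketch, not part of the notes), one could write:

import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [bedrooms, square footage, location index]; labels: prices.
# The numbers are invented purely for illustration.
X = np.array([[2,  800, 1],
              [3, 1200, 2],
              [4, 2000, 2],
              [3, 1500, 3]])
y = np.array([150_000, 230_000, 360_000, 310_000])

model = LinearRegression().fit(X, y)      # train on the labelled examples
new_house = np.array([[3, 1400, 2]])
print(model.predict(new_house))           # predicted price for an unseen house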
Example: Consider a dataset of customer data at a retail store with features like age, purchasing history, and
browsing time. There are no labels to indicate what group each customer belongs to. In unsupervised learning,
the algorithm can group similar customers together, which can be useful for targeting marketing campaigns.
Output: No predefined label; the algorithm finds patterns like customer segments.
Common unsupervised learning algorithms include k-means clustering, hierarchical clustering, and autoencoders.
1.3 Differences Between Supervised and Unsupervised Learning
The key difference is the presence of labels: supervised learning trains on inputs paired with known outputs so that it can predict the output for new inputs, while unsupervised learning receives only inputs and must discover structure, such as clusters, on its own.
2 K-Means Algorithm
The K-Means algorithm is used to divide a given dataset into k distinct groups or clusters. Below is an
easy-to-follow explanation of the algorithm step by step:
Objective The goal of the K-Means algorithm is to group the data points into k clusters such that points in
the same cluster are more similar to each other than to points in other clusters.
Initial Setup You start by choosing the number of clusters (k), which you need to decide beforehand. Initially,
the algorithm assigns k random points in the input space as the centers of the clusters (called centroids).
Distance Calculation To group data points, we need to measure how close a point is to a cluster center.
This is done using a distance measure like Euclidean distance (the straight-line distance between two points).
Assigning Points to Clusters For each data point, calculate its distance from all the cluster centers. Assign
each data point to the cluster whose center is closest to it. This means each point is grouped based on the
minimum distance to a cluster center.
Updating Cluster Centers After assigning all data points to clusters, compute the mean of the points
in each cluster. The new cluster center is placed at this mean point (the average position of all points in the
cluster). This step ensures that the cluster centers move towards the center of the group of points they are
responsible for.
Iterative Process Repeat the steps of reassigning points to clusters and updating the cluster centers. This
process continues until the cluster centers stop moving or change very little, meaning the algorithm has converged
and the clusters are stable.
Stopping Criteria The algorithm stops when the cluster centers no longer change their positions significantly,
meaning a stable set of cluster centers has been found (a local optimum, not necessarily the best possible clustering).
Example Imagine a group of people (data points) being grouped around k tour guides (cluster centers).
Initially, the tour guides stand randomly, and people move toward the guide closest to them. As people gather,
the tour guides move to the center of their respective groups. This process repeats until the tour guides no
longer need to move.
3 The k-Means Algorithm
Initialisation
– Choose a value for k.
– Choose k random positions in the input space.
– Assign the cluster centres µj to those positions.
Learning
– Repeat
* For each datapoint x_i:
· Compute the distance to each cluster centre.
· Assign the datapoint to the nearest cluster centre, with distance d_i = \min_j d(x_i, \mu_j).
* For each cluster centre:
· Move the position of the centre to the mean of the points in that cluster:
\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} x_i,
where N_j is the number of points in cluster j.
– Until the cluster centres stop moving.
Usage
– For each test point:
* Compute the distance to each cluster centre.
* Assign the datapoint to the nearest cluster centre, with distance d_i = \min_j d(x_i, \mu_j).
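The pseudocode above maps almost line by line onto NumPy. The sketch below is one possible implementation of those steps; the function name, the choice of initialising the centres at random data points, and the tolerance are my own assumptions.

import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=None):
    """Minimal k-means: X is an (N, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialisation: use k randomly chosen data points as the initial centres.
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assignment: each point goes to the nearest centre (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centre to the mean of the points assigned to it.
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        # Stop when the centres barely move.
        if np.linalg.norm(new_centres - centres) < tol:
            centres = new_centres
            break
        centres = new_centres
    return centres, labels

For a quick check, centres, labels = kmeans(np.random.rand(100, 2), k=3) groups 100 random two-dimensional points into three clusters.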
4 Dealing with Noise
Clustering is often used to handle noisy data. Noisy data can occur for several reasons; it might be slightly
corrupted or even completely wrong. If we can group the data points into the right clusters, we can effectively
reduce the noise. This is because we can replace each noisy data point with the average of the points in its
cluster.
However, the k-means algorithm uses the average (mean) to find the cluster centers, which can be greatly
affected by outliers—these are extreme values that don’t fit with the rest of the data. For example, if we have
the numbers (1, 2, 1, 2, 100), the average is 21.2, but the middle value, known as the median, is 2. The median
is a robust statistic because it is not influenced by outliers.
To improve our clustering method, we can replace the mean with the median when calculating the cluster
centers. This change makes the algorithm much more resistant to noise and outliers. Computing the median
takes longer than computing the mean, but the extra cost is usually worth it for noisy data.
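A hedged sketch of that change is shown below: only the centre-update step differs from standard k-means, and the helper name is my own.

import numpy as np

# Mean vs. median on the example from the text: one outlier drags the mean away.
values = np.array([1, 2, 1, 2, 100])
print(values.mean(), np.median(values))   # 21.2 versus 2.0

def kmedians_update(X, labels, k):
    """Centre-update step using the median, making the centres robust to outliers."""
    return np.array([np.median(X[labels == j], axis=0) for j in range(k)])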
In the k-means algorithm, each input identifies the closest cluster center by calculating the distance to all centers. We can
mimic this in a neural network. The position of each neuron corresponds to its weight vector in weight space. For
each input, we calculate the distance between the input and each neuron's weight vector. During training, we adjust
the position of the neuron by changing its weights.
To implement the k-means algorithm using neurons, we will use one layer of neurons and some input nodes,
without a bias node. The first layer will consist of the input nodes that don’t perform any computations, while
the second layer will contain competitive neurons. Only one neuron will “fire” for each input, meaning that
only one cluster center can represent a particular input vector. The neuron that is closest to the input will be
chosen to fire. This approach is called winner-takes-all activation, which is a form of competitive learning. In
competitive learning, neurons compete to fire based on their closeness to the input. Each neuron may learn
to recognize a specific feature, firing only when it detects that feature. For example, you could have a neuron
trained to recognize your grandmother.
We will select k neurons and connect the inputs to the neurons fully, as usual. The activation of the neurons
will be calculated using a linear transfer function as follows:
h_i = \sum_j w_{ij} x_j.
When the inputs are normalized to have the same scale, this effectively measures the distance between the
input vector and the cluster center represented by that neuron. Higher activation values indicate that the input
and the cluster center are closer together.
The winning neuron is the one closest to the current input. The challenge is updating the position of that
neuron in weight space, or how to adjust its weights. In the original k-means algorithm, this was straightforward;
we simply set the cluster center to be the mean of all data points assigned to that center. However, in neural
network training, we often input just one vector at a time and change the weights accordingly (using online
learning rather than batch learning). This means we don’t know the mean because we only have information
about the current input.
To address this, we approximate the mean by moving the winning neuron closer to the current input. This
makes it more likely to be the best match for the next occurrence of that input. The adjustment is given by:
\Delta w_{ij} = \eta x_j.
However, this method might not be sufficient. We will explore why that is when we discuss normalization
further.
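As a sketch of a single on-line step under this rule (the array names and the learning rate are illustrative assumptions), the winning neuron is found from its activation and only its weights are changed:

import numpy as np

def online_step(weights, x, eta=0.1):
    """One winner-takes-all update: the closest neuron is nudged toward the input."""
    activations = weights @ x             # h_i = sum_j w_ij x_j for every neuron
    winner = int(np.argmax(activations))  # highest activation = closest (for normalised vectors)
    weights[winner] += eta * x            # Delta w_ij = eta * x_j, applied only to the winner
    return winner

Only the winning row of the weight matrix changes, which is exactly the winner-takes-all behaviour described above.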
7 Neuronal Activation
The neuronal activation can be expressed as:
h_i = W_i^T \cdot x,
where \cdot denotes the inner product (or scalar product) between two vectors, and W_i^T is the transpose of the
i-th row of the weight matrix W.
The inner product evaluates to \|W_i\| \|x\| \cos\theta, where \theta is the angle between the two vectors, and
\| \cdot \| denotes the magnitude of a vector. If the magnitudes of all vectors are normalized to one, then only
the angle \theta influences the size of the dot product.
This means that the closer the two vectors point in the same direction, the larger the activation will be.
Therefore, the angle between the vectors is crucial in determining how effectively they match each other.
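The snippet below illustrates the point numerically: after normalising two arbitrarily chosen vectors to unit length, their inner product equals cos θ, so the activation depends only on how well the directions match.

import numpy as np

w = np.array([3.0, 4.0])
x = np.array([1.0, 1.0])

w_hat = w / np.linalg.norm(w)    # ||w_hat|| = 1
x_hat = x / np.linalg.norm(x)    # ||x_hat|| = 1

cos_theta = w_hat @ x_hat        # inner product of unit vectors = cos(theta)
print(cos_theta, np.degrees(np.arccos(cos_theta)))
# The closer the two directions, the larger cos_theta, and the larger the activation.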
Unlike gradient descent on a single global error function, this weight update rule tries to minimize the error for each weight independently. This makes the analysis of the algorithm more complex, but
competitive learning algorithms like this tend to work well in practice.
Key Points:
Weight Update Rule: Moves weights closer to inputs based on their distance.
Independence: Each weight is updated independently, making the analysis complex but effective.
Learning
– Normalize the data so that all points lie on the unit sphere.
– Repeat the following steps:
* For each datapoint:
· Compute the activations of all the nodes.
· Pick the winner as the node with the highest activation.
· Update the weights using the weight update rule.
– Continue until the number of iterations exceeds a predefined threshold.
Usage
– For each test point:
* Compute the activations of all the nodes.
* Pick the winner as the node with the highest activation.
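A minimal sketch of the algorithm above follows; the random initialisation of the weights, the fixed iteration count, the learning rate, and the renormalisation of the winning weight vector after each update are illustrative choices rather than prescriptions from these notes.

import numpy as np

def train_competitive(X, k, eta=0.1, n_iters=50, seed=0):
    """Winner-takes-all training for a one-layer competitive network (on-line k-means)."""
    rng = np.random.default_rng(seed)
    # Normalise the data so that all points lie on the unit sphere.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    weights = rng.normal(size=(k, X.shape[1]))
    weights /= np.linalg.norm(weights, axis=1, keepdims=True)
    for _ in range(n_iters):                     # stop after a fixed number of iterations
        for x in X:                              # for each datapoint
            h = weights @ x                      # activations of all the nodes
            winner = np.argmax(h)                # winner = node with the highest activation
            weights[winner] += eta * x           # weight update rule
            weights[winner] /= np.linalg.norm(weights[winner])  # keep the winner on the unit sphere
    return weights

def assign(weights, x):
    """Usage: present a (normalised) test point and return the winning cluster."""
    return int(np.argmax(weights @ (x / np.linalg.norm(x))))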
11 Key Points on Competitive Learning for Clustering
Cluster Assignment: After training, a new datapoint can be classified by presenting it to the trained
algorithm, which determines the activated cluster.
Class Label Interpretation: When target data is available, the best-matching cluster can be interpreted
as a class label. However, careful consideration is needed since the order of nodes may not align with the
order of target classes.
Matching Output Classes: To ensure accurate classification, it is essential to match output classes to
target data carefully. Misalignment can lead to misleading results.
Combining K-Means and Perceptrons: The k-means algorithm can be used to position Radial Basis Function
(RBF) nodes in the input space so that they represent the input data accurately. A Perceptron can then be
trained on top of these RBFs to match the outputs to the target classes.
Utilizing More Clusters: Using more clusters in the k-means network allows a richer representation of the
input data, and there is no need to manually assign datapoints to clusters, since the Perceptron handles the
final classification.
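A hedged sketch of this k-means + RBF + Perceptron pipeline is given below, using scikit-learn's KMeans and Perceptron; the number of centres, the RBF width gamma, and the use of the Iris data are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

X, y = load_iris(return_X_y=True)

# Step 1: k-means positions the RBF centres in the input space.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
centres = kmeans.cluster_centers_

# Step 2: build RBF features, one Gaussian response per centre.
gamma = 1.0
dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
rbf_features = np.exp(-gamma * dists ** 2)

# Step 3: a Perceptron on top of the RBF layer matches clusters to the target classes.
clf = Perceptron(random_state=0).fit(rbf_features, y)
print(clf.score(rbf_features, y))   # training accuracy of the combined model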
kmeans
October 7, 2024
[12]: from sklearn.datasets import load_iris
      import matplotlib.pyplot as plt

      iris = load_iris()
      X = iris.data[:, :2]  # Use only the first two features for 2D visualization
      y = iris.target       # True labels for comparison

      # Original data (left panel)
      plt.subplot(1, 2, 1)
      plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', marker='o', edgecolor='k', s=100)
[14]: from sklearn.cluster import KMeans

      kmeans = KMeans(n_clusters=3, random_state=42)
      kmeans.fit(X)
      # K-means cluster assignments (right panel; reconstructed from context)
      plt.subplot(1, 2, 2)
      plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis', marker='o', edgecolor='k', s=100)
      plt.tight_layout()
      plt.show()