Chapter 1
Unsupervised learning: basics
CLUSTER ANALYSIS IN PYTHON

Shaumik Daityari
Business Analyst
Everyday example: Google News
How does Google News classify articles?

Unsupervised learning algorithm: clustering

Match frequent terms in articles to find similarity


Labeled and unlabeled data
Data with no labels:
Point 1: (1, 2)
Point 2: (2, 2)
Point 3: (3, 1)

Data with labels:
Point 1: (1, 2), Label: Danger Zone
Point 2: (2, 2), Label: Normal Zone
Point 3: (3, 1), Label: Normal Zone


What is unsupervised learning?
A group of machine learning algorithms that find patterns in data

Data for the algorithms has not been labeled, classified or characterized

The objective of the algorithm is to interpret any structure in the data

Common unsupervised learning algorithms: clustering, neural networks, anomaly detection


What is clustering?
The process of grouping items with similar characteristics

Items in a group are more similar to each other than to items in other groups

Example: distance between points on a 2D plane
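To make "distance between points" concrete, here is a minimal sketch that computes the straight-line (Euclidean) distance between every pair of the three example points from the labeled-data slide. Euclidean distance is assumed here as the similarity measure; the helper name `euclidean` is ours, chosen for illustration.

```python
import math

# The three example points used earlier in this chapter
points = {"Point 1": (1, 2), "Point 2": (2, 2), "Point 3": (3, 1)}

def euclidean(p, q):
    """Straight-line distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Print the distance for every pair of points
names = list(points)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        print(a, b, round(euclidean(points[a], points[b]), 3))
```

Points 1 and 2 turn out to be the closest pair, so a distance-based clustering method would tend to group them together first.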


Plotting data for clustering - Pokemon sightings
from matplotlib import pyplot as plt

# Coordinates of Pokemon sightings on a 2D map
x_coordinates = [80, 93, 86, 98, 86, 9, 15, 3, 10, 20, 44, 56, 49, 62, 44]
y_coordinates = [87, 96, 95, 92, 92, 57, 49, 47, 59, 55, 25, 2, 10, 24, 10]

# Scatter plot to inspect the data for visible groups
plt.scatter(x_coordinates, y_coordinates)
plt.show()

Up next: some practice

Basics of cluster analysis

What is a cluster?
A group of items with similar characteristics

Google News: articles where similar words and word associations appear together

Customer segments


Clustering algorithms
Hierarchical clustering

K-means clustering

Other clustering algorithms: DBSCAN, Gaussian methods

Hierarchical clustering in SciPy
from scipy.cluster.hierarchy import linkage, fcluster
from matplotlib import pyplot as plt
import seaborn as sns, pandas as pd

x_coordinates = [80.1, 93.1, 86.6, 98.5, 86.4, 9.5, 15.2, 3.4,
                 10.4, 20.3, 44.2, 56.8, 49.2, 62.5, 44.0]
y_coordinates = [87.2, 96.1, 95.6, 92.4, 92.4, 57.7, 49.4,
                 47.3, 59.1, 55.5, 25.6, 2.1, 10.9, 24.1, 10.3]

df = pd.DataFrame({'x_coordinate': x_coordinates,
                   'y_coordinate': y_coordinates})

# Build the cluster hierarchy with Ward linkage,
# then cut it into at most 3 flat clusters
Z = linkage(df, 'ward')
df['cluster_labels'] = fcluster(Z, 3, criterion='maxclust')

# Color each point by its cluster label
sns.scatterplot(x='x_coordinate', y='y_coordinate',
                hue='cluster_labels', data=df)
plt.show()
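The linkage matrix `Z` is what `fcluster` cuts into flat clusters. The sketch below, which reuses the same coordinates as a plain NumPy array (an illustrative choice), prints its shape and last row: each of the n - 1 rows records one merge as [cluster A index, cluster B index, merge distance, size of the merged cluster].

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Same Pokemon sightings as above, as an (n_points, 2) array
points = np.array([
    [80.1, 87.2], [93.1, 96.1], [86.6, 95.6], [98.5, 92.4], [86.4, 92.4],
    [9.5, 57.7], [15.2, 49.4], [3.4, 47.3], [10.4, 59.1], [20.3, 55.5],
    [44.2, 25.6], [56.8, 2.1], [49.2, 10.9], [62.5, 24.1], [44.0, 10.3],
])

Z = linkage(points, method='ward')

# 15 points need 14 merges to end up in a single cluster
print(Z.shape)  # (14, 4)
# The final merge joins all 15 points; its last entry is the cluster size
print(Z[-1])

# Cut the tree so that at most 3 flat clusters remain
labels = fcluster(Z, 3, criterion='maxclust')
print(labels)
```

Because the three groups of sightings are far apart, the last two merges in `Z` have much larger distances than the earlier ones, which is why cutting at 3 clusters recovers the visible groups.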

K-means clustering in SciPy
from scipy.cluster.vq import kmeans, vq
from matplotlib import pyplot as plt
import seaborn as sns, pandas as pd

# Seed NumPy's global random state for reproducible initial centroids;
# Python's built-in random module has no effect on SciPy's kmeans
from numpy import random
random.seed((1000, 2000))

x_coordinates = [80.1, 93.1, 86.6, 98.5, 86.4, 9.5, 15.2, 3.4,
                 10.4, 20.3, 44.2, 56.8, 49.2, 62.5, 44.0]
y_coordinates = [87.2, 96.1, 95.6, 92.4, 92.4, 57.7, 49.4,
                 47.3, 59.1, 55.5, 25.6, 2.1, 10.9, 24.1, 10.3]

df = pd.DataFrame({'x_coordinate': x_coordinates, 'y_coordinate': y_coordinates})

# Compute 3 centroids, then assign each point to its nearest centroid
centroids, _ = kmeans(df, 3)
df['cluster_labels'], _ = vq(df, centroids)

sns.scatterplot(x='x_coordinate', y='y_coordinate',
                hue='cluster_labels', data=df)
plt.show()
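To show what happens inside a k-means run, here is a minimal from-scratch sketch of the classic Lloyd iteration: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is not SciPy's implementation: SciPy's `kmeans` picks random starting observations and keeps the best of several runs, whereas this sketch uses a deterministic farthest-point start so the result is reproducible. All function names here are hypothetical.

```python
import numpy as np

def assign_to_nearest(points, centroids):
    """Index of the nearest centroid (Euclidean) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def farthest_point_init(points, k):
    """Deterministic start: the first point, then repeatedly the point
    farthest from all centroids chosen so far."""
    chosen = [0]
    for _ in range(1, k):
        d = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
        chosen.append(int(d.min(axis=1).argmax()))
    return points[chosen].astype(float)

def simple_kmeans(points, k, n_iter=100):
    """Lloyd-style k-means: alternate assignment and centroid-update steps."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        labels = assign_to_nearest(points, centroids)
        # Move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster happens to be empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign_to_nearest(points, centroids)

# Same Pokemon sightings as above
points = np.array([
    [80.1, 87.2], [93.1, 96.1], [86.6, 95.6], [98.5, 92.4], [86.4, 92.4],
    [9.5, 57.7], [15.2, 49.4], [3.4, 47.3], [10.4, 59.1], [20.3, 55.5],
    [44.2, 25.6], [56.8, 2.1], [49.2, 10.9], [62.5, 24.1], [44.0, 10.3],
])
centroids, labels = simple_kmeans(points, 3)
print(centroids)
print(labels)
```

The assignment step plays the same role as `vq` above; the update step is what `kmeans` repeats internally until the centroids stop moving.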

Next up: hands-on exercises

Data preparation for cluster analysis

Why do we need to prepare data for clustering?
Variables have incomparable units (product dimensions in cm, price in $)

Variables with the same units have vastly different scales and variances (expenditures on cereals, travel)

Data in raw form may lead to bias in clustering

Clusters may be heavily dependent on one variable

Solution: normalization of individual variables
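The cereals-versus-travel point can be seen directly in distances. The sketch below uses made-up spending figures for three hypothetical customers (the numbers are illustrative, not from the course) and compares pairwise distances before and after normalizing each column with `whiten`.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical monthly spending per customer: [cereals ($), travel ($)]
spend = np.array([
    [10.0, 2000.0],
    [50.0, 2000.0],
    [10.0, 2100.0],
])

def dist(a, b):
    """Euclidean distance between two rows."""
    return float(np.linalg.norm(a - b))

# Raw data: customer 0 differs from customer 1 only in cereals,
# and from customer 2 only in travel
raw_gaps = (dist(spend[0], spend[1]), dist(spend[0], spend[2]))
print(raw_gaps)  # (40.0, 100.0)

# After whitening, each column has a standard deviation of 1
scaled = whiten(spend)
scaled_gaps = (dist(scaled[0], scaled[1]), dist(scaled[0], scaled[2]))
print(scaled_gaps)  # both about 2.12
```

On raw values the travel gap dominates simply because travel amounts are larger. After whitening, both gaps are the same number of standard deviations, so they count equally.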


Normalization of data
Normalization: process of rescaling data to a standard deviation of 1

x_new = x / std_dev(x)

from scipy.cluster.vq import whiten

data = [5, 1, 3, 3, 2, 3, 3, 8, 1, 2, 2, 3, 5]

scaled_data = whiten(data)
print(scaled_data)

Output (rounded to two decimal places):
[2.73, 0.55, 1.64, 1.64, 1.09, 1.64, 1.64, 4.36, 0.55, 1.09, 1.09, 1.64, 2.73]
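As a check on the formula x_new = x / std_dev(x), the sketch below applies it by hand with NumPy and compares the result with `whiten`. It assumes the population standard deviation (NumPy's default, `ddof=0`), which is what reproduces the values shown above.

```python
import numpy as np
from scipy.cluster.vq import whiten

data = np.array([5, 1, 3, 3, 2, 3, 3, 8, 1, 2, 2, 3, 5], dtype=float)

# Apply x_new = x / std_dev(x) by hand;
# ndarray.std() uses the population standard deviation (ddof=0)
manual = data / data.std()

print(np.round(manual, 2))
print(np.allclose(manual, whiten(data)))  # True
```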


Illustration: normalization of data
# Import plotting library
from matplotlib import pyplot as plt

# Plot original and scaled data
# (data and scaled_data come from the previous example)
plt.plot(data, label="original")
plt.plot(scaled_data, label="scaled")

# Show legend and display plot
plt.legend()
plt.show()


Next up: some DIY exercises
