Lecture 4


1

2
Wollo University, Kombolicha Institute of Technology

Department of Software Engineering

Fundamentals of Machine Learning

By Ashenafi Workie (MSc.)
Major chapter outlines

1 Chapter 1: Introduction to Machine Learning


2 Chapter 2: Classification based Supervised Learning
3 Chapter 3: Regression based Supervised Learning
4 Chapter 4: Unsupervised Learning
5 Chapter 5: Reinforcement Learning
6 Chapter 6: Advanced Machine Learning

4
Assessment Methods

• Assignments and Lab [15%]


• Quiz and Test [20%]
• Project and presentation [20%]
• Final Exam [45%]

5
Unsupervised Learning

6
Clustering in Machine Learning

▪ Clustering is the assignment of a set of observations into subsets (clusters) so that
observations in the same cluster are similar in some sense.

▪ Clustering is a method of unsupervised learning.


▪ Unsupervised: data points have unknown outcome.
▪ Supervised: data points have known outcome.
▪ K-means clustering is an algorithm to classify or group objects into K groups based on
their attributes/features.
▪ K is a positive integer.

7
K-means Clustering

▪ We know beforehand that these objects belong to two groups (k = 2) of
medicine (cluster 1 and cluster 2).
▪ The problem now is to determine which medicines belong to cluster 1 and
which medicines belong to the other cluster.

8
K-means Clustering

▪ The basic steps of K-means clustering are simple.


▪ In the beginning we determine the number of clusters K and assume the centroids, or
centers, of these clusters.

▪ We can take any random objects as the initial centroids, or the first K objects in
sequence can also serve as the initial centroids.
▪ The K-means algorithm will then perform the following four steps until convergence.

9
K-means Clustering

▪ Step 1: Begin with a decision on the value of k = the number of clusters.

▪ Step 2: Put any initial partition that classifies the data into k clusters.
▪ You may assign the training samples randomly, or systematically as follows:
▪ Take the first k training samples as single-element clusters.
▪ Assign each of the remaining (N − k) training samples to the cluster with the
nearest centroid. After each assignment, recompute the centroid of the
gaining cluster.

10
K-means Clustering

▪ Step 3: Take each sample in sequence and compute its distance from the
centroid of each of the clusters.
▪ If a sample is not currently in the cluster with the closest centroid, switch this
sample to that cluster and update the centroids of the cluster gaining the new
sample and the cluster losing the sample.

▪ Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through
the training samples causes no new assignments (a minimal sketch of these steps follows).
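
▪ A minimal NumPy sketch of these four steps (the function and variable names here are illustrative, not from the lecture):

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        # X: (n_samples, n_features) array; returns (labels, centroids).
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(seed)
        # Steps 1-2: decide k and take k random objects as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        labels = None
        for _ in range(max_iter):
            # Step 3: distance from every sample to every centroid,
            # then assign each sample to the cluster with the nearest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)
            # Step 4: stop when a pass causes no new assignments.
            if labels is not None and np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Move each centroid to the mean of its current members.
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = X[labels == j].mean(axis=0)
        return labels, centroids

▪ For example, kmeans(np.array([[1, 1], [2, 1], [4, 3]]), k=2) puts the first two points in one cluster and the third in its own cluster.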

11
K-means Clustering

12
K-means Clustering

▪ K = 2 (find two clusters)

13
K-means Clustering

▪ K = 2 (randomly assign cluster centers)

14
K-means Clustering

▪ K = 2 (each point belongs to closest center)

15
K-means Clustering
▪ K = 2 (move each center to cluster’s mean)

16
K-means Clustering

▪ K = 2 (each point belongs to closest center)

17
K-means Clustering

▪ K = 2 (move each center to cluster’s mean)


18
K-means Clustering
▪ K = 2 (points don’t change - converged)

19
K-means Clustering(Examples)

▪ If the number of data points is less than the number of clusters, then we assign each
data point as the centroid of a cluster. Each centroid will have a cluster number.

▪ If the number of data points is bigger than the number of clusters, then for each data point
we calculate the distance to all centroids and take the minimum distance.
▪ The data point is said to belong to the cluster whose centroid gives this minimum
distance.

20
K-means Clustering(Examples)
▪ Suppose we have several objects (4 types of medicines) and each object
has two attributes or features, as shown in the table.
▪ Our goal is to group these objects into K = 2 groups of medicine based on the
two features (pH and weight index).

21
K-means Clustering(Examples)

▪ Each medicine represents one point with two attributes (X, Y) that we can
represent as a coordinate in an attribute space, as shown in the figure.

22
K-means Clustering(Examples)

▪ 1. Initial value of centroids: suppose we use medicine A and medicine B as
the first centroids.

▪ Let C1 and C2 denote the coordinates of the centroids;
then C1 = (1, 1) and C2 = (2, 1).

23
K-means Clustering(Examples)
▪ 2. Objects-Centroids distance: let us calculate the distance
between each cluster centroid and each object.
▪ Using Euclidean distance, the distance matrix at iteration 0 is:

[Distance matrix D(0): row 1 lists the distances of each medicine to C1 = (1, 1), row 2 the distances to C2 = (2, 1); the figure highlights medicine C = (4, 3).]
24
K-means Clustering(Examples)

▪ Each column in the distance matrix corresponds to one object.

▪ The first row of the distance matrix holds the distance of each object
to the first centroid, and the second row holds the distance of each object to the
second centroid.

▪ For example, the distance from medicine C = (4, 3) to the first centroid C1 = (1, 1)
is √((4 − 1)² + (3 − 1)²) = √13 ≈ 3.61

▪ and its distance to the second centroid C2 = (2, 1) is √((4 − 2)² + (3 − 1)²) = √8 ≈ 2.83

25
K-means Clustering(Examples)
▪ 3. Objects clustering: Assign each object based on the minimum distance.
▪ Thus, medicine A is assigned to group 1, medicine B to group 2, medicine C to
group 2 and medicine D to group 2.

▪ An element of the Group matrix below is 1 if and only if the object is assigned to
that group.

26
K-means Clustering(Examples)

▪ 4. Iteration-1, determine centroids: knowing the members of each group, we now
compute the new centroid of each group based on these new memberships.
▪ Group 1 has only one member.
▪ Thus its centroid remains C1 = (1, 1).
▪ Group 2 has three members (B, C and D), so its centroid is the average coordinate of the
three members:

27
K-means Clustering(Examples)

▪ 5. Iteration-1, Objects-Centroids distances:
the next step is to compute the distance of
all objects to the new centroids.
▪ Similar to step 2, we obtain the distance matrix
at iteration 1:

28
K-means Clustering(Examples)
▪ 6. Iteration-1, Objects clustering: similar to step 3, we assign each object based on
the minimum distance.
▪ Based on the new distance matrix, we move medicine B to Group 1 while all
the other objects remain where they were.
▪ The Group matrix is shown below:

29
K-means Clustering(Examples)

▪ 7. Iteration-2, determine centroids: now we repeat step 4 to
calculate the new centroid coordinates based on the clustering of the
previous iteration.

▪ Group 1 and Group 2 both have two members, thus the new centroids
are C1 =

▪ and C2 =

30
K-means Clustering(Examples)

▪ 8. Iteration-2, Objects-Centroids
distances: repeating step 2 again, we obtain the
new distance matrix at iteration 2:

31
K-means Clustering(Examples)
▪ 9. Iteration-2, Objects clustering: again, we assign each object
based on the minimum distance.

32
K-means Clustering(Examples)
▪ We obtain the result that G2 = G1.
▪ Comparing the grouping of the last iteration and this iteration reveals that the
objects do not change groups anymore.

▪ Thus, the computation of the K-means clustering has reached stability and
no more iterations are needed.
▪ We get the final grouping as the result.

33
K-means Clustering(Examples)

34
K-means Clustering(Examples)

▪ Which model is the right one?


▪ Inertia: the sum of squared distances from each point (xi) to its cluster
center (ck), as written below.

▪ A smaller value corresponds to tighter clusters.

▪ Other metrics can also be used.
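
▪ Writing ck(i) for the center of the cluster that point xi is assigned to: inertia = Σi ‖xi − ck(i)‖²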

35
K-means Clustering (Syntax)
▪ Import the class containing the clustering method.
▪ from sklearn.cluster import KMeans
▪ Create an instance of the class.
▪ kmeans = KMeans(n_clusters=3, init='k-means++')

▪ Fit the instance on the data and then predict clusters for new data.
▪ kmeans = kmeans.fit(x1)
▪ y_predict = kmeans.predict(x2)
▪ Can also be used in batch mode with MiniBatchKMeans; a short end-to-end sketch is shown below.
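
▪ A minimal runnable sketch of this workflow, using a small made-up 2-D array (the data values are illustrative only):

    import numpy as np
    from sklearn.cluster import KMeans

    # Made-up 2-D points forming two loose groups.
    X = np.array([[1.0, 1.0], [2.0, 1.0], [1.5, 1.5],
                  [4.0, 3.0], [5.0, 4.0], [4.5, 3.5]])

    kmeans = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0)
    kmeans = kmeans.fit(X)        # learn the cluster centers
    labels = kmeans.predict(X)    # cluster index for each point
    print(labels)                 # e.g. [0 0 0 1 1 1] (cluster ids may swap)
    print(kmeans.cluster_centers_, kmeans.inertia_)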

36
Distance Metrics(Examples)

▪ Distance metric choice:


▪ The choice of distance metric is extremely important to clustering success.
▪ Each metric has strengths and most appropriate use cases.
▪ But sometimes the choice of distance metric is also based on empirical evaluation.

37
Distance Metrics(Examples)

▪ Euclidean distance:
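▪ For two points x = (x1, …, xn) and y = (y1, …, yn): d(x, y) = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)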

38
Distance Metrics(Examples)
▪ Manhattan distance:
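▪ For the same two points: d(x, y) = |x1 − y1| + |x2 − y2| + … + |xn − yn|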

39
Distance Metrics(Examples)

▪ Cosine distance:
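▪ Cosine distance is one minus the cosine of the angle between the vectors: d(x, y) = 1 − (x · y) / (‖x‖ ‖y‖)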

40
Euclidean vs Cosine distance
▪ Euclidean distance is useful for coordinate-based measurements.
▪ Cosine distance is better for data such as text, where the location of occurrence is less
important.

▪ Euclidean distance is more sensitive to the curse of dimensionality.

41
Distance Metrics(Examples)

▪ Jaccard distance:
▪ Applies to sets (like word occurrence)
▪ Sentence A: “I like chocolate ice cream.”
▪ Set A = {I, like, chocolate, ice, cream}
▪ Sentence B: “Do I want chocolate cream or vanilla cream?”
▪ Set B = {Do, I, want, chocolate, cream, or, vanilla}

42
Distance Metrics(Examples)

▪ Jaccard distance:
▪ Applies to sets (like word occurrence)
▪ Sentence A: “I like chocolate ice cream.”
▪ Set A = {I, like, chocolate, ice, cream}
▪ Sentence B: “Do I want chocolate cream or vanilla cream?”
▪ Set B = {Do, I, want, chocolate, cream, or, vanilla}
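▪ Jaccard distance compares the shared words to the total distinct words: d(A, B) = 1 − |A ∩ B| / |A ∪ B|
▪ Here A ∩ B = {I, chocolate, cream}, so |A ∩ B| = 3, |A ∪ B| = 9, and d(A, B) = 1 − 3/9 ≈ 0.67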

43
Distance Metrics (Syntax)

▪ Import the general pairwise distance function.

▪ from sklearn.metrics import pairwise_distances
▪ Calculate the distances.
▪ dist = pairwise_distances(X, Y, metric='euclidean')

▪ Other distance metric choices are: cosine, manhattan, jaccard, etc.

▪ Distance metric functions can also be imported specifically, e.g.:
▪ from sklearn.metrics import euclidean_distances
▪ A short runnable sketch of these calls is shown below.
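
▪ A minimal sketch of these calls on two small made-up arrays (the values are illustrative only):

    import numpy as np
    from sklearn.metrics import pairwise_distances, euclidean_distances

    X = np.array([[1.0, 1.0], [4.0, 3.0]])   # two made-up 2-D points
    Y = np.array([[2.0, 1.0]])               # one more point

    dist = pairwise_distances(X, Y, metric='euclidean')   # shape (2, 1)
    cos = pairwise_distances(X, Y, metric='cosine')       # cosine distances instead
    same = euclidean_distances(X, Y)                      # same values as dist
    print(dist, cos, same)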

44
Other types of Clustering techniques
▪ Other types of clustering:
▪ Mini-Batch K-Means
▪ Affinity Propagation
▪ Mean Shift
▪ Spectral Clustering
▪ Ward
▪ DBSCAN
▪ Hierarchical clustering etc…

45
Association Analysis with the Apriori
Algorithm

46
Mining Association Rules
▪ The goal of association rule finding is to extract correlation
relationships from large datasets of items.
▪ An illustrative example of association rule mining is so-called
market basket analysis.

47
Mining Association Rules
▪ What could be a rule, and what kind of rules are we looking for?
▪ An example of an association rule could be:
▪ computer => financial_management_software

▪ This means that a purchase of a computer implies a purchase of
financial_management_software.
▪ Naturally, this rule may not hold for all customers and every single purchase.
▪ Thus, we are going to associate two numbers with every such rule.
▪ These two numbers are called support and confidence.

48
Notation and Basic concepts
▪ They are used as a measure of the interestingness of the rules.
▪ Let Ω = {i1, i2, …, im} be a universe of items.
▪ Also, let T = {t1, t2, …, tn} be the set of all transactions collected over a given period of
time.
▪ Thus, t ⊆ Ω (“t is a subset of omega”). In reality, each transaction t is assigned a
number, for example a transaction id (TID).

▪ An association rule will be an implication of the form A => B,

▪ where both A and B are subsets of Ω and A ∩ B = ∅ (“the intersection of sets A and B
is an empty set”).

49
Notation and Basic concepts

▪ What is support?
▪ Support (frequency) is simply the probability that a randomly
chosen transaction t contains both itemsets A and B.
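▪ In the notation of the previous slide: support(A => B) = P(A ∪ B) = |{t ∈ T : (A ∪ B) ⊆ t}| / |T|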

50
Notation and Basic concepts
❖ What is confidence?
❖ Confidence (accuracy) is simply the probability that an itemset B
is purchased in a randomly chosen transaction t, given that the
itemset A is purchased.
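❖ In terms of support: confidence(A => B) = P(B | A) = support(A ∪ B) / support(A)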

51
Notation and Basic concepts
▪ The goal is to find interesting rules.
▪ We would like to detect those with high support and high confidence.
▪ Typically, we will select appropriate thresholds for both measures and then look
for all subsets that fulfill given support and confidence criteria.
▪ A set of k items is called a k-itemset.
▪ For example, {bread, skim milk, books} is a 3-itemset.
▪ An itemset whose count (or probability) is greater than some pre-specified
threshold is called a frequent itemset.

52
Notation and Basic concepts

▪ How are we going to find interesting rules from the database T?

▪ It will be a two-step process:


▪ Find all frequent itemsets (each of these itemsets will occur at least as frequently as
pre-specified by the minimum support threshold)

▪ Generate strong association rules from the frequent itemsets (these rules will satisfy
both the minimum support threshold and the minimum confidence threshold)

53
Apriori Algorithms

▪ In the following transaction database D find all frequent itemsets.


▪ The minimum support count is 2.

54
Apriori Algorithms

▪ In the following transaction database D find all frequent itemsets.


▪ The minimum support count is 2.

55
Apriori Algorithms
▪ The steps of the algorithm are shown below:

56
Apriori Algorithms

▪ The steps of the algorithm are shown below:

57
Apriori Algorithms

▪ The steps of the algorithm are shown below:

58
Apriori Algorithms

▪ The steps of the algorithm are shown below:

59
Apriori Algorithms
▪ The steps of the algorithm are shown below:

60
Apriori Algorithms
▪ The Apriori algorithm employs a level-wise search for frequent itemsets.
▪ In particular, frequent k-itemsets are used to find frequent (k + 1)-itemsets.

▪ This is all based on the following property:

▪ All nonempty subsets of a frequent itemset must also be frequent.
▪ This means that, in order to find Lk+1, we should only be looking at Lk.
▪ There are two steps in this process, the join step and the prune step (a minimal sketch of this level-wise search follows).
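
▪ A minimal Python sketch of this level-wise search, assuming transactions is a list of item sets and min_support_count is the minimum support count (the names are illustrative, and the join step is simplified to all k-combinations of the surviving items rather than the full Lk−1 join):

    from itertools import combinations

    def apriori(transactions, min_support_count):
        # transactions: list of item sets; returns {frozenset(itemset): support count}.
        transactions = [set(t) for t in transactions]
        # L1: count single items and keep the frequent 1-itemsets.
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        frequent = {s: c for s, c in counts.items() if c >= min_support_count}
        all_frequent = dict(frequent)
        k = 2
        while frequent:
            # Join step (simplified): candidate k-itemsets from the items still frequent.
            items = sorted({i for s in frequent for i in s})
            candidates = [frozenset(c) for c in combinations(items, k)]
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            candidates = [c for c in candidates
                          if all(frozenset(sub) in frequent
                                 for sub in combinations(c, k - 1))]
            # Count support of the surviving candidates in the database.
            counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
            frequent = {c: n for c, n in counts.items() if n >= min_support_count}
            all_frequent.update(frequent)
            k += 1
        return all_frequent

▪ Calling apriori on the transaction database D from the earlier slides with min_support_count = 2 returns every frequent itemset together with its support count.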

61
Apriori Algorithms
▪ How can we generate association rules from frequent itemsets?
▪ Once we find all frequent itemsets, we can easily calculate confidence for any rule.

▪ For every frequent itemset, we generate all nonempty proper subsets.


▪ Run each subset (and its complement) through the confidence formula above.
▪ Those rules whose confidence is above the pre-specified threshold are
output as association rules.
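▪ Concretely, for a frequent itemset l and a nonempty proper subset A: confidence(A => (l − A)) = support_count(l) / support_count(A)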

62
Apriori Algorithms

▪ In the example below, the itemset l = {I1, I2, I5} is frequent.

▪ All nonempty proper subsets are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2},
{I5}.
▪ The candidate association rules are therefore {I1, I2} => {I5}, {I1, I5} => {I2},
{I2, I5} => {I1}, {I1} => {I2, I5}, {I2} => {I1, I5} and {I5} => {I1, I2},
and the confidence of each is computed from the transaction counts.

▪ If the minimum confidence threshold was 75%, only the second, third and
sixth rules would be considered strong and thus output.

63
End ….

64
