
5). Differentiate between K-means and Hierarchical clustering

Hierarchical clustering

Hierarchical clustering creates a hierarchy of clusters, which may be represented in a tree structure called a dendrogram. The root of the tree consists of a single cluster containing all observations, and the leaves correspond to individual observations. Algorithms for hierarchical clustering are generally either agglomerative, in which one starts at the leaves and successively merges clusters together, or divisive, in which one starts at the root and recursively splits the clusters. Any non-negative-valued function may be used as a measure of dissimilarity between pairs of observations. The choice of which clusters to merge or split is determined by a linkage criterion, which is a function of the pairwise distances between observations.

Cutting the tree at a given height gives a clustering at a selected precision. In the following example, cutting after the second row yields the clusters {a} {b c} {d e} {f}. Cutting after the third row yields the clusters {a} {b c} {d e f}, which is a coarser clustering with a smaller number of larger clusters.
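To make tree-cutting concrete, here is a minimal sketch in Python using SciPy's scipy.cluster.hierarchy module. The six 2-D points are invented so that the two cuts reproduce the {a} {b c} {d e} {f} and {a} {b c} {d e f} clusterings described above.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six invented 2-D observations, one per label a..f.
points = np.array([[0.0, 0.0],   # a
                   [1.0, 0.1],   # b
                   [1.1, 0.2],   # c
                   [5.0, 5.0],   # d
                   [5.1, 5.2],   # e
                   [5.2, 5.9]])  # f

# Build the dendrogram with single linkage; Z encodes the merge tree.
Z = linkage(points, method="single")

# "Cutting the tree": a larger threshold yields fewer, coarser clusters.
print(fcluster(Z, t=0.5, criterion="distance"))  # -> {a} {b c} {d e} {f}
print(fcluster(Z, t=0.9, criterion="distance"))  # -> {a} {b c} {d e f}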

Agglomerative hierarchical clustering

For example, suppose the six elements {a}, {b}, {c}, {d}, {e}, and {f} are to be clustered, and Euclidean distance is the distance metric. This method builds the hierarchy from the individual elements by progressively merging clusters. The first step is to determine which elements to merge into a cluster. Usually, we take the two closest elements, according to the chosen distance.

Optionally, one can also construct a distance matrix at this stage, where the number in the i-th row, j-th column is the distance between the i-th and j-th elements. Then, as clustering progresses, rows and columns are merged as the clusters are merged and the distances are updated. This is a common way to implement this type of clustering, and it has the benefit of caching distances between clusters. A simple agglomerative clustering algorithm is described in the single-linkage clustering page; it can easily be adapted to different types of linkage (see below).

Suppose we have merged the two closest elements b and c. We now have the clusters {a}, {b, c}, {d}, {e}, and {f}, and want to merge them further. To do that, we need the distance between {a} and {b, c}, and therefore must define the distance between two clusters. Usually the distance between two clusters A and B is one of the following:

- The maximum distance between elements of each cluster (complete-linkage clustering): max { d(a, b) : a in A, b in B }
- The minimum distance between elements of each cluster (single-linkage clustering): min { d(a, b) : a in A, b in B }
- The mean distance between elements of each cluster (average-linkage clustering, used e.g. in UPGMA): mean { d(a, b) : a in A, b in B }
- The sum of all intra-cluster variance.
- The increase in variance for the cluster being merged (Ward's criterion).
- The probability that candidate clusters spawn from the same distribution function (V-linkage).

Each agglomeration occurs at a greater distance between clusters than the previous agglomeration, and one can decide to stop clustering either when the clusters are too far apart to be merged (distance criterion) or when there is a sufficiently small number of clusters (number criterion).
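As an illustration of the distance-matrix bookkeeping and linkage criteria just described, here is a from-scratch sketch (not the single-linkage page's algorithm) that greedily merges the closest pair of clusters under a chosen linkage, assuming Euclidean distance:

import numpy as np

# The first three linkage criteria from the list above.
def single(d):   return d.min()    # min pairwise distance
def complete(d): return d.max()    # max pairwise distance
def average(d):  return d.mean()   # mean pairwise distance (UPGMA)

def agglomerate(points, linkage_fn, num_clusters):
    # Start with one leaf cluster per observation.
    clusters = [[i] for i in range(len(points))]
    # Cache all pairwise Euclidean distances in a matrix.
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters under the chosen linkage.
        best, pair = np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage_fn(D[np.ix_(clusters[i], clusters[j])])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into i
        del clusters[j]
    return clusters

For example, agglomerate(points, single, 3) merges the six points from the earlier sketch down to three clusters; swapping in complete or average changes only how inter-cluster distance is measured.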

k-means clustering

The k-means algorithm assigns each point to the cluster whose center (also called the centroid) is nearest. The center is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster. Example: the data set has three dimensions and the cluster has two points, X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where zi = (xi + yi) / 2 for i = 1, 2, 3.
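A one-line check of this centroid arithmetic in Python (X and Y are invented sample values):

import numpy as np

X = np.array([1.0, 2.0, 3.0])   # X = (x1, x2, x3), invented
Y = np.array([3.0, 4.0, 7.0])   # Y = (y1, y2, y3), invented

# The centroid averages each dimension separately.
Z = (X + Y) / 2
print(Z)                         # -> [2. 3. 5.]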

The algorithm steps are:

1. Choose the number of clusters, k.
2. Randomly generate k clusters and determine the cluster centers, or directly generate k random points as cluster centers.
3. Assign each point to the nearest cluster center, where "nearest" is defined with respect to one of the distance measures discussed above.
4. Recompute the new cluster centers.
5. Repeat the two previous steps until some convergence criterion is met (usually, that the assignments no longer change).
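A minimal sketch of these steps in Python with NumPy, assuming Euclidean distance and that no cluster ever becomes empty (a real implementation would guard against that):

import numpy as np

def kmeans(points, k, seed=0):
    # Steps 1-2: directly pick k random points as the initial centers.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # Step 3: assign each point to the nearest center (Euclidean).
        dists = np.linalg.norm(points[:, None] - centers[None, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each center as the mean of its assigned points.
        new_centers = np.array([points[labels == c].mean(axis=0)
                                for c in range(k)])
        # Step 5: stop once the centers (hence assignments) no longer change.
        if np.allclose(new_centers, centers):
            return labels, centers
        centers = new_centers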

The main advantages of this algorithm are its simplicity and speed, which allow it to run on large datasets. Its disadvantage is that it does not yield the same result with each run, since the resulting clusters depend on the initial random assignments (the k-means++ algorithm addresses this problem by seeking to choose better starting centers). It minimizes intra-cluster variance but does not ensure that the result is a global minimum of variance. Another disadvantage is the requirement that a mean be definable, which is not always the case. For such datasets, the k-medoids variant is appropriate. An alternative, using a different criterion for which points are best assigned to which center, is k-medians clustering.
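The k-means++ seeding mentioned above can be sketched as follows: the first center is chosen uniformly at random, and each subsequent center is sampled with probability proportional to its squared distance from the nearest center already chosen, which tends to spread the starting centers out:

import numpy as np

def kmeans_pp_init(points, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]   # first center: uniform
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers],
                    axis=0)
        # Sample the next center proportionally to that squared distance.
        centers.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.array(centers)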

6). Differentiate between Web content mining and Web usage mining.

A. Web Content Mining

Web content mining targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video, and audio, that are embedded in or linked to Web pages.

It is also quite different from Data mining, because Web data are mainly semi-structured and/or unstructured, while Data mining deals primarily with structured data. Web content mining is also different from Text mining because of the semi-structured nature of the Web, whereas Text mining focuses on unstructured texts. Web content mining thus requires creative applications of Data mining and/or Text mining techniques, as well as its own unique approaches. In the past few years there has been a rapid expansion of activities in the Web content mining area, which is not surprising given the phenomenal growth of Web content and the significant economic benefit of such mining. However, due to the heterogeneity and lack of structure of Web data, automated discovery of targeted or unexpected knowledge still presents many challenging research problems.

Web content mining can be differentiated from two points of view: the agent-based approach and the database approach. The first approach aims at improving information finding and filtering. The second approach aims at modeling the data on the Web in a more structured form, in order to apply standard database querying mechanisms and data mining applications to analyze it.

B. Web Usage Mining

Web usage mining focuses on techniques that can predict the behavior of users while they interact with the WWW. It discovers user navigation patterns from Web data, trying to extract useful information from the secondary data derived from users' interactions while surfing the Web. Web usage mining collects data from Web log records to discover user access patterns for Web pages. There are several available research projects and commercial tools that analyze those patterns for different purposes. The insights gained can be utilized in personalization, system improvement, site modification, business intelligence, and usage characterization.

The only information left behind by many users visiting a Web site is the path through the pages they have accessed. Most Web information retrieval tools use only the textual information, while they ignore the link information, which can be very valuable. In general, four kinds of data mining techniques are applied to the Web mining domain to discover user navigation patterns (a log-parsing sketch follows this list):

- Association rule mining
- Sequential pattern mining
- Clustering
- Classification
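As a sketch of how the raw material for these techniques is obtained from Web log records, the snippet below groups the pages requested by each visiting host into a navigation path. The log format is a hypothetical Common-Log-Format line, and the handling is simplified (real usage mining would also sessionize by time and account for proxies and caching):

import re
from collections import defaultdict

# Hypothetical Common-Log-Format lines: host, timestamp, request, status.
LOG_LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) [^"]*" (\d+)')

def navigation_paths(log_lines):
    # Group requested pages by visitor host, in order of access.
    paths = defaultdict(list)
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m:
            host, _ts, page, _status = m.groups()
            paths[host].append(page)
    return paths

sample = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200',
    '10.0.0.1 - - [01/Jan/2024:10:00:05 +0000] "GET /products HTTP/1.1" 200',
]
print(dict(navigation_paths(sample)))  # {'10.0.0.1': ['/home', '/products']}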
