Clustering
Clustering groups customers or other entities into homogeneous segments so that each segment can be managed more effectively. It is a divide-and-conquer strategy: the dataset is divided into homogeneous groups, and the right strategy can then be prescribed for each group. The objective is to minimize the variation within each cluster while maximizing the variation between clusters.
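The within-cluster versus between-cluster objective can be made concrete with sums of squared distances. The following is a minimal sketch, not a method taken from this text: the function names and the customer data (age, income in thousands) are hypothetical, and squared Euclidean distance to the centroid is assumed as the measure of variation.

```python
def mean(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between(clusters):
    """Return (within-cluster, between-cluster) sums of squares."""
    all_points = [p for cluster in clusters for p in cluster]
    grand_mean = mean(all_points)
    within = 0.0
    between = 0.0
    for cluster in clusters:
        centroid = mean(cluster)
        # Variation of the records around their own cluster centroid.
        within += sum(sq_dist(p, centroid) for p in cluster)
        # Variation of the cluster centroid around the overall mean,
        # weighted by the number of records in the cluster.
        between += len(cluster) * sq_dist(centroid, grand_mean)
    return within, between


# Hypothetical customers described by (age, income in thousands).
good = [[(25, 30), (27, 32)], [(55, 90), (57, 92)]]  # similar customers together
bad = [[(25, 30), (55, 90)], [(27, 32), (57, 92)]]   # dissimilar customers together

print(within_between(good))
print(within_between(bad))
```

For a fixed dataset the two quantities add up to the same total, so a grouping that lowers the within-cluster variation necessarily raises the between-cluster variation. In this sketch the `good` grouping has a far smaller within-cluster value than the `bad` one.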
Clustering algorithms are unsupervised learning algorithms: the classes are not known a priori. Classification algorithms, by contrast, are supervised learning algorithms, where the classes are known a priori in the training data. Another important difference is that clustering is descriptive analytics, whereas classification is usually predictive analytics. The main objective of clustering is to partition the original dataset into subsets (clusters) that differ from one another while the records within each cluster are homogeneous, and to identify the characteristics that differentiate the subsets.
For example, suppose a company wants to raise brand awareness among all its existing and potential customers and must design a campaign. It could design a single campaign aimed at every customer. But what if its customers differ in income, age, preferences, profession, or gender? The same campaign may not appeal to all of them. The company can instead run multiple campaigns, each targeting a different customer segment.
How does clustering work?
Clustering algorithms use distance, similarity, or dissimilarity measures to form clusters. The choice of measure plays a crucial role in the final cluster formation. A larger distance implies that observations are far apart, whereas a higher similarity indicates that the observations are alike.
Some common distance and similarity measures:
• Euclidean distance
• Minkowski distance
• Jaccard similarity coefficient
• Cosine similarity
• Gower’s similarity coefficient
K-means clustering
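The distance and similarity measures listed above can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, inputs are assumed to be equal-length sequences (or sets for Jaccard), and the Gower version assumes the caller supplies the range of each numeric feature and marks categorical features with `None`.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    """Euclidean distance, the p=2 special case of Minkowski distance."""
    return minkowski(a, b, 2)


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (an assumption of this sketch).
    return len(a & b) / len(union) if union else 1.0


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # raises ZeroDivisionError for zero vectors


def gower_similarity(a, b, ranges):
    """Average per-feature similarity for mixed numeric/categorical records.

    ranges[i] is the (positive) range of numeric feature i,
    or None if feature i is categorical.
    """
    scores = []
    for x, y, r in zip(a, b, ranges):
        if r is None:
            scores.append(1.0 if x == y else 0.0)  # categorical: match or not
        else:
            scores.append(1.0 - abs(x - y) / r)    # numeric: scaled closeness
    return sum(scores) / len(scores)


print(euclidean((0, 0), (3, 4)))
print(minkowski((0, 0), (3, 4), 1))
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
print(cosine_similarity((1, 2), (2, 4)))
print(gower_similarity((30, "urban"), (40, "rural"), (50, None)))
```

Euclidean and Minkowski are distances (larger means further apart), while Jaccard, cosine, and Gower are similarities (larger means more alike), so a similarity usually has to be converted, for example as one minus the similarity, before it is used where a distance is expected.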