
Lecture

Unsupervised Learning - K means Clustering, DBSCAN


• Introduction to unsupervised learning with key concepts
• Types, Applications, and Challenges of Unsupervised Learning
• Introduction to Clustering with key concepts
• Common clustering algorithms and evaluation metrics
• Introduction to K means clustering with a detailed explanation
• Assumptions of K means clustering
• Advantages and Disadvantages of K means clustering
• Applications of K means clustering in mining engineering

3
• Introduction to Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) with key concepts
• Algorithmic steps of DBSCAN clustering
• Assumptions of DBSCAN clustering
• Advantages and Disadvantages of DBSCAN clustering
• Applications of DBSCAN clustering in mining engineering

4
Unsupervised Learning
• Unsupervised learning is a branch of machine learning where the model is trained on input data without
explicit supervision or labeled responses.

• In other words, the algorithm learns patterns, structures, or relationships within the data without being given
specific outcomes to predict.

• It is used to uncover hidden insights and structures in the data.

5
Key Concepts
No Supervision:

• Unlike supervised learning, unsupervised learning algorithms do not require labeled output
during training.

• The model learns from the input data alone, without any external feedback on the
correctness of predictions.

Exploratory Analysis:

• Unsupervised learning is often used for exploratory data analysis.

• By discovering hidden patterns or structures in the data, it can provide valuable insights
and inform subsequent analysis or decision-making processes.

6
Types of Unsupervised Learning:
• Clustering: Grouping similar data points together into clusters based on some similarity
metric. Examples include K-means, hierarchical clustering, and DBSCAN.

• Dimensionality Reduction: Reducing the number of features or variables in the data while
preserving its structure and important information. Techniques like Principal Component
Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) fall into this
category.

• Anomaly Detection: Identifying rare items, events, or observations that differ significantly
from the majority of the data. One-class SVM and Isolation Forest are common algorithms for
anomaly detection.

• Association Rule Learning: Discovering interesting relationships or associations between variables in large datasets. Apriori algorithm and FP-growth algorithm are widely used for association rule learning.
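As a concrete illustration of the dimensionality-reduction category above, here is a minimal sketch using scikit-learn's PCA; the synthetic dataset and parameter choices are assumptions for illustration, not from the lecture.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumed synthetic 5-dimensional data.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))

# Project onto the 2 directions of highest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # share of variance per component
```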

7
Applications of Unsupervised Learning
Unsupervised learning finds applications across various domains:

Market Segmentation: Identifying groups of customers with similar behaviors or characteristics for targeted marketing campaigns.

Image and Text Clustering: Grouping similar images or documents together based on content or features.

Anomaly Detection: Detecting fraudulent transactions, network intrusions, or equipment failures by identifying unusual patterns in data.

Recommendation Systems: Suggesting products, movies, or articles to users based on their preferences and past interactions.

Dimensionality Reduction: Visualizing high-dimensional data in lower-dimensional space for easier interpretation or processing.

8
Challenges of Unsupervised Learning
Despite its usefulness, unsupervised learning comes with its own set of challenges:

Evaluation: Unlike supervised learning, evaluating the performance of unsupervised learning algorithms can be subjective and challenging since there are no predefined labels to compare predictions against.

Interpretability: Interpreting the results of unsupervised learning algorithms can be difficult, especially when dealing with complex high-dimensional data.

Scalability: Some unsupervised learning algorithms may not scale well to large datasets due to computational complexity or memory requirements.

9
• Unsupervised learning is a fundamental concept in machine learning that plays a crucial role
in uncovering hidden patterns, relationships, and structures within data.

• By leveraging techniques like clustering, dimensionality reduction, and anomaly detection, unsupervised learning algorithms enable data scientists and analysts to gain valuable insights from unlabeled data, leading to better decision-making and problem-solving capabilities.

10
Machine Learning

11
Clustering
• Clustering is a fundamental technique in unsupervised machine learning, where the goal is to group similar data points together into clusters such that data points within the same cluster are more similar to each other than to those in other clusters.

• Clustering algorithms are widely used in various applications such as customer segmentation, image
segmentation, anomaly detection, and more.

12
Key Concepts of Clustering
Objective: The main objective of clustering is to partition a dataset into groups or clusters, with
each cluster containing data points that are similar to each other in some way. The similarity or
dissimilarity between data points is typically measured using a distance metric, such as
Euclidean distance or cosine similarity.

Types of Clustering:

Hard Clustering:

• In hard clustering, each data point belongs exclusively to one cluster. Examples include
K-means clustering and K-medoid clustering.

Soft Clustering:

• In soft clustering (also known as fuzzy clustering), data points can belong to multiple
clusters with varying degrees of membership.

• Examples include fuzzy C-means clustering and Gaussian mixture models (GMM).

13
Key Concepts of Clustering
Distance Metric: Clustering algorithms often require a distance metric to measure the similarity
or dissimilarity between data points.

Common distance metrics include:

• Euclidean Distance

• Manhattan Distance

• Cosine Similarity
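As a quick illustration, the sketch below computes all three with SciPy on two assumed example vectors; note that SciPy's cosine function returns the cosine distance, i.e. 1 minus the cosine similarity.

```python
import numpy as np
from scipy.spatial import distance

# Two assumed feature vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

print(distance.euclidean(a, b))  # straight-line (L2) distance
print(distance.cityblock(a, b))  # Manhattan (L1) distance
print(distance.cosine(a, b))     # cosine distance = 1 - cosine similarity
```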

14
Common Clustering Algorithms:
K-Means Clustering:

• K-means is one of the most popular and widely used clustering algorithms.

• It partitions the data into K clusters by iteratively assigning each data point to the nearest
cluster centroid and then updating the centroids based on the mean of data points
assigned to each cluster.

Hierarchical Clustering:

• Hierarchical clustering builds a tree-like structure (dendrogram) of clusters by either iteratively merging smaller clusters into larger ones (agglomerative) or splitting larger clusters into smaller ones (divisive).

• It does not require specifying the number of clusters beforehand.

15
Common Clustering Algorithms
Density-Based Clustering (DBSCAN):

• DBSCAN is a density-based clustering algorithm that groups together data points that are
closely packed together (dense regions) and marks outliers as noise.

• It does not require specifying the number of clusters and can discover clusters of arbitrary
shapes and sizes.

Gaussian Mixture Models (GMM):

• GMM is a probabilistic model that represents the distribution of data as a mixture of several
Gaussian distributions.

• It models each cluster as a Gaussian distribution and assigns probabilities to data points
belonging to each cluster.
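To make GMM's soft assignments concrete, here is a minimal sketch with scikit-learn's GaussianMixture on assumed toy blobs: predict returns hard labels, while predict_proba exposes each point's membership probability for every cluster.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Assumed toy data: three Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

hard_labels = gmm.predict(X)       # one cluster per point
soft_probs = gmm.predict_proba(X)  # membership probabilities per cluster
print(soft_probs[0])               # e.g. something like [0.98, 0.01, 0.01]
```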

16
Evaluation of Clustering:

Evaluating the performance of clustering algorithms can be challenging because it often involves
measuring the quality of clusters without ground truth labels. Common evaluation metrics for
clustering include:

Silhouette Score: Measures how similar a data point is to its own cluster compared to other
clusters.

Davies-Bouldin Index: Measures the average similarity between each cluster and its most similar
cluster, taking into account both the compactness and separation of clusters.

Calinski-Harabasz Index: Computes the ratio of between-cluster dispersion to within-cluster dispersion, providing a measure of cluster tightness and separation.
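All three metrics are available in scikit-learn; the sketch below evaluates an assumed K-means clustering of toy blobs and is meant only to show the calls.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

# Assumed toy data and clustering.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))         # higher is better, range [-1, 1]
print(davies_bouldin_score(X, labels))     # lower is better
print(calinski_harabasz_score(X, labels))  # higher is better
```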

17
Conclusion
• Clustering is a powerful technique in machine learning for discovering hidden patterns and structures within
data.

• By grouping similar data points together into clusters, clustering algorithms provide valuable insights that can
be used for various purposes such as data exploration, segmentation, and anomaly detection.

• Understanding the principles and algorithms of clustering is essential for data scientists and analysts working
with unsupervised learning tasks.

18
K Means Clustering
• K-means clustering is an iterative algorithm that partitions a dataset into K clusters. It aims to minimize the sum
of squared distances between data points and their corresponding cluster centroids.

The algorithm works as follows:

1. Initialize K cluster centroids randomly.
2. Assign each data point to the nearest centroid, forming K clusters.
3. Update the centroids by computing the mean of all data points assigned to each cluster.
4. Repeat steps 2 and 3 until convergence (when the centroids no longer change significantly).
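A minimal NumPy sketch of these four steps follows; it is illustrative only (it assumes, for simplicity, that no cluster becomes empty during the updates, and uses Euclidean distance).

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize centroids as k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (assumes no cluster is empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids no longer change significantly.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Usage on assumed toy data: two well-separated 2-D blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
labels, centroids = kmeans(X, k=2)
print(centroids)
```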

19
K Means Clustering
Mathematics Involved:

Let's break down the mathematics involved in each step of the K-means algorithm:

Step 1: Initialize centroids:

• Randomly select K data points from the dataset as the initial centroids.

Step 2: Assign data points to clusters:

• For each data point x_i, compute its distance to each centroid c_j using a distance metric such as Euclidean distance:

$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - c_{jk})^2}$$

Where:
• x_i is a data point.
• c_j is a centroid.
• n is the number of dimensions/features.
• x_{ik} and c_{jk} are the kth features of x_i and c_j respectively.

20
K Means Clustering
Assign each data point to the cluster associated with the nearest centroid:

$$\text{cluster}(x_i) = \arg\min_{j} \, d(x_i, c_j)$$

Step 3: Update centroids:

After all data points are assigned to clusters, update each centroid by computing the mean of all data points assigned to that cluster:

$$c_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i$$

Where:
• S_j is the set of data points assigned to cluster j.
• |S_j| is the number of data points in cluster j.

Convergence:
Repeat steps 2 and 3 until convergence. Convergence occurs when the centroids no longer
change significantly or when a predefined number of iterations is reached.

21
K Means Clustering
Objective Function:
The objective of K-means clustering is to minimize the within-cluster sum of squared distances, which is given by:

$$J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - c_j \rVert^2$$

Where:
• J is the total within-cluster sum of squared distances.
• S_j is the set of data points assigned to cluster j.
• c_j is the centroid of cluster j.

• K-means clustering is an iterative algorithm that partitions a dataset into K clusters by minimizing the within-cluster sum of squared distances.

• By understanding the mathematical concepts involved, you gain insight into how the algorithm operates and how it optimizes the clustering process.
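As a sanity check on this objective, the sketch below recomputes J for an assumed scikit-learn fit and compares it with the model's inertia_ attribute, which stores the same within-cluster sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# J: sum over clusters of squared distances to the cluster centroid.
J = sum(np.sum((X[km.labels_ == j] - c) ** 2)
        for j, c in enumerate(km.cluster_centers_))

print(J, km.inertia_)  # the two values agree up to floating-point rounding
```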

22
Original Data Vs Clustered Data

23
Assumptions of K Means Clustering
K-means clustering is a powerful and widely used algorithm, but its effectiveness relies on
several assumptions about the underlying data. Understanding these assumptions is crucial
for properly applying and interpreting the results of K-means clustering. Here are the key
assumptions involved in K-means clustering:

Clusters Are Spherical:

K-means assumes that clusters in the data are spherical or globular in shape. This means
that the variance of the data points within each cluster is similar across all dimensions. If the
clusters have non-spherical shapes or varying variances, K-means may produce suboptimal
results.

Clusters Are of Similar Size:

K-means assumes that the clusters in the data are of similar size or have similar densities. In
other words, each cluster contains roughly the same number of data points. If the clusters
have significantly different sizes or densities, K-means may assign more weight to larger
clusters, potentially leading to biased results.

24
Assumptions of K Means Clustering
Clusters Are Linearly Separable:

K-means operates based on the Euclidean distance between data points and cluster
centroids. Therefore, it assumes that the clusters are linearly separable in the feature
space. If the clusters are not well-separated or have complex boundaries, K-means may
struggle to capture the underlying structure of the data accurately.

Centroids Represent Cluster Centers:

K-means assumes that the cluster centroids accurately represent the centers of the
clusters. Each data point is assigned to the cluster whose centroid is closest to it. If the
centroids do not accurately reflect the cluster centers, K-means may produce unreliable
cluster assignments.

Initial Centroid Positions Matter:

The performance of K-means clustering can be sensitive to the initial positions of the
centroids. Different initializations may lead to different clustering results. Therefore, it is
common practice to run K-means multiple times with different initializations and select the
solution with the lowest objective function value.
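scikit-learn's KMeans automates exactly this practice through its n_init parameter, which reruns the algorithm with different random initializations and keeps the solution with the lowest objective; a brief sketch on assumed toy data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init=10: run K-means 10 times from different random centroids and
# keep the run with the smallest within-cluster sum of squares.
km = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)
print(km.inertia_)  # objective value of the best of the 10 runs
```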

25
Assumptions of K Means Clustering
Continuous Variables:

K-means assumes that the input variables are continuous. It may not perform well with categorical variables unless
appropriate preprocessing is performed to convert them into a suitable numerical representation.

Understanding these assumptions is essential for correctly applying K-means clustering and interpreting its
results. Violations of these assumptions can lead to suboptimal clustering outcomes or misleading interpretations.
It's important to assess the suitability of K-means for a particular dataset and consider alternative clustering
algorithms if the assumptions are not met.
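As one example of such preprocessing, the sketch below one-hot encodes an assumed categorical feature with scikit-learn's OneHotEncoder (the sparse_output flag requires scikit-learn 1.2 or later); scaling and feature-weighting choices still matter before running K-means.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumed categorical feature: rock type recorded per sample.
rock_type = np.array([["granite"], ["basalt"], ["granite"], ["shale"]])

encoder = OneHotEncoder(sparse_output=False)
X_encoded = encoder.fit_transform(rock_type)  # one binary column per category
print(encoder.categories_)
print(X_encoded)
```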

26
Advantages of K Means Clustering
Simple and Easy to Implement: K-means is relatively easy to understand and implement
compared to other clustering algorithms. Its simplicity makes it a popular choice for
clustering tasks.

Efficient: K-means is computationally efficient and can handle large datasets with a
relatively small number of clusters.

Scalability: K-means can be applied to large datasets with many dimensions, making it
suitable for high-dimensional data.

Versatile: K-means can be used for a wide range of applications, including customer
segmentation, document clustering, image segmentation, and anomaly detection.

Interpretability: The cluster centroids produced by K-means can be easily interpreted, providing insights into the characteristics of each cluster.

Works Well with Well-Separated Clusters: K-means performs well when clusters are
well-separated and have similar sizes and densities.

27
Disadvantages of K Means Clustering
Sensitive to Initial Centroid Positions: K-means clustering is sensitive to the initial positions of the
cluster centroids. Different initializations can lead to different clustering results, which may not
necessarily be optimal.

Requires Predefined Number of Clusters (K): K-means requires specifying the number of clusters
(K) beforehand, which can be challenging, especially when the true number of clusters is unknown.

Assumes Spherical Clusters: K-means assumes that clusters are spherical and have similar sizes
and densities. It may produce suboptimal results for clusters with non-spherical shapes or varying
densities.

Struggles with Outliers: K-means is sensitive to outliers, as they can significantly affect the
positions of cluster centroids and the resulting cluster assignments.

28
Disadvantages of K Means Clustering
May Converge to Local Optima: K-means is prone to converging to local optima, especially when the
initial centroids are poorly chosen. Running K-means multiple times with different initializations can
mitigate this issue, but it increases computational overhead.

Cannot Handle Non-Linear Data: K-means assumes clusters are linearly separable in the feature space.
It may not perform well on datasets with non-linearly separable clusters.

Overall, while K-means clustering is a powerful and widely used algorithm, it is important to consider
its limitations and suitability for specific clustering tasks. It may not always be the best choice,
especially when dealing with complex or non-linear data distributions.

30
Applications of K Means Clustering in Mining
In the mining industry, K-means clustering can be applied to various tasks and processes to extract
valuable insights from mining data. Here are five main applications of K-means clustering in mining:

Exploratory Data Analysis:

• K-means clustering can be used for exploratory data analysis to identify natural groupings or
patterns within mining data.

• By clustering data points such as geological samples, drilling data, or mineral compositions, mining
companies can gain insights into the underlying structure of their data and uncover hidden
relationships between different variables.

31
Applications of K Means Clustering in Mining
Ore Body Segmentation:

• K-means clustering can be applied to segment ore bodies based on their mineral
composition, grade, or other geological attributes.

• By clustering geological data collected from exploration activities, mining engineers can
delineate different zones within an ore body, which helps in resource estimation, mine
planning, and optimization of extraction processes.

Rock Classification:

• K-means clustering can be used for rock classification based on physical properties such as
density, hardness, and porosity.

• By clustering rock samples collected from boreholes or mining sites, geologists can
categorize rocks into different lithological classes, which aids in geological mapping,
mineral prospecting, and characterization of mining environments.

32
Applications of K Means Clustering in Mining
Fault Detection and Monitoring:

K-means clustering can be employed for fault detection and monitoring in mining operations. By clustering
sensor data collected from equipment, such as vibration sensors or temperature sensors, mining
companies can identify abnormal patterns or deviations from normal operating conditions, which helps in
predicting equipment failures, optimizing maintenance schedules, and ensuring the safety of mining
personnel.

33
Applications of K Means Clustering in Mining
Market Segmentation and Demand Forecasting:

• K-means clustering can be used for market segmentation and demand forecasting in the
mining industry.

• By clustering customers based on their purchasing behavior, geographical location, or industry sector, mining companies can tailor their marketing strategies, product offerings, and pricing models to different market segments.

• Additionally, by clustering historical sales data, mining companies can forecast future demand
for their products and plan production schedules accordingly.

Overall, K-means clustering offers a versatile and powerful tool for analyzing mining data and
extracting valuable insights that can drive decision-making, improve operational efficiency, and
optimize resource utilization in the mining industry.

34
DBSCAN
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm in
machine learning.

• Unlike traditional clustering algorithms such as k-means, DBSCAN does not require specifying the number
of clusters beforehand.

• It works by grouping together points that are closely packed together, based on two parameters:
eps (epsilon) and min_samples.

• Example of application of DBSCAN on 3D data with eps=0.5, min_samples = 5

35
DBSCAN
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm
used in machine learning and data mining.

• It's particularly effective in identifying clusters of varying shapes and sizes in a dataset, while also being
robust to noise and outliers.

• Example of application of DBSCAN on non-linear 3D clusters (moon shaped) with eps=0.2, min_samples = 5
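A minimal sketch reproducing this setup with scikit-learn's DBSCAN; as an assumption for illustration, make_moons generates 2-D moon-shaped clusters as a stand-in for the slide's 3-D example, while eps=0.2 and min_samples=5 follow the slide.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving moon-shaped clusters (2-D stand-in for the slide's data).
X, _ = make_moons(n_samples=400, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)

print(set(db.labels_))           # cluster ids found; -1 marks noise points
print((db.labels_ == -1).sum())  # number of points labelled as noise
```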

36
Core Concepts of DBSCAN
• Here are the core concepts of DBSCAN:

• Density-Based Clustering: Unlike centroid-based clustering algorithms like k-means, DBSCAN identifies clusters based on the density of data points. It defines clusters as continuous regions of high density separated by regions of low density.

• Core Points: In DBSCAN, a core point is a data point that has at least a specified number of
neighboring points (MinPts) within a specified distance (eps). Core points are essentially the
center points of clusters.

• Border Points: Border points are not core points themselves, but they lie within the
neighborhood (within eps distance) of a core point. These points may be part of a cluster,
but they don't have enough neighbors to be considered core points.

37
Core Concepts of DBSCAN
• Here are the core concepts of DBSCAN:

• Noise Points (Outliers): Points that are neither core points nor border points are considered noise
points or outliers. These are typically isolated points that don't belong to any cluster.

• Eps (ε): Eps is a distance threshold that defines the radius of the neighborhood around a data
point. Any point within this distance is considered a neighbor.

• MinPts: MinPts is the minimum number of data points required within the Eps neighborhood of a
point for that point to be considered a core point.
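With a fitted scikit-learn DBSCAN model, these three roles can be recovered directly, as in this minimal sketch on assumed toy data: core_sample_indices_ lists the core points, label -1 marks noise, and the remaining clustered points are border points.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Assumed toy data: two Gaussian blobs.
X, _ = make_blobs(n_samples=200, centers=2, random_state=0)
db = DBSCAN(eps=0.8, min_samples=5).fit(X)

core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True  # core points
noise_mask = db.labels_ == -1              # noise points (outliers)
border_mask = ~core_mask & ~noise_mask     # in a cluster but not core
print(core_mask.sum(), border_mask.sum(), noise_mask.sum())
```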

38
Overview of DBSCAN Algorithm
The DBSCAN algorithm proceeds as follows:

1. Initialization: Assign all points as unvisited.

2. Core Point Identification: For each point p, calculate the distance between p and all other points. If the number of points within distance ε is greater than or equal to minPts, mark p as a core point.

3. Cluster Expansion: For each core point p, if it's not already assigned to a cluster, create a new cluster and add p to it. Then, recursively add all points that are density-reachable from p to the cluster.

4. Noise Point Handling: Mark all points that remain unassigned to any cluster as noise or outliers.

39
DBSCAN Algorithm Steps
Given a dataset D consisting of n data points {p1,p2,...,pn}, where each point is represented as a d-dimensional vector,
DBSCAN identifies clusters based on the density of points in the dataset.

• ε is the radius of the neighborhood around each point.

• minPts is the minimum number of points within ε to consider a point a core point.

Step 1: Density Calculation

For each point p_i, calculate its neighborhood by finding all points within a distance ε (epsilon) from p_i. This can be represented as:

$$N(p_i) = \{\, p_j \in D : \text{distance}(p_i, p_j) \le \varepsilon \,\}$$

Where:
• N(p_i) represents the neighborhood of point p_i.
• distance(p_i, p_j) is the distance metric used to measure the distance between points p_i and p_j, commonly Euclidean distance.

40
DBSCAN Algorithm Steps
Step 2: Core Point Identification
A point p_i is considered a core point if the cardinality of its neighborhood is greater than or equal to the threshold minPts. Mathematically:

$$|N(p_i)| \ge \text{minPts}$$

Step 3: Density-Reachability
Two points p_i and p_j are density-reachable if there exists a chain of points connecting them such that every consecutive pair in the chain is within the neighborhood of each other. Formally:
• Point p_j is density-reachable from point p_i if there exists a chain p_1, p_2, ..., p_k such that:
  • p_1 = p_i and p_k = p_j.
  • Each p_{m+1} lies within the ε-neighborhood of p_m, and each p_m (for m < k) is a core point.

41
DBSCAN Algorithm Steps
Step 4: Cluster Formation
• For each core point p_i that has not been assigned to a cluster:
  • Create a new cluster C.
  • Add p_i to C.
  • For each point p_j that is density-reachable from p_i:
    • Add p_j to C.

Step 5: Noise Points

Points that are not assigned to any cluster are considered noise points or outliers.
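A compact brute-force sketch of Steps 1-5 in NumPy follows; it is illustrative (O(n²) pairwise distances, Euclidean metric assumed, the point itself counted in its own neighborhood) rather than a production implementation.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    n = len(X)
    labels = np.full(n, -1)  # -1 = noise / not yet assigned
    # Step 1: compute the eps-neighborhood of every point (brute force).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    # Step 2: core points have at least min_pts neighbors (self included).
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Steps 3-4: start a new cluster at this core point and add every
        # density-reachable point, expanding only through core points.
        labels[i] = cluster_id
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster_id
                if core[j]:
                    frontier.extend(neighbors[j])
        cluster_id += 1
    # Step 5: points still labelled -1 belong to no cluster (noise/outliers).
    return labels

# Usage on assumed toy data: two well-separated 2-D blobs.
X = np.vstack([np.random.randn(40, 2), np.random.randn(40, 2) + 6.0])
print(dbscan(X, eps=1.0, min_pts=5))
```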

42
Assumptions of DBSCAN
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a powerful clustering algorithm in
machine learning. Like any other algorithm, it operates under certain assumptions, which are important to
understand to ensure its proper application and interpretation.

Here are the key assumptions of DBSCAN:

• Density-Based Assumption: DBSCAN assumes that clusters are regions of high density
separated by regions of low density in the data space. This means that points within a cluster are
densely packed together, while points outside the cluster regions are relatively sparse.

• Varying Density: DBSCAN is capable of identifying clusters of arbitrary shapes and sizes,
assuming that the density of points within each cluster varies. This allows DBSCAN to effectively
cluster datasets where traditional centroid-based algorithms might struggle, especially when
clusters have irregular shapes or varying densities.

43
Assumptions of DBSCAN
• Presence of Noise: DBSCAN assumes that the dataset may contain noise, which refers to data
points that do not belong to any cluster or have insufficient support to be considered part of a
cluster. The algorithm is designed to handle noise effectively by labeling such points as
outliers.

• Parameter Sensitivity: DBSCAN's performance can be sensitive to its parameters, namely the
epsilon (eps) neighborhood radius and the minimum number of points (MinPts) required to
form a dense region. Choosing appropriate values for these parameters is crucial for the
algorithm to produce meaningful clusters. However, DBSCAN is relatively robust to parameter
settings compared to other clustering algorithms.

• Metric Space: DBSCAN operates in a metric space, assuming that the distance metric used to
measure the proximity between data points is meaningful and appropriate for the dataset. The
choice of distance metric can impact the clustering results, so it's essential to select a metric
that aligns with the data's characteristics and domain knowledge.

44
Assumptions of DBSCAN

• Data Preprocessing: DBSCAN assumes that the input data has been preprocessed appropriately,
including handling missing values, scaling features if necessary, and removing irrelevant features
or noise. Preprocessing ensures that the clustering algorithm operates on clean and meaningful
data, leading to more accurate clustering results.

• Understanding these assumptions is crucial for effectively applying DBSCAN in practice. Violations
of these assumptions can lead to suboptimal clustering results or misinterpretation of the clusters
identified by the algorithm.

• Therefore, it's essential to carefully evaluate the suitability of DBSCAN for a given dataset and
ensure that its assumptions are met or appropriately addressed.

45
Advantages of DBSCAN

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm in machine learning with several advantages and disadvantages:

Advantages:

• Robust to Noise and Outliers: DBSCAN is robust to noise and outliers in the dataset. It can
identify and disregard noise points as outliers, ensuring that they do not affect the formation of
clusters. This makes DBSCAN particularly useful for datasets with significant amounts of
noise.

• Ability to Detect Arbitrary Shapes: Unlike centroid-based clustering algorithms like k-means,
DBSCAN can identify clusters of arbitrary shapes and sizes. It is capable of discovering
clusters with irregular shapes, clusters embedded within clusters (known as density-connected
components), and clusters with varying densities.

46
Advantages of DBSCAN

• Does Not Require Pre-specification of Number of Clusters: DBSCAN does not require the
number of clusters to be pre-specified, unlike some other clustering algorithms. This makes it
suitable for datasets where the number of clusters is unknown or varies, allowing for more
flexible and adaptive clustering.

• Parameter Tuning is Easier: While DBSCAN requires the specification of two parameters
(epsilon and minimum points), tuning these parameters is often easier compared to other
clustering algorithms. Moreover, DBSCAN is less sensitive to the choice of parameters than
some other algorithms.

• Efficient in Handling Large Datasets: DBSCAN's time complexity is typically better than
hierarchical clustering algorithms for large datasets, particularly when using spatial indexing
structures like kd-trees or R-trees. This makes it efficient for clustering large datasets.

47
Disadvantages of DBSCAN
Disadvantages:

• Sensitivity to Parameters: While DBSCAN is less sensitive to parameter settings compared to some other clustering algorithms, choosing appropriate values for the epsilon (eps) neighborhood radius and the minimum number of points (MinPts) can still be challenging. Poor parameter choices may lead to suboptimal clustering results.

• Difficulty with Varying Density: DBSCAN may struggle to identify clusters in datasets with varying
densities, where the density of points within clusters varies significantly. In such cases, setting
appropriate parameters becomes even more crucial, and alternative clustering algorithms might
be more suitable.

48
Disadvantages of DBSCAN

• Memory and Computational Requirements: While DBSCAN is efficient for large datasets, it still
requires storing the entire dataset in memory, which can be challenging for extremely large datasets.
Additionally, computing the distance between points and determining neighborhood relationships can
be computationally intensive for high-dimensional data.

• Cluster Border Determination: DBSCAN may misclassify points near the border between clusters as
noise or assign them to the wrong cluster if the density varies gradually across the dataset. This can
lead to less accurate cluster boundaries, especially in datasets with overlapping clusters.

49
Disadvantages of DBSCAN
• Not Suitable for all Types of Data: DBSCAN's effectiveness depends on the underlying
distribution of the data and the characteristics of the clusters. It may not perform well on
datasets with uniform density or very high-dimensional data, where the curse of dimensionality
becomes a significant issue.

• In summary, while DBSCAN offers several advantages such as robustness to noise, ability to
detect arbitrary shapes, and flexibility in cluster detection, it also has limitations such as
sensitivity to parameters, difficulty with varying density, and potential challenges with cluster
border determination.

• It is essential to consider these factors when deciding whether to use DBSCAN for a particular
clustering task and to carefully tune its parameters for optimal performance.

50
Applications of DBSCAN in Mining
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has numerous applications in mining,
particularly in the context of spatial data analysis, resource exploration, and operational optimization. Some key
applications of DBSCAN in mining include:

Mineral Exploration:

• Mineral exploration involves locating areas with high mineral potential for further
development. DBSCAN can be applied to geological datasets, such as soil geochemistry,
rock samples, or geophysical surveys, to identify spatial clusters of mineral occurrences
or anomalies. By analyzing the density and spatial distribution of mineralization data,
DBSCAN can assist exploration geologists in pinpointing prospective areas for further
investigation.

• For example, clusters of elevated metal concentrations may indicate the presence of
mineral deposits or ore bodies, guiding exploration efforts towards areas with higher
likelihood of economic mineralization. DBSCAN's ability to detect clusters of arbitrary
shapes and sizes makes it well-suited for identifying mineralization patterns in complex
geological settings.

51
Applications of DBSCAN in Mining

Ore Body Detection:

• In mining operations, identifying and delineating ore bodies is crucial for efficient mine
planning and resource extraction. DBSCAN can be used to analyze spatial datasets, such as
drill hole assays, underground sampling data, or geological models, to delineate the
boundaries and extent of ore bodies within a mining lease. By clustering data points
representing different grades of ore or mineralization zones, DBSCAN can help define the
spatial distribution and continuity of ore deposits.

• This information is valuable for estimating mineral reserves, designing mine layouts, and
optimizing extraction methods to maximize resource recovery. DBSCAN's ability to handle
noise and outliers ensures that only meaningful clusters of mineralization data are considered
in ore body detection, improving the accuracy of resource estimation and mine planning.

52
Applications of DBSCAN in Mining
Geotechnical Monitoring:

• Geotechnical monitoring is essential for assessing and managing geological hazards, such as
slope instability, ground subsidence, or seismic activity, in mining operations. DBSCAN can
be employed to analyze spatial datasets collected from monitoring systems, such as
inclinometers, GPS receivers, or seismic sensors, to identify clusters of anomalous
geotechnical behavior indicative of potential hazards. By clustering spatial data points
representing ground movements, deformation patterns, or seismic events, DBSCAN can help
detect areas of heightened geotechnical risk and prioritize mitigation measures.

• For example, clusters of ground displacement data may indicate areas susceptible to slope
failure, prompting reinforcement measures or evacuation protocols to ensure worker safety.
DBSCAN's ability to identify spatial patterns and anomalies in geotechnical data enables
proactive risk management and mitigation strategies, reducing the likelihood of accidents or
disruptions in mining operations.

53
Applications of DBSCAN in Mining
Equipment Maintenance and Optimization:

• Maintaining and optimizing mining equipment is essential for maximizing productivity and
minimizing downtime in mining operations. DBSCAN can be utilized to analyze spatial
datasets collected from sensors installed on mining equipment, such as vibration sensors,
temperature gauges, or pressure sensors, to identify patterns indicative of equipment
performance, condition, or failure. By clustering sensor data representing equipment health
metrics, DBSCAN can help detect anomalies or deviations from normal operating conditions,
signaling potential equipment malfunctions or maintenance needs.

• For example, clusters of vibration data exceeding threshold levels may indicate impending
equipment failure, prompting preventive maintenance actions to avoid costly downtime or
damage. DBSCAN's ability to identify spatial clusters of abnormal equipment behavior
facilitates predictive maintenance strategies and operational optimization, enhancing
equipment reliability and efficiency in mining operations.

54
Applications of DBSCAN in Mining
Environmental Monitoring:

• Monitoring and mitigating environmental impacts associated with mining activities are
essential for sustainable resource extraction and community acceptance. DBSCAN can be
applied to spatial datasets on air quality, water quality, soil contamination, or biodiversity
collected within and around mining sites to identify clusters of environmental impact or
pollution. By clustering spatial data points representing pollutant concentrations, habitat
distributions, or ecological indicators, DBSCAN can help pinpoint hotspots of environmental
degradation or ecological disturbance.

• For example, clusters of elevated metal concentrations in water samples may indicate areas
of contamination from mine discharges, prompting remediation measures to protect water
quality and aquatic ecosystems. DBSCAN's ability to detect spatial patterns and anomalies in
environmental data supports targeted monitoring and management efforts, ensuring
compliance with regulatory requirements and minimizing negative environmental impacts
associated with mining operations.

55
Applications of DBSCAN in Mining
• These applications demonstrate the versatility and effectiveness of DBSCAN in addressing various spatial
challenges encountered in the mining industry, from mineral exploration and ore body detection to
geotechnical monitoring, equipment maintenance, and environmental management.

• By leveraging DBSCAN's capabilities for spatial clustering and analysis, mining professionals can gain
valuable insights into geological, operational, and environmental aspects of mining activities, facilitating
informed decision-making and sustainable resource development.

56
• We discussed unsupervised learning with key concepts
• We discussed types, applications, and challenges of unsupervised learning
• We discussed clustering with key concepts
• We discussed common clustering algorithms and evaluation metrics
• We discussed k means clustering with a detailed explanation
• We discussed assumptions of k means clustering
• We discussed the advantages and disadvantages of k means clustering
• We discussed applications of k means clustering in mining

58
• We discussed Density-Based Spatial Clustering of Applications with Noise (DBSCAN) in detail with key concepts
• We discussed the algorithmic steps of DBSCAN clustering
• We discussed the assumptions of DBSCAN clustering
• We discussed the advantages and disadvantages of DBSCAN clustering
• We discussed the applications of DBSCAN clustering in mining engineering

59
60
