
Distance Functions

The document outlines various distance metrics used in data analysis, including Cosine, Euclidean, Mahalanobis, Hellinger, Jaccard, Manhattan, Correlation, Dice, Hamming, and Chebyshev distances. Each metric is defined with its mathematical formula, terms explained, and specific use cases such as document similarity, clustering, and anomaly detection. The metrics vary in their approach, measuring angles, straight-line distances, correlations, and dissimilarities between vectors or sets.


Cosine Distance

Measures the angular difference between vectors, ignoring their magnitude

d_{\cos}(x, y) = 1 - \frac{x \cdot y}{\|x\|\,\|y\|}

Terms Explained:
▶ x, y: Non-zero vectors in R^n
▶ x · y : Dot product
▶ ∥x∥, ∥y ∥: Euclidean norms
▶ Range: 0 (same direction) to 2 (opposite directions)

Use Cases:
Information Retrieval: Document similarity
Recommender Systems: User preference matching
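As a minimal sketch, the formula translates directly into plain Python (the function name is ours, not a library API):

```python
import math

def cosine_distance(x, y):
    """1 minus the cosine of the angle between non-zero vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)
```

Parallel vectors give 0, orthogonal vectors give 1, and opposite vectors give 2, matching the range above.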
@AIinMinutes
Euclidean Distance
Measures the straight-line distance between two vectors in
space; equal to the length of their difference vector

d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \|x - y\|_2

Terms Explained:
▶ x, y: Vectors in R^n
▶ xi − yi : Difference at dimension i
▶ ∥x − y ∥2 : L2 norm of the difference vector

Use Cases:
k-Nearest Neighbors: Finding similar data points
k-Means: Clusters data by minimizing intra-cluster
distances.
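A one-line Python sketch of the formula (illustrative, not a library function):

```python
import math

def euclidean_distance(x, y):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```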
Mahalanobis Distance
Measures distance while accounting for correlations among
features

d(x, y) = \sqrt{(x - y)^{\top} \Sigma^{-1} (x - y)}

Terms Explained:
▶ x, y: Vectors in R^n
▶ Σ^{-1}: Inverse covariance matrix
▶ Normalizes by feature covariances

Use Cases:
Outlier Detection: Accounts for feature correlations
Classification: Handles different feature scales and
correlations
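A minimal sketch in plain Python, assuming the inverse covariance matrix Σ^{-1} has already been estimated and inverted elsewhere (the function name is ours):

```python
import math

def mahalanobis_distance(x, y, inv_cov):
    """Mahalanobis distance given a precomputed inverse covariance matrix."""
    d = [a - b for a, b in zip(x, y)]
    # Quadratic form (x - y)^T Sigma^{-1} (x - y)
    n = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)
```

With the identity matrix as inv_cov (uncorrelated, unit-variance features), this reduces to the Euclidean distance.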

Hellinger Distance
Measures how different two probability distributions are

H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{n} \left(\sqrt{P_i} - \sqrt{Q_i}\right)^2}

Terms Explained:
▶ P, Q: Probability distributions

▶ √P_i: Square root of the probability at position i
▶ H(P, Q) ∈ [0, 1]: 0 = identical, 1 = no overlap

Use Cases:
Anomaly Detection: Identifies statistical deviations
Imbalance-aware Algorithms: Used in Hellinger Distance
Decision Trees for handling class imbalance.
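The formula can be sketched in a few lines of Python for discrete distributions given as probability lists (function name ours):

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)
```

Identical distributions give 0; fully non-overlapping ones give 1.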
Jaccard Distance
Measures how different two sets are by comparing their
shared and unique elements

d_J(X, Y) = 1 - \frac{|X \cap Y|}{|X \cup Y|}

Terms Explained:
▶ X , Y : Two sets
▶ |X ∩ Y |: Size of intersection
▶ |X ∪ Y |: Size of union
▶ Range: 0 (identical) to 1 (disjoint)

Use Cases:
Document Similarity: Comparing text as word sets
Recommender Systems: Finding similar user preferences
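A minimal Python sketch using built-in set operations (function name ours):

```python
def jaccard_distance(x, y):
    """Jaccard distance between two sets: 1 - |intersection| / |union|."""
    x, y = set(x), set(y)
    return 1.0 - len(x & y) / len(x | y)
```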
Manhattan Distance
Measures distance as the sum of absolute differences along
each axis

d(x, y) = \sum_{i=1}^{n} |x_i - y_i| = \|x - y\|_1

Terms Explained:
▶ x, y: Vectors in R^n
▶ |xi − yi |: Absolute difference at dimension i
▶ ∥x − y ∥1 : L1 norm (taxicab norm)

Use Cases:
Grid Navigation: Calculating city block distances
Feature Selection: L1 Regularizer
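A one-line Python sketch of the sum of absolute differences (illustrative):

```python
def manhattan_distance(x, y):
    """L1 (taxicab) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))
```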

Correlation Distance
Measures dissimilarity based on how variables are
statistically related

d_{\mathrm{corr}}(x, y) = 1 - \rho(x, y) = 1 - \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}

Terms Explained:
▶ x, y : Data vectors of equal length
▶ ρ(x, y ): Pearson correlation coefficient
▶ cov(x, y ): Covariance between x and y
▶ σx , σy : Standard deviations

Use Case:
Feature Agglomeration: Correlation Clustering
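A minimal Python sketch computing Pearson correlation from scratch (function name ours; population covariance and standard deviations are used, and the 1/n factors cancel in the ratio):

```python
import math

def correlation_distance(x, y):
    """1 minus the Pearson correlation coefficient of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return 1.0 - cov / (sx * sy)
```

Perfectly correlated vectors give 0; perfectly anti-correlated vectors give 2.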

Dice Distance/Loss
Measures set dissimilarity, placing greater emphasis on
shared elements than the Jaccard distance

d_D(X, Y) = 1 - \frac{2|X \cap Y|}{|X| + |Y|}

Terms Explained:
▶ X , Y : Two sets
▶ |X ∩ Y |: Size of intersection
▶ |X | + |Y |: Sum of set sizes
▶ Range: 0 (identical) to 1 (no overlap)

Use Cases:
Image Segmentation: Evaluates segmentation overlap in
image analysis; also as a loss function
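A minimal Python sketch over sets (function name ours; segmentation losses apply the same formula to pixel masks):

```python
def dice_distance(x, y):
    """Dice distance: 1 - 2|intersection| / (|X| + |Y|)."""
    x, y = set(x), set(y)
    return 1.0 - 2 * len(x & y) / (len(x) + len(y))
```

Note that for the same pair of sets the Dice distance is never larger than the Jaccard distance, reflecting its heavier weighting of shared elements.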
Hamming Distance
Counts the number of positions where two sequences differ

d_H(x, y) = \sum_{i=1}^{n} I(x_i \neq y_i)

Terms Explained:
▶ x, y : Equal-length sequences
▶ I(x_i ≠ y_i): Indicator function (1 if x_i ≠ y_i, 0 otherwise)
▶ Counts positions where elements differ

Use Cases:
Error Detection: Hamming codes for transmission errors
Bioinformatics: Comparing DNA sequences
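A minimal Python sketch that works on any pair of equal-length sequences, including strings (function name ours):

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y))
```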
Chebyshev Distance
Measures distance between vectors using the largest
absolute difference in any dimension

d_\infty(x, y) = \max_i |x_i - y_i| = \|x - y\|_\infty

Terms Explained:
▶ x, y: Vectors in R^n
▶ max_i |x_i − y_i|: Maximum absolute difference
▶ ∥x − y ∥∞ : L∞ norm (chessboard distance)

Use Cases:
Anomaly Detection: Flags outliers based on the largest
deviation across features
Warehouse Optimization: Finding minimax distances
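A one-line Python sketch of the maximum coordinate-wise difference (illustrative):

```python
def chebyshev_distance(x, y):
    """L-infinity (chessboard) distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(x, y))
```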

