# Distance Measures

## 9 Common Measures in Data Science

Reference: 9 Distance Measures in Data Science

## Rotation: Geodesic Distance

Geodesic distance on the unit sphere: SO(3) is a compact Lie group with a natural Riemannian metric, that is, an inner product on the tangent space at every point (the tangent space at the identity is the Lie algebra so(3)). [Huynh 2009]

This metric first finds the relative rotation $R$ such that $R_1 = R R_2$. Therefore, $R = R_1 R_2^{-1} = R_1 R_2^T$. The second step is the matrix logarithm from SO(3) to so(3), whose rotation angle is $\theta = \arccos\left(\tfrac{1}{2}(\operatorname{tr}(R) - 1)\right)$. See rule (c) before equation (3.61) in the Modern Robotics textbook.

Lastly, the range of the arccos function is $[0, \pi]$, so the angle is already non-negative and there is no need to add abs().
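The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the names `rotation_geodesic_distance` and `rot_z` are hypothetical helpers, and the clipping step is an added safeguard against floating-point round-off rather than part of the formula.

```python
import numpy as np

def rotation_geodesic_distance(R1, R2):
    """Rotation angle (radians) of the relative rotation R = R1 @ R2.T."""
    R = R1 @ R2.T  # R2^{-1} equals R2^T for a rotation matrix
    # Round-off can push the cosine slightly outside [-1, 1], so clip it.
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))

def rot_z(angle):
    """Hypothetical helper: rotation about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R1 = rot_z(np.pi / 2)  # 90 degrees about z
R2 = rot_z(np.pi / 6)  # 30 degrees about z
theta = rotation_geodesic_distance(R1, R2)  # relative rotation of 60 degrees
```

Because both example rotations share the same axis, the distance is simply the difference of the two angles, which makes the result easy to check by hand.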

## Feature Distance

- Hamming distance for binary descriptors.
- L1 or L2 distance for histogram-based descriptors (float vectors).
- Sum of squared differences (SSD) for image patches (in effect, the squared L2 distance).
  - It can give good scores to very ambiguous (bad) matches.
- Improvement: use the ratio distance = SSD(f1, f2) / SSD(f1, f2'), where f2 is the best SSD match to f1 and f2' is the second-best match.
  - It gives values close to 1 for ambiguous matches and small values for distinctive ones.
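A small sketch of these feature distances in NumPy may help. It assumes binary descriptors are stored as 0/1 arrays and that f2 and f2' are the best and second-best SSD matches among a list of candidates. The function names are illustrative and not taken from any particular library.

```python
import numpy as np

def hamming_distance(a, b):
    """Number of positions at which two binary descriptors differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def ssd(a, b):
    """Sum of squared differences (the squared L2 distance)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sum(diff * diff))

def ratio_distance(f1, candidates):
    """SSD to the best match divided by SSD to the second-best match."""
    dists = np.sort([ssd(f1, c) for c in candidates])
    best, second = dists[0], dists[1]
    # If even the second-best match is identical to f1, treat it as fully ambiguous.
    return float(best / second) if second > 0 else 1.0

f1 = [1.0, 0.0, 0.0]
# One clearly best candidate: the ratio is small.
distinct = ratio_distance(f1, [[1.0, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
# Two equally good candidates: the ratio is close to 1.
ambiguous = ratio_distance(f1, [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [0.0, 1.0, 0.0]])
```

In practice a match is usually kept only when the ratio falls below a chosen threshold. The threshold value depends on the application and is not specified here.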

#### Normalized Euclidean Distance vs. Normalized Cross Correlation

The normalized Euclidean distance is the distance between two vectors that have each been normalized to unit length. If the vectors are identical, the distance is `0`; if they point in opposite directions, the distance is `2`; and if they are orthogonal (perpendicular), the distance is `sqrt(2)`. It is a non-negative scalar between `0` and `2`.

The normalized cross-correlation is the dot product between the two normalized vectors. If the vectors are identical, the correlation is `1`; if they point in opposite directions, the correlation is `-1`; and if they are orthogonal (perpendicular), the correlation is `0`. It is a scalar between `-1` and `1`. This is also called the **cosine similarity**.
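The two quantities are tied together by the identity d² = 2 − 2c, where d is the normalized Euclidean distance and c is the normalized cross-correlation (it follows from expanding the squared norm of the difference of two unit vectors). The sketch below checks the three special cases and this identity. It follows the definition used here, with no mean subtraction, and the function names are illustrative.

```python
import numpy as np

def normalize(v):
    """Scale a non-zero vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def normalized_euclidean(a, b):
    """Euclidean distance between the unit-length versions of a and b."""
    return float(np.linalg.norm(normalize(a) - normalize(b)))

def normalized_cross_correlation(a, b):
    """Dot product of the unit-length versions of a and b (cosine similarity)."""
    return float(np.dot(normalize(a), normalize(b)))

a, b = [3.0, 1.0, 2.0], [1.0, 4.0, 0.0]
d = normalized_euclidean(a, b)
c = normalized_cross_correlation(a, b)
identity_gap = abs(d ** 2 - (2.0 - 2.0 * c))  # should be near zero
```

Since d is a monotonically decreasing function of c, ranking candidate matches by either measure gives the same order.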

References: https://datascience.stackexchange.com/a/6545

## Probability Distribution

Probability Mass Functions (PMF) are used to describe **discrete** probability distributions. Probability Density Functions (PDF) are used to describe **continuous** probability distributions.

#### KL-Divergence (Relative Entropy)

The relative entropy or Kullback-Leibler (KL) distance between two PMFs $p(x)$ and $q(x)$ (that are defined on the same alphabet) is:

$$D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

The KL distance is not a rigorous distance metric, because it does not satisfy all of the metric properties: it is not symmetric and it does not satisfy the triangle inequality. It is still a distance measure to some degree, because it meets some of the properties (e.g., it is always non-negative, and it is zero when the two distributions are identical).

This is not the only measure between two distributions, but it is the most important one in information theory.
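The asymmetry is easy to see numerically. The sketch below computes the relative entropy of two small PMFs in both directions. It assumes natural logarithms (so the result is in nats), the usual convention that terms with p(x) = 0 contribute zero, and that q(x) > 0 wherever p(x) > 0. The function name `kl_divergence` is illustrative.

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats for two PMFs on the same alphabet."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)  # D(p || q)
d_qp = kl_divergence(q, p)  # D(q || p), generally a different value
d_pp = kl_divergence(p, p)  # zero for identical distributions
```

Both directions are non-negative, but they differ, which is why the KL distance is not a true metric.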

References: EE250 Information Theory Lecture Notes

#### Earth Mover's Distance (Wasserstein metric)

In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric.

Informally, if the distributions are interpreted as two different ways of piling up a certain amount of earth (dirt) over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be the amount of dirt moved times the distance by which it is moved.

The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in **normalized histograms or probability density functions**. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions.

Related Keywords: Transportation Theory, Optimal Transport

Reportedly, replacing the KL divergence with the Wasserstein distance in some machine learning methods can give better performance.
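For one-dimensional histograms the earth mover's distance has a simple closed form: it is the sum of the absolute differences between the two cumulative distributions, multiplied by the bin spacing. The sketch below covers only this special case and assumes equal-width, unit-spaced bins and equal total mass. The name `emd_1d` is illustrative, and the general multi-dimensional case requires solving a transport problem instead.

```python
import numpy as np

def emd_1d(p, q):
    """EMD between two 1-D histograms on the same unit-spaced bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # The running surplus of dirt is what must be carried across each bin boundary.
    return float(np.sum(np.abs(np.cumsum(p - q))))

pile = [1.0, 0.0, 0.0]
near = emd_1d(pile, [0.0, 1.0, 0.0])  # all dirt moves one bin
far = emd_1d(pile, [0.0, 0.0, 1.0])   # all dirt moves two bins
```

Unlike the KL divergence, this measure grows with how far the mass has to travel, so it stays informative even when the two distributions do not overlap.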
