Covariance Matrix

Definitions

  • Consider a discrete random variable X with a finite list x1, ..., xk of possible outcomes, each of which has probability p1, ..., pk of occurring.

  • The expected value or mean of X is defined as Ī¼X=E[X]=āˆ‘ixipi=x1p1+ā‹Æ+xkpk\mu_X = \text{E}[X] = \sum_i x_i p_i = x_1 p_1 + \cdots + x_k p_k.

  • The variance of a random variable X is the expected value of the squared deviation from the mean of X, defined as Var(X)=E[(Xāˆ’E[X])2]=E[X2]āˆ’E2[X]\text{Var}(X) = \text{E}[(X-\text{E}[X])^2] = \text{E}[X^2] - \text{E}^2[X].

  • The standard deviation is defined as ĻƒX=Var(X)\sigma_X = \sqrt{\text{Var}(X)}.

  • Now consider two random variables X and Y. The covariance is defined as the expected value of the product of their deviations from their individual expected values:Cov(X,Y)=E[(Xāˆ’E[X])(Yāˆ’E[Y])]=E[XY]āˆ’E[X]E[Y]\text{Cov}(X,Y) = \text{E}[(X-\text{E}[X])(Y-\text{E}[Y])] = \text{E}[XY] - \text{E}[X]\text{E}[Y].

  • The correlation is defined as Corr(X,Y)=Cov(X,Y)ĻƒXĻƒY\text{Corr}(X,Y) = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}, which ranges from -1 to 1. It is a normalized version of the covariance.

  • Now consider a column vector X=(X1,ā€¦,Xn)\mathbf{X} = (X_1, \dots,X_n), where each entry is a random variable. The covariance matrix is the matrix CXX\text{C}_{\mathbf{X}\mathbf{X}} whose (i,j)(i, j) entry is the covariance CXiXj=Cov(Xi,Xj)=E[(Xiāˆ’E[Xi])(Xjāˆ’E[Xj])]\text{C}_{X_iX_j}=\text{Cov}(X_i,X_j) = \text{E}[(X_i-\text{E}[X_i])(X_j-\text{E}[X_j])].

  • The correlation matrix follows the save way to act as a normalized covariance matrix.

Estimation from Samples

  • The above definitions are mathematical descriptions that work well for known distributions.

  • In practice, when we estimate from samples, the true distribution is unknown. In this case, we can only access sample mean, sample variance, and sample covariance.

  • Sample mean Xā€¾=1nāˆ‘i=1nXi\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i is correct and unbiased.

  • Sample variance S~2=1nāˆ‘i=1n(Xiāˆ’Xā€¾)2\tilde{S}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \overline{X})^2 is biased.

  • Unbiased sample variance S2=nnāˆ’1S~2=1nāˆ’1āˆ‘i=1n(Xiāˆ’Xā€¾)2S^2 = \frac{n}{n-1} \tilde{S}^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \overline{X})^2.

    • X is a random variable, and Xi are observations or samples of this random variable.

  • Unbiased sample (auto-)covariance matrix C=1nāˆ’1āˆ‘i=1n(xiāˆ’xā€¾)(xiāˆ’xā€¾)T\text{C}={1 \over {n-1}}\sum _{{i=1}}^{n}(\mathbf{x}_{i}-\overline{\mathbf{x}})(\mathbf{x}_{i}-\overline {\mathbf{x}})^{{\mathrm{T}}}.

  • Unbiased sample (cross-)covariance matrix C=1nāˆ’1āˆ‘i=1n(xiāˆ’xā€¾)(yiāˆ’yā€¾)T\text{C}={1 \over{n-1}}\sum _{{i=1}}^{n}(\mathbf{x}_{i}-\overline{\mathbf{x}})(\mathbf{y}_{i}-\overline{\mathbf{y}})^{{\mathrm{T}}}.

    • X and Y are vectors of size m. Each entry of the vector is a random variable. Xi and Yi are observations or samples of random vectors X and Y.

    • If the random variables are known in normal distribution, then we can still take 1/n instead of 1/(n-1) for unbiased estimation. This is because sample mean and sample variance are independent for normal distribution.

Applications

Covariance matrix of a set of 3D points

  • In a typical normal estimation algorithm, the normal direction of each point is estimated locally by first computing the covariance matrix of points in the neighborhood (e.g., 20 points), and then taking the eigenvector corresponding to the smallest eigenvalue of the covariance matrix.

  • Let X, Y and Z denote the random variables (of unknown distributions) in x, y, z axes respectively. The set of 3D points are the samples drawn from these three distributions.

  • The covariance matrix in this case is defined as

    C=[var(X)cov(X,Y)cov(X,Z)cov(X,Y)var(Y)cov(Y,Z)cov(X,Z)cov(Y,Z)var(Z)]\text{C} = \begin{bmatrix} \text{var}(X) & \text{cov}(X,Y) & \text{cov}(X,Z) \\ \text{cov}(X,Y) & \text{var}(Y) & \text{cov}(Y,Z) \\ \text{cov}(X,Z) & \text{cov}(Y,Z) & \text{var}(Z) \end{bmatrix}.

  • This is the auto-covariance matrix, meaning that it computes the covariance between the random vector and itself. This matrix by definition is always positive semi-definite.

  • Since auto-covariance matrices are always real symmetric, the singular values (from SVD) are the absolute values of the eigenvalues. Also, since the matrix is positive semi-definite, all eigenvalues are non-negative.

Covariance matrix of two point clouds

  • In the standard point-to-point ICP registration algorithm, the closed form solution is given by performing SVD on the covariance matrix estimated from two subsets of corresponding points in two point clouds respectively. This is the method proposed by Umeyama in 1991.

  • Let (X1, Y1, Z1) be the random vector of point cloud A, and let (X2, Y2, Z2) be the random vector of the other point cloud. Each entry of the vector is a random variable corresponding to the x/y/z axis.

  • The covariance matrix in this case is

    C=[cov(X1,X2)cov(X1,Y2)cov(X1,Z2)cov(Y1,X2)cov(Y1,Y2)cov(Y1,Z2)cov(Z1,X2)cov(Z1,Y2)cov(Z1,Z2)]\text{C} = \begin{bmatrix} \text{cov}(X_1,X_2) & \text{cov}(X_1,Y_2) & \text{cov}(X_1,Z_2) \\ \text{cov}(Y_1,X_2) & \text{cov}(Y_1,Y_2) & \text{cov}(Y_1,Z_2) \\ \text{cov}(Z_1,X_2) & \text{cov}(Z_1,Y_2) & \text{cov}(Z_1,Z_2) \end{bmatrix}

  • This is the cross-covariance matrix, meaning that it computes the covariance between one random vector with the other. This matrix does not guarantee positive semi-definiteness.

  • Suppose that A and B are 3-by-n matrices where each column is a 3D point. The practical way to compute this cross-covariance matrix is C = 1/n * A_demean * B_demean^T, where ^T is the transpose and demean indicates we need to first subtract the mean from all points in A and B.

References

Last updated