# Generalized ICP

### The nature of the problem/algorithm

Classic ICP uses a point-to-point distance metric, which can lead to a non-quadratic error surface, especially when the point clouds are not well aligned initially. This is because the closest point correspondence can change dramatically for different transformations, leading to a discontinuous, and hence non-quadratic, error surface.

GICP, on the other hand, uses a plane-to-plane distance metric, which can result in a smoother and more quadratic-like error surface. This is because the plane-to-plane distance changes more smoothly with the transformation, leading to a more continuous error surface.

The above two statements about two methods are generated by GPT and to be verified by experiments.

Both ICP and GICP establish error terms in the least-square forms (with or without covariance matrix as the weight), and this seems to fit GN and LM use cases. However, in practice, despite least-square error terms, the point cloud registration problem can still be highly non-linear, due to the nature of transformation matrix (in SE(3) space), and how the correspondence points are established in each iteration. Therefore, in practice, BFGS may run more robustly than GN and LM.

### How to choose optimizer? Comparison between GN, LM, and BFGS.

Gauss-Newton (GN): It assumes that the error surface is quadratic, which allows it to solve directly for the step that minimizes the error.

Levenberg-Marquardt (LM): This is a modification of Gauss-Newton that introduces a damping factor, making it more robust to non-quadratic error surfaces. When the damping factor is high, LM behaves more like gradient descent, taking small, reliable steps. When the damping factor is low, LM behaves more like Gauss-Newton, taking large, confident steps. This adaptability makes LM more reliable than Gauss-Newton in practice.

Although both GN and LM are iterative methods for nonlinear problem, at each iteration of the state, they approximate the model using first-order Taylor expansion (i.e. linearization), then the linearized Jacobian-residual equation can be solved by a linear solver in the form of Ax=b.

Even in the colored point cloud registration algorithm, the color term can also be formulated as linearized, local approximation and solved by GN method.

BFGS: This is a quasi-Newton method that approximates the Hessian matrix using gradient information instead of computing it directly. This makes it more efficient than Gauss-Newton and LM for problems

*with a large number of parameters*(e.g., quadrotor motion planning). BFGS also includes a line search to find the optimal step, which can make it more robust to non-quadratic error surfaces.This is a general unconstrained nonlinear optimization method that does not assume least-square error terms. It does not linearize the model (which contains first-order info only), but instead makes use of the second order information (i.e. curvature, encoded in the approximated Hessian matrix).

### Implementation Details

PCL GICP algorithm uses BFGS optimizer to perform unconstrained nonlinear optimization. It represent state in 6-by-1 vector (XYZ, RPY) as the compact form, and restores to transformation matrix when needed.

The gradient is computed separately for XYZ and RPY. For XYZ, it computes the distance residual between corresponding points, multiplied by the weights from pre-computed covariance matrices (in `df`

and `fdf`

functions). For RPY, it expresses the rotation matrix R as ZYX Euler angles, and then takes the derivative of the elements of the rotation matrix with respect to the Euler angles (in `computeRDerivative`

function.)

Essentially, the distance metric used in plane-to-plane ICP or GICP is the Mahalanobis distance between two distributions.

### Distance Metric

The standard Mahalanobis distance is used to measure point-to-distribution distance. It is formulated as the following:

$D(\mathbf{p}) = \sqrt{(\mathbf{p} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{p} - \boldsymbol{\mu})}$

Intuitively, Mahalanobis distance is the distance of the test point from the center of mass divided by the width of the ellipsoid in the direction of the test point.

If we extend this concept to measure the distance between two distributions (such as two planes), the formula becomes the following:

$D(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2) = \sqrt{(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^\top (\Sigma_1 + \Sigma_2)^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)}$

This is to measure the residual between two mean points, weighted by the sum of their respective covariance matrices.

If we take into account that one of the point cloud will be transformed at each iteration, so as the covariance matrix of that point cloud, then the measure becomes:

$D(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2') = \sqrt{(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2')^\top (\Sigma_1 + T \Sigma_2 T^\top)^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2')}$

This is exactly what the GICP algorithm is designed to minimize (Eq. 2 from the original paper):

$T = \underset{T}{\arg\min} \sum_i d_i^{(T)^\top} \left( C_i^B + T C_i^A T^\top \right)^{-1} d_i^{(T)}$.

#### Fast-GICP Code Snippet

#### PCL Code Snippet

References: ChatGPT, https://en.wikipedia.org/wiki/Mahalanobis_distance, Fast-GICP loss function, PCL GICP implementation.

### Use Point Normals to Compute Covariances

Since GICP algorithm will regularize the covariance matrices by (0.001, 1, 1) diagonal matrix, where the only information retained herein is the normal direction, it is possible to represent this covariance matrix by its point normal, and reconstruct this regularized covariance back at a later time.

To this end, we can estimate the point normals using the same method as estimating these covariance matrices, and then save this normal information in each point itself.

#### PCL Normal Estimation

In fact, the normal estimation algorithm will need to first estimate a covariance matrix (using exactly the same way as GICP), and then perform eigenvalue decomposition on this covariance matrix, to obtain its smallest eigenvalue as the normal direction. Lastly, this algorithm may flip the normal to point toward viewpoint (or sensor origin, in most cases).

#### GICP Covariance Estimation

GICP algorithm will estimate covariance matrices as usual, and then regularize the covariance matrix by (0.001, 1, 1) diagonal matrix. Two points from source and target clouds, along with their regularized covariance matrices, will be used to compute the Mahalanobis distance between each other, and contribute to the overall optimization problem as an error term.

#### Reconstruct Covariances from Normals

There are difference ways to reconstruct this. Below we compared a few methods (using Python code equivalent to original C++ implementation) and the results (in terms of Mahalanobis distance) are identical, or numerically the same.

### Convergence Condition

The related code snippets are:

This is to compute the difference in the transformation matrix, represented by `delta`

variable.

The algorithm is deemed converged, if no element in the rotation matrix has a difference (computed against last iteration) greater than the `rotation_epsilon_`

, and no element in the translation vector has a difference greater than the `transformation_epsilon_`

.

`transformation_epsilon_`

can be matched to meters directly, i.e. 5e-4 means no difference greater than 0.0005 meter.

`rotation_epsilon_`

corresponds to the max difference in the elements of the 3x3 rotation matrix. For example, if a rotation matrix represents a z-axis rotation of alpha degree, then the converge condition is

rotation_epsilon_ = sin(alpha * M_PI / 180), or

alpha = arcsin(rotation_epsilon_) * 180 / M_PI.

When rotation_epsilon_ = 2e-3, the corresponding angle in degree is 0.11 degree. Other values can be inferred accordingly.

### Math in Optimization

Objective function: $T = \underset{T}{\arg\min} \sum_i d_i^{(T)^\top} \left( C_i^B + T C_i^A T^\top \right)^{-1} d_i^{(T)}$.

We basically omit the derivative of the Mahalanobis matrix in the middle, and simplify it as a constant matrix denoted by M. We substitute d with its original definition into the objective function and obtain the following.

After rearrangement: $f_i(\mathbf{R}, \mathbf{t}) = (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t})^{\top} \mathbf{M} (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t}).$

Derivative of R: on: $\frac{\mathrm{d} f_i(\mathbf{R}, \mathbf{t})}{\mathrm{d} \mathbf{R}} = 2 (- \mathbf{q}_i) \times \left(\mathbf{M} (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t}) \right)^{\top}.$

Derivative of t: on: $\frac{\mathrm{d} f_i(\mathbf{R}, \mathbf{t})}{\mathrm{d} \mathbf{t}} = - 2 \mathbf{M} (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t}).$

The above two derivatives are consistent with the PCL GICP implementation in C++.

See the math notes of GICP on overleaf for more information and derivations.

### Implementations

Original GICP authors: https://github.com/avsegal/gicp/blob/master/bfgs_funcs.cpp

### Parameterization of Pose

Linearization of transformation matrix

ZYX Euler angle for rotation

Quaternion for rotation

Cayley–Rodrigues Parameters (in the plane adjustment paper)

Last updated