Generalized ICP

The nature of the problem/algorithm

  • Classic ICP uses a point-to-point distance metric, which can lead to a non-quadratic error surface, especially when the point clouds are not well aligned initially. This is because the closest point correspondence can change dramatically for different transformations, leading to a discontinuous, and hence non-quadratic, error surface.
  • GICP, on the other hand, uses a plane-to-plane distance metric, which can result in a smoother and more quadratic-like error surface. This is because the plane-to-plane distance changes more smoothly with the transformation, leading to a more continuous error surface.
    • The above two statements about two methods are generated by GPT and to be verified by experiments.
  • Both ICP and GICP establish error terms in the least-square forms (with or without covariance matrix as the weight), and this seems to fit GN and LM use cases. However, in practice, despite least-square error terms, the point cloud registration problem can still be highly non-linear, due to the nature of transformation matrix (in SE(3) space), and how the correspondence points are established in each iteration. Therefore, in practice, BFGS may run more robustly than GN and LM.

How to choose optimizer? Comparison between GN, LM, and BFGS.

  • Gauss-Newton (GN): It assumes that the error surface is quadratic, which allows it to solve directly for the step that minimizes the error.
  • Levenberg-Marquardt (LM): This is a modification of Gauss-Newton that introduces a damping factor, making it more robust to non-quadratic error surfaces. When the damping factor is high, LM behaves more like gradient descent, taking small, reliable steps. When the damping factor is low, LM behaves more like Gauss-Newton, taking large, confident steps. This adaptability makes LM more reliable than Gauss-Newton in practice.
    • Although both GN and LM are iterative methods for nonlinear problem, at each iteration of the state, they approximate the model using first-order Taylor expansion (i.e. linearization), then the linearized Jacobian-residual equation can be solved by a linear solver in the form of Ax=b.
    • Even in the colored point cloud registration algorithm, the color term can also be formulated as linearized, local approximation and solved by GN method.
  • BFGS: This is a quasi-Newton method that approximates the Hessian matrix using gradient information instead of computing it directly. This makes it more efficient than Gauss-Newton and LM for problems with a large number of parameters (e.g., quadrotor motion planning). BFGS also includes a line search to find the optimal step, which can make it more robust to non-quadratic error surfaces.
    • This is a general unconstrained nonlinear optimization method that does not assume least-square error terms. It does not linearize the model (which contains first-order info only), but instead makes use of the second order information (i.e. curvature, encoded in the approximated Hessian matrix).

Implementation Details

PCL GICP algorithm uses BFGS optimizer to perform unconstrained nonlinear optimization. It represent state in 6-by-1 vector (XYZ, RPY) as the compact form, and restores to transformation matrix when needed.
The gradient is computed separately for XYZ and RPY. For XYZ, it computes the distance residual between corresponding points, multiplied by the weights from pre-computed covariance matrices (in df and fdf functions). For RPY, it expresses the rotation matrix R as ZYX Euler angles, and then takes the derivative of the elements of the rotation matrix with respect to the Euler angles (in computeRDerivative function.)
Essentially, the distance metric used in plane-to-plane ICP or GICP is the Mahalanobis distance between two distributions.

Distance Metric

The standard Mahalanobis distance is used to measure point-to-distribution distance. It is formulated as the following:
D(p)=(pμ)Σ1(pμ)D(\mathbf{p}) = \sqrt{(\mathbf{p} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{p} - \boldsymbol{\mu})}
Intuitively, Mahalanobis distance is the distance of the test point from the center of mass divided by the width of the ellipsoid in the direction of the test point.
If we extend this concept to measure the distance between two distributions (such as two planes), the formula becomes the following:
D(μ1,μ2)=(μ1μ2)(Σ1+Σ2)1(μ1μ2)D(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2) = \sqrt{(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^\top (\Sigma_1 + \Sigma_2)^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)}
This is to measure the residual between two mean points, weighted by the sum of their respective covariance matrices.
If we take into account that one of the point cloud will be transformed at each iteration, so as the covariance matrix of that point cloud, then the measure becomes:
D(μ1,μ2)=(μ1μ2)(Σ1+TΣ2T)1(μ1μ2)D(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2') = \sqrt{(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2')^\top (\Sigma_1 + T \Sigma_2 T^\top)^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2')}
This is exactly what the GICP algorithm is designed to minimize (Eq. 2 from the original paper):
T=argminTidi(T)(CiB+TCiAT)1di(T)T = \underset{T}{\arg\min} \sum_i d_i^{(T)^\top} \left( C_i^B + T C_i^A T^\top \right)^{-1} d_i^{(T)}

Fast-GICP Code Snippet

double gicp_loss(const Eigen::Vector3d& mean_A, const Eigen::Matrix3d& cov_A, const Eigen::Vector3d& mean_B, const Eigen::Matrix3d& cov_B, const Eigen::Matrix3d& R, const Eigen::Vector3d& t) {
Eigen::Vector3d d = mean_B - (R * mean_A + t);
Eigen::Matrix3d RCR = cov_B + R * cov_A * R.transpose();
Eigen::Matrix3d RCR_inv = RCR.inverse();
Eigen::Vector3d RCRd = RCR_inv * d;
double loss =;

PCL Code Snippet

// PCL registration/impl/gicp.hpp
// in function computeTransformation()
if (nn_dist_sq[0] < dist_threshold) {
Eigen::Matrix3d& C1 = (*input_covariances_)[i];
Eigen::Matrix3d& C2 = (*target_covariances_)[nn_indices[0]];
Eigen::Matrix3d& M = mahalanobis_[i];
// M = R*C1
M = R * C1;
// temp = M*R' + C2 = R*C1*R' + C2
Eigen::Matrix3d temp = M * R.transpose();
temp += C2;
// M = temp^-1
M = temp.inverse();
source_indices[i] = static_cast<int>(i);
target_indices[i] = nn_indices[0];
// in function df() or fdf()
for (int i = 0; i < m; ++i) {
// The last coordinate, p_src[3] is guaranteed to be set to 1.0 in registration.hpp
Vector4fMapConst p_src = gicp_->tmp_src_->points[(*gicp_->tmp_idx_src_)[i]].getVector4fMap();
// The last coordinate, p_tgt[3] is guaranteed to be set to 1.0 in registration.hpp
Vector4fMapConst p_tgt = gicp_->tmp_tgt_->points[(*gicp_->tmp_idx_tgt_)[i]].getVector4fMap();
Eigen::Vector4f pp(transformation_matrix * p_src);
// The last coordinate is still guaranteed to be set to 1.0
Eigen::Vector3d res(pp[0] - p_tgt[0], pp[1] - p_tgt[1], pp[2] - p_tgt[2]);
// temp = M*res
Eigen::Vector3d temp(gicp_->mahalanobis((*gicp_->tmp_idx_src_)[i]) * res);
// Increment total error
f += double(res.transpose() * temp);

Use Point Normals to Compute Covariances

Since GICP algorithm will regularize the covariance matrices by (0.001, 1, 1) diagonal matrix, where the only information retained herein is the normal direction, it is possible to represent this covariance matrix by its point normal, and reconstruct this regularized covariance back at a later time.
To this end, we can estimate the point normals using the same method as estimating these covariance matrices, and then save this normal information in each point itself.

PCL Normal Estimation

In fact, the normal estimation algorithm will need to first estimate a covariance matrix (using exactly the same way as GICP), and then perform eigenvalue decomposition on this covariance matrix, to obtain its smallest eigenvalue as the normal direction. Lastly, this algorithm may flip the normal to point toward viewpoint (or sensor origin, in most cases).

GICP Covariance Estimation

GICP algorithm will estimate covariance matrices as usual, and then regularize the covariance matrix by (0.001, 1, 1) diagonal matrix. Two points from source and target clouds, along with their regularized covariance matrices, will be used to compute the Mahalanobis distance between each other, and contribute to the overall optimization problem as an error term.

Reconstruct Covariances from Normals

There are difference ways to reconstruct this. Below we compared a few methods (using Python code equivalent to original C++ implementation) and the results (in terms of Mahalanobis distance) are identical, or numerically the same.
import numpy as np
# Function to generate points on a plane
def generate_plane_points(mean, normal, n=20, noise_level=0.001):
points = np.random.rand(n, 3)
for i in range(n):
points[i] = points[i] -[i] - mean, normal) * normal
points[i] += np.random.normal(scale=noise_level, size=3)
return points
# Function to calculate Mahalanobis distance
def mahalanobis_distance(mu1, mu2, sigma1, sigma2):
cov_inv = np.linalg.inv(sigma1 + sigma2)
delta = mu1 - mu2
return np.sqrt(, cov_inv), delta))
# Function to estimate plane normal
def estimate_plane_normal(points):
covariance_matrix = np.cov(points, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)
# The normal is the eigenvector corresponding to the smallest eigenvalue
normal = eigenvectors[:, np.argmin(eigenvalues)]
return normal
# Regularize covariance matrix
def regularize_covariance_matrix(cov, gicp_epsilon=0.001):
U, _, _ = np.linalg.svd(cov)
S = np.array([1, 1, gicp_epsilon]) # Modified singular values
cov_modified =,, U.T))
return cov_modified
# Reconstruct regularized covariance matrix from normal
def reconstruct_covariance_from_normal(normal, gicp_epsilon=0.001):
# Ensure the normal is a unit vector
normal = normal / np.linalg.norm(normal)
# Find two vectors perpendicular to the normal
if (normal[0] == 0) and (normal[1] == 0):
vec1 = np.array([1, 0, 0])
vec1 = np.array([-normal[1], normal[0], 0])
vec1 = vec1 / np.linalg.norm(vec1)
vec2 = np.cross(normal, vec1)
vec2 = vec2 / np.linalg.norm(vec2)
# Form an orthonormal basis
U = np.column_stack([vec1, vec2, normal])
# Apply regularization to the singular values
S = np.array([1, 1, gicp_epsilon]) # Regularized singular values
# Reconstruct the covariance matrix
cov_modified =,, U.T))
return cov_modified
# LOCUS method to reconstruct covariance matrix from normal
def locus_two_plane_vectors_from_normal(normal):
normal_eig = normal / np.linalg.norm(normal)
if abs(normal_eig[2]) < 1e-7:
normal_eig[2] = 1e-7
vec1 = np.array([1.0, 0.0, -normal_eig[0] / normal_eig[2]])
vec1 = vec1 / np.linalg.norm(vec1)
vec2 = np.cross(normal_eig, vec1)
vec2 = vec2 / np.linalg.norm(vec2)
return 0.001 * np.outer(normal_eig, normal_eig) + np.outer(vec1, vec1) + np.outer(vec2, vec2)
# Define two planes
mean1 = np.array([0, 0, 0])
normal1 = np.array([1, 0, 0])
plane1 = generate_plane_points(mean1, normal1)
mean2 = np.array([1, 0, 0])
normal2 = np.array([-1, 0, 0])
plane2 = generate_plane_points(mean2, normal2)
# Calculate means and covariances
mu1 = np.mean(plane1, axis=0)
sigma1 = np.cov(plane1, rowvar=False)
mu2 = np.mean(plane2, axis=0)
sigma2 = np.cov(plane2, rowvar=False)
# Estimate normals using eigenvalue decomposition
normal1_estimated = estimate_plane_normal(plane1)
normal2_estimated = estimate_plane_normal(plane2)
# Calculate Mahalanobis distance before transformation
distance_before = mahalanobis_distance(mu1, mu2, sigma1, sigma2)
print("Mahalanobis distance before transformation: {:.3f}".format(distance_before))
# Regularize covariance matrices and recalculate Mahalanobis distance
sigma1_regular = regularize_covariance_matrix(sigma1)
sigma2_regular = regularize_covariance_matrix(sigma2)
distance_before_regular = mahalanobis_distance(mu1, mu2, sigma1_regular, sigma2_regular)
print("Mahalanobis distance before transformation (regularized): {:.3f}".format(distance_before_regular))
# Reconstruct covariance matrices from normals and recalculate Mahalanobis distance
sigma1_reconstructed = reconstruct_covariance_from_normal(normal1_estimated)
sigma2_reconstructed = reconstruct_covariance_from_normal(normal2_estimated)
distance_before_reconstructed = mahalanobis_distance(mu1, mu2, sigma1_reconstructed, sigma2_reconstructed)
print("Mahalanobis distance before transformation (reconstructed): {:.3f}".format(distance_before_reconstructed))
# Reconstruct covariance matrices from normals using LOCUS method and recalculate Mahalanobis distance
sigma1_reconstructed_locus = locus_two_plane_vectors_from_normal(normal1_estimated)
sigma2_reconstructed_locus = locus_two_plane_vectors_from_normal(normal2_estimated)
distance_before_reconstructed_locus = mahalanobis_distance(mu1, mu2, sigma1_reconstructed_locus, sigma2_reconstructed_locus)
print("Mahalanobis distance before transformation (reconstructed using LOCUS method): {:.3f}".format(distance_before_reconstructed_locus))

Convergence Condition

The related code snippets are:
// in gicp.hpp, function computeTransformation()
previous_transformation_ = transformation_;
try {
output, source_indices, *target_, target_indices, transformation_);
/* compute the delta from this iteration */
delta = 0.;
for (int k = 0; k < 4; k++) {
for (int l = 0; l < 4; l++) {
double ratio = 1;
if (k < 3 && l < 3) { // rotation part of the transform
ratio = 1. / rotation_epsilon_;
} else {
ratio = 1. / transformation_epsilon_;
double c_delta = ratio *
fabs(previous_transformation_(k, l) - transformation_(k, l));
if (c_delta > delta) {
delta = c_delta;
if (nr_iterations_ >= max_iterations_ || delta < 1) {
converged_ = true;
// default values in initialization
transformation_epsilon_ = 5e-4;
This is to compute the difference in the transformation matrix, represented by delta variable.
The algorithm is deemed converged, if no element in the rotation matrix has a difference (computed against last iteration) greater than the rotation_epsilon_, and no element in the translation vector has a difference greater than the transformation_epsilon_.
transformation_epsilon_ can be matched to meters directly, i.e. 5e-4 means no difference greater than 0.0005 meter.
rotation_epsilon_ corresponds to the max difference in the elements of the 3x3 rotation matrix. For example, if a rotation matrix represents a z-axis rotation of alpha degree, then the converge condition is
rotation_epsilon_ = sin(alpha * M_PI / 180), or
alpha = arcsin(rotation_epsilon_) * 180 / M_PI.
When rotation_epsilon_ = 2e-3, the corresponding angle in degree is 0.11 degree. Other values can be inferred accordingly.

Math in Optimization

  • Objective function:
    T=argminTidi(T)(CiB+TCiAT)1di(T)T = \underset{T}{\arg\min} \sum_i d_i^{(T)^\top} \left( C_i^B + T C_i^A T^\top \right)^{-1} d_i^{(T)}
  • We basically omit the derivative of the Mahalanobis matrix in the middle, and simplify it as a constant matrix denoted by M. We substitute d with its original definition into the objective function and obtain the following.
  • After rearrangement:
    fi(R,t)=(piRqit)M(piRqit).f_i(\mathbf{R}, \mathbf{t}) = (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t})^{\top} \mathbf{M} (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t}).
  • Derivative of R: on:
    dfi(R,t)dR=2(qi)×(M(piRqit)).\frac{\mathrm{d} f_i(\mathbf{R}, \mathbf{t})}{\mathrm{d} \mathbf{R}} = 2 (- \mathbf{q}_i) \times \left(\mathbf{M} (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t}) \right)^{\top}.
  • Derivative of t: on:
    dfi(R,t)dt=2M(piRqit).\frac{\mathrm{d} f_i(\mathbf{R}, \mathbf{t})}{\mathrm{d} \mathbf{t}} = - 2 \mathbf{M} (\mathbf{p}_i - \mathbf{R} \mathbf{q}_i - \mathbf{t}).
  • The above two derivatives are consistent with the PCL GICP implementation in C++.
  • See the math notes of GICP on overleaf for more information and derivations.


Parameterization of Pose

  • Linearization of transformation matrix
  • ZYX Euler angle for rotation
  • Quaternion for rotation
  • Cayley–Rodrigues Parameters (in the plane adjustment paper)