The Gauss–Newton algorithm is used to solve non-linear least squares problems. It is a modification of Newton's method for finding a minimum of a function.

Unlike Newton's method, the Gauss–Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required.

The Levenberg–Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent.

Conclusions:

When the objective is well approximated by its second-order expansion near the solution, we use the Gauss–Newton method.

Consider the unconstrained minimization problem. The general iteration formula in Descent Methods is $x_{k+1} = x_k + t_k \Delta x_k$, where $t_k$ is the step size and $\Delta x_k$ is the step direction at iteration $k$. The outline of a general Descent Method is as follows.

given a starting point $x$.
repeat
1. Determine a descent direction $\Delta x$.
2. Line search. Choose a step size $t>0$.
3. Update. $x:=x+t \Delta x$.
until stopping criterion is satisfied.
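The outline above can be sketched in code. This is a minimal sketch, not a prescribed implementation: the quadratic test objective, the negative-gradient direction, and the backtracking (Armijo) line-search parameters are all illustrative choices.

```python
import numpy as np

def backtracking_line_search(f, grad, x, dx, alpha=0.3, beta=0.8):
    """Shrink t until the Armijo sufficient-decrease condition holds."""
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * grad(x) @ dx:
        t *= beta
    return t

def descent_method(f, grad, x0, tol=1e-8, max_iter=1000):
    """General descent method using the negative gradient as direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = -grad(x)                                 # 1. descent direction
        if np.linalg.norm(dx) < tol:                  # stopping criterion
            break
        t = backtracking_line_search(f, grad, x, dx)  # 2. line search
        x = x + t * dx                                # 3. update
    return x

# Illustrative quadratic with minimum at (1, -2).
f = lambda x: (x[0] - 1)**2 + 2 * (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_star = descent_method(f, grad, [0.0, 0.0])
```

Note that $\nabla f(x)^{\mathsf{T}} \Delta x = -\|\nabla f(x)\|^2 < 0$ here, so the direction always satisfies the descent condition below.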

Descent Direction: $\Delta x_k$

Step Size: $t_k$

To choose a Descent Direction, a necessary condition is $\nabla f(x)^T \Delta x < 0$.

Steepest Descent Method: $\Delta x = - P^{-1} \nabla f(x)$, where $P$ is a symmetric positive definite matrix.

Gradient Method: $\Delta x = - \nabla f(x)$ is a special case of the steepest descent method when $P$ is the identity matrix.

Newton Method: $\Delta x = - \nabla^2 f(x)^{-1} \nabla f(x)$ is a special case of the steepest descent method when $P = \nabla^2 f(x)$, the Hessian.

In numerical analysis, Newton's method is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. In each step, the update equation is $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$.
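The scalar update above can be sketched directly; the example function $f(x) = x^2 - 2$ (whose positive root is $\sqrt{2}$) and the starting point are hypothetical choices for illustration.

```python
def newton_root(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:  # close enough to a root
            break
        x = x - fx / fprime(x)
    return x

# Find the positive root of f(x) = x^2 - 2, i.e. sqrt(2).
root = newton_root(lambda x: x**2 - 2.0, lambda x: 2.0 * x, x0=1.0)
```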

Limitations: It may not work if there are points of inflection, local maxima, or minima around the initial guess $x_0$ or the root. For example, suppose you need to find the root of $27x^3 - 3x + 1 = 0$, which is near $x = 0$. Then Newton's method will bounce back and forth between iterates, or move farther away from the root.

Find $\theta_d$ such that $x_d - f(\theta_d) = 0$, where $x_d$ is the desired point in task space that we would like to reach, and $f(\theta)$ is the forward kinematics model of the manipulator.

In each iteration, $\theta^{i+1} = \theta^i + J^{-1}(\theta^i)(x_d - f(\theta^i))$, where $J^{-1}(\theta^i)$ is the inverse of the Jacobian matrix. For non-square Jacobian matrices, use the pseudoinverse instead.
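This iteration can be sketched for a hypothetical planar two-link arm; the unit link lengths, target point, and initial guess are illustrative assumptions, and `np.linalg.pinv` supplies the pseudoinverse for the general (possibly non-square) case.

```python
import numpy as np

def fk(theta, l1=1.0, l2=1.0):
    """Forward kinematics of a planar 2-link arm (hypothetical link lengths)."""
    t1, t2 = theta
    return np.array([l1 * np.cos(t1) + l2 * np.cos(t1 + t2),
                     l1 * np.sin(t1) + l2 * np.sin(t1 + t2)])

def jacobian(theta, l1=1.0, l2=1.0):
    """Jacobian of fk with respect to the joint angles."""
    t1, t2 = theta
    return np.array([[-l1*np.sin(t1) - l2*np.sin(t1+t2), -l2*np.sin(t1+t2)],
                     [ l1*np.cos(t1) + l2*np.cos(t1+t2),  l2*np.cos(t1+t2)]])

def ik_newton(x_d, theta0, tol=1e-10, max_iter=100):
    """theta^{i+1} = theta^i + J^+(theta^i) (x_d - f(theta^i))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        err = x_d - fk(theta)
        if np.linalg.norm(err) < tol:  # reached the target in task space
            break
        theta = theta + np.linalg.pinv(jacobian(theta)) @ err
    return theta

x_d = np.array([1.2, 0.8])                      # reachable target point
theta_sol = ik_newton(x_d, theta0=[0.3, 0.5])   # joint angles that reach it
```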

In optimization, Newton's method is applied to the derivative $f'$ of a twice-differentiable function $f$ to find the roots of the derivative (solutions to $f'(x) = 0$), also known as the stationary points of $f$. These solutions may be minima, maxima, or saddle points.

In each iteration, $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$, where $\nabla f(x)$ is the gradient and $\nabla^{2} f(x) = H_f(x)$ is the Hessian matrix.

Often Newton's method is modified to include a small step size $0 < \gamma \leq 1$ instead of $\gamma = 1$: $x_{k+1} = x_{k} - \gamma [f''(x_{k})]^{-1} f'(x_{k})$. For step sizes other than 1, the method is often referred to as the relaxed or damped Newton's method.
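A minimal sketch of the damped update, assuming a hypothetical quartic objective $f(x) = x^4 - 3x^3 + 2$ whose minimum is at $x = 9/4$; the damping factor $\gamma = 0.8$ is an arbitrary illustrative choice.

```python
def damped_newton(fprime, fsecond, x0, gamma=0.8, tol=1e-10, max_iter=200):
    """Relaxed Newton: x_{k+1} = x_k - gamma * f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        g = fprime(x)
        if abs(g) < tol:  # stationary point reached
            break
        x = x - gamma * g / fsecond(x)
    return x

# f(x) = x^4 - 3x^3 + 2, so f'(x) = 4x^3 - 9x^2 and f''(x) = 12x^2 - 18x.
x_min = damped_newton(lambda x: 4 * x**3 - 9 * x**2,
                      lambda x: 12 * x**2 - 18 * x,
                      x0=3.0)
```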

Starting with an initial guess $\boldsymbol{\beta}^{(0)}$ for the minimum, the method proceeds by the iterations

$\boldsymbol{\beta}^{(s+1)} = \boldsymbol{\beta}^{(s)} - \left(\mathbf{J_{r}}^{\mathsf{T}} \mathbf{J_{r}}\right)^{-1} \mathbf{J_{r}}^{\mathsf{T}} \mathbf{r}\left(\boldsymbol{\beta}^{(s)}\right),$

where $\mathbf{J_{r}}$ is the Jacobian of the residual vector $\mathbf{r}$ with respect to $\boldsymbol{\beta}$.

Note that $\left(\mathbf {J_{r}} ^{\mathsf {T}}\mathbf {J_{r}} \right)^{-1}\mathbf {J_{r}} ^{\mathsf {T}}$ is the left pseudoinverse of $\mathbf {J_{r}}$, and the assumption $m \ge n$ ($m$ residuals, $n$ parameters) in the algorithm statement is necessary, as otherwise the matrix $\mathbf {J_{r}} ^{\mathsf {T}}\mathbf {J_{r}}$ is not invertible and the normal equations cannot be solved uniquely.

With the Gauss–Newton method, the sum of squares of the residuals $S$ may not decrease at every iteration. However, since $\Delta$ is a descent direction, unless $S(\boldsymbol{\beta}^{k})$ is a stationary point, it holds that $S(\boldsymbol{\beta}^{k} + \alpha \Delta) < S(\boldsymbol{\beta}^{k})$ for all sufficiently small $\alpha > 0$. Thus, if divergence occurs, one solution is to employ a fraction $\alpha$ of the increment vector $\Delta$ in the updating formula: ${\boldsymbol{\beta}}^{k+1} = {\boldsymbol{\beta}}^{k} + \alpha \Delta$. In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function $S$. An optimal value for $\alpha$ can be found by using a line search algorithm.
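A minimal Gauss–Newton sketch, fitting a hypothetical exponential model $y = \beta_0 e^{\beta_1 x}$ to noise-free synthetic data; the model, data, and starting point are illustrative assumptions, and the linearized system $\mathbf{J_r} \Delta = -\mathbf{r}$ is solved by least squares rather than by forming the normal equations explicitly.

```python
import numpy as np

def gauss_newton(residual, jac, beta0, tol=1e-12, max_iter=50):
    """Gauss-Newton: beta <- beta + delta, with J_r delta = -r in least squares."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)
        J = jac(beta)
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)  # Gauss-Newton step
        beta = beta + delta
        if np.linalg.norm(delta) < tol:                 # step is negligible
            break
    return beta

# Noise-free data generated from the hypothetical model y = 2 * exp(0.5 * x).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * x)
residual = lambda b: b[0] * np.exp(b[1] * x) - y               # r(beta)
jac = lambda b: np.column_stack([np.exp(b[1] * x),             # dr/db0
                                 b[0] * x * np.exp(b[1] * x)]) # dr/db1
beta_hat = gauss_newton(residual, jac, beta0=[1.5, 0.3])
```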

In cases where the direction of the shift vector is such that the optimal fraction $\alpha$ is close to zero, an alternative method for handling divergence is the use of the Levenberg–Marquardt algorithm, a trust region method. The normal equations are modified in such a way that the increment vector is rotated towards the direction of steepest descent:

$\left(\mathbf{J}^{\mathsf{T}} \mathbf{J} + \lambda D\right) \Delta = -\mathbf{J}^{\mathsf{T}} \mathbf{r},$

where $D$ is a positive diagonal matrix and $\lambda$ is a parameter that controls the trust-region size. Geometrically, this adds a paraboloid centered at $\Delta x = 0$ to the quadratic form, resulting in a smaller step. Note that when $D$ is the identity matrix and $\lambda \to +\infty$, then $\lambda \Delta \to -\mathbf {J} ^{\mathsf {T}}\mathbf {r}$; therefore the direction of $\Delta$ approaches the direction of the negative gradient $-\mathbf {J} ^{\mathsf {T}}\mathbf {r}$.

The main difference between the GN and LM methods is the choice of the matrix $H$: one uses $H=\mathbf{J}^{\mathsf{T}} \mathbf{J}$ and the other $H=\mathbf{J}^{\mathsf{T}} \mathbf{J} +\lambda D$.

When the approximation to the problem is poor, so that the matrix $\mathbf{J}^{\mathsf{T}} \mathbf{J}$ may be singular or ill-conditioned, we use the Levenberg–Marquardt method.
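A minimal Levenberg–Marquardt sketch with $D = I$, fitting a hypothetical exponential model $y = \beta_0 e^{\beta_1 x}$ to noise-free synthetic data; the model, the starting point, and the simple halve-on-accept / double-on-reject schedule for $\lambda$ are illustrative assumptions, not the only (or best) damping strategy.

```python
import numpy as np

def levenberg_marquardt(residual, jac, beta0, lam=1e-2, tol=1e-10, max_iter=200):
    """LM step: solve (J^T J + lam * I) delta = -J^T r, adapting lam."""
    beta = np.asarray(beta0, dtype=float)
    cost = 0.5 * np.sum(residual(beta)**2)
    for _ in range(max_iter):
        J = jac(beta)
        g = J.T @ residual(beta)                 # gradient of the cost
        if np.linalg.norm(g) < tol:              # stationary point reached
            break
        delta = np.linalg.solve(J.T @ J + lam * np.eye(len(beta)), -g)
        new_cost = 0.5 * np.sum(residual(beta + delta)**2)
        if new_cost < cost:   # accept: behave more like Gauss-Newton
            beta, cost, lam = beta + delta, new_cost, lam * 0.5
        else:                 # reject: behave more like gradient descent
            lam *= 2.0
    return beta

# Noise-free data from the hypothetical model y = 2 * exp(0.5 * x).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * x)
residual = lambda b: b[0] * np.exp(b[1] * x) - y
jac = lambda b: np.column_stack([np.exp(b[1] * x), b[0] * x * np.exp(b[1] * x)])
beta_hat = levenberg_marquardt(residual, jac, beta0=[1.0, 0.0])
```

Rejected steps only increase $\lambda$, so the accepted iterates decrease the cost monotonically, which is exactly the safeguard the damping is meant to provide.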