The Gauss–Newton algorithm is used to solve non-linear least squares problems. It is a modification of Newton's method for finding a minimum of a function.
Unlike Newton's method, the Gauss–Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required.
The Levenberg–Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent.
Conclusions:
When the function can be well approximated by a second-order expansion, we use the Gauss-Newton method.
Consider the unconstrained minimization problem. The general iteration formula in Descent Methods is $x_{k+1} = x_k + t_k \Delta x_k$, where $t_k$ is the step size and $\Delta x_k$ is the step direction at iteration $k$. The outline of a general Descent Method is as follows.
given a starting point x.
repeat
1. Determine a descent direction Δx.
2. Line search. Choose a step size t>0.
3. Update. x:=x+tΔx.
until stopping criterion is satisfied.
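As a minimal sketch of this outline in code (the quadratic objective, the negative-gradient direction, and the backtracking line search below are assumptions chosen for illustration, not part of the notes):

```python
import numpy as np

def descent_method(f, grad, x0, tol=1e-8, max_iter=1000):
    """Generic descent method: negative-gradient direction,
    step size chosen by backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # stopping criterion
            break
        dx = -g                                  # 1. descent direction
        t = 1.0                                  # 2. line search (backtracking)
        while t > 1e-12 and f(x + t * dx) > f(x) + 0.5 * t * g @ dx:
            t *= 0.5
        x = x + t * dx                           # 3. update
    return x

# Example usage on an assumed quadratic objective:
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
print(descent_method(f, grad, [3.0, -1.0]))
```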
Descent Direction: $\Delta x_k$
Step Size: $t_k$
To choose a Descent Direction, a necessary condition is $\nabla f(x)^T \Delta x < 0$.
Steepest Descent Method: $\Delta x = -P^{-1} \nabla f(x)$, where $P$ is a symmetric positive definite matrix.
Gradient Method: $\Delta x = -\nabla f(x)$
is a special case of the steepest descent method when $P$ is the identity matrix.
Newton Method: $\Delta x = -\nabla^2 f(x)^{-1} \nabla f(x)$
is a special case of the steepest descent method when $P = \nabla^2 f(x)$, the Hessian.
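The sketch below computes both directions for an assumed quadratic objective and checks the descent condition $\nabla f(x)^T \Delta x < 0$; the matrix $A$ and vector $b$ are illustrative assumptions:

```python
import numpy as np

# Assumed toy objective: f(x) = 0.5 * x^T A x - b^T x
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])      # symmetric positive definite
b = np.array([1.0, -2.0])

def grad(x):
    return A @ x - b            # gradient of f

def hess(x):
    return A                    # Hessian of f (constant for a quadratic)

x = np.array([0.0, 0.0])
dx_gradient = -grad(x)                           # gradient method: P = I
dx_newton = -np.linalg.solve(hess(x), grad(x))   # Newton method: P = Hessian

print("gradient direction:", dx_gradient)
print("Newton direction:  ", dx_newton)
# Both satisfy the descent condition grad(x)^T dx < 0:
print(grad(x) @ dx_gradient, grad(x) @ dx_newton)
```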
In numerical analysis, Newton's method is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. In each step, the update equation is $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$.
Limitations: It may not work if there are points of inflection, local maxima or minima around the initial guess $x_0$ or the root. For example, suppose you need to find the root of $27x^3 - 3x + 1 = 0$, which is near $x = 0$. Then Newton's method will take you back and forth, or farther away from the root.
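A minimal sketch of the update rule (the tolerance and iteration cap are assumed); running it on the cubic from the example with $x_0 = 0$ lets one inspect the iterates directly:

```python
def newton_root(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton's method for f(x) = 0: x_{n+1} = x_n - f(x_n)/f'(x_n).
    Returns all iterates so the behaviour can be inspected."""
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        d = fprime(x)
        if d == 0:                    # derivative vanishes: method breaks down
            break
        x_new = x - f(x) / d
        xs.append(x_new)
        if abs(x_new - x) < tol:
            break
    return xs

# The cubic from the example above, started at x0 = 0:
f = lambda x: 27 * x**3 - 3 * x + 1
fp = lambda x: 81 * x**2 - 3
print(newton_root(f, fp, 0.0)[:8])
```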
Find $\theta_d$ such that $x_d - f(\theta_d) = 0$, where $x_d$ is the desired point in task space that we would like to reach, and $f(\theta)$ is the forward kinematics model of the manipulator.
In each iteration, $\theta_{i+1} = \theta_i + J^{-1}(\theta_i)\,(x_d - f(\theta_i))$, where $J^{-1}(\theta_i)$ is the inverse of the Jacobian matrix. For non-square Jacobian matrices, use the pseudoinverse instead.
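A minimal sketch of this iteration with the pseudoinverse (the two-link planar arm used here as the forward kinematics $f$ and Jacobian $J$ is an assumed example, not from the notes):

```python
import numpy as np

def ik_newton(f, jacobian, x_d, theta0, tol=1e-8, max_iter=100):
    """Newton-Raphson IK: theta <- theta + pinv(J(theta)) @ (x_d - f(theta))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        error = x_d - f(theta)
        if np.linalg.norm(error) < tol:
            break
        theta = theta + np.linalg.pinv(jacobian(theta)) @ error
    return theta

# Assumed example: planar 2-link arm with unit link lengths.
def fk(theta):
    t1, t2 = theta
    return np.array([np.cos(t1) + np.cos(t1 + t2),
                     np.sin(t1) + np.sin(t1 + t2)])

def jac(theta):
    t1, t2 = theta
    return np.array([[-np.sin(t1) - np.sin(t1 + t2), -np.sin(t1 + t2)],
                     [ np.cos(t1) + np.cos(t1 + t2),  np.cos(t1 + t2)]])

print(ik_newton(fk, jac, x_d=np.array([1.0, 1.0]), theta0=np.array([0.1, 1.2])))
```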
In optimization, Newton's method is applied to the derivative $f'$ of a twice-differentiable function $f$ to find the roots of the derivative (solutions to $f'(x) = 0$), also known as the stationary points of $f$. These solutions may be minima, maxima, or saddle points.
In the multivariate case, the update is $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$, where $\nabla f(x)$ is the gradient and $\nabla^2 f(x) = H_f(x)$ is the Hessian matrix.
Often Newton's method is modified to include a small step size $0 < \gamma \le 1$ instead of $\gamma = 1$: $x_{k+1} = x_k - \gamma\,[f''(x_k)]^{-1} f'(x_k)$. For step sizes other than 1, the method is often referred to as the relaxed or damped Newton's method.
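A minimal sketch of the damped Newton iteration in its multivariate form (the objective, the fixed $\gamma = 0.5$, and the stopping rule are assumptions for illustration):

```python
import numpy as np

def damped_newton(grad, hess, x0, gamma=0.5, tol=1e-10, max_iter=100):
    """Damped Newton's method for minimization:
    x_{k+1} = x_k - gamma * H(x_k)^{-1} grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - gamma * np.linalg.solve(hess(x), g)
    return x

# Assumed example: minimize f(x, y) = (x - 1)^2 + 5 * (y + 2)^2
grad = lambda x: np.array([2 * (x[0] - 1), 10 * (x[1] + 2)])
hess = lambda x: np.diag([2.0, 10.0])
print(damped_newton(grad, hess, [0.0, 0.0]))   # approaches (1, -2)
```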
Starting with an initial guess $\beta^{(0)}$ for the minimum, the method proceeds by the iterations
$\beta^{(k+1)} = \beta^{(k)} - (J_r^T J_r)^{-1} J_r^T\, r(\beta^{(k)})$
Note that $(J_r^T J_r)^{-1} J_r^T$ is the left pseudoinverse of $J_r$, and the assumption $m \ge n$ in the algorithm statement is necessary, as otherwise the matrix $J_r^T J_r$ is not invertible and the normal equations cannot be solved uniquely.
With the Gauss–Newton method, the sum of squares of the residuals $S$ may not decrease at every iteration. However, since $\Delta$ is a descent direction, unless $S(\beta^{(k)})$ is a stationary point, it holds that $S(\beta^{(k)} + \alpha\Delta) < S(\beta^{(k)})$ for all sufficiently small $\alpha > 0$. Thus, if divergence occurs, one solution is to employ a fraction $\alpha$ of the increment vector $\Delta$ in the updating formula: $\beta^{(k+1)} = \beta^{(k)} + \alpha\Delta$. In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function $S$. An optimal value for $\alpha$ can be found by using a line search algorithm.
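A minimal Gauss–Newton sketch that includes the fractional step $\alpha$ described above (the exponential residual model, the synthetic data points, and the simple halving rule for $\alpha$ are assumptions for illustration):

```python
import numpy as np

def gauss_newton(residual, jacobian, beta0, tol=1e-10, max_iter=100):
    """Gauss-Newton iteration with a fractional step:
    solve (J^T J) delta = -J^T r, then halve alpha until S decreases."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)
        J = jacobian(beta)
        delta = np.linalg.solve(J.T @ J, -J.T @ r)   # Gauss-Newton increment
        if np.linalg.norm(delta) < tol:
            break
        S = r @ r                                    # current sum of squares
        alpha = 1.0
        while alpha > 1e-8 and np.sum(residual(beta + alpha * delta)**2) > S:
            alpha *= 0.5                             # step too long, shrink it
        beta = beta + alpha * delta
    return beta

# Assumed example: fit y = b0 * exp(b1 * t) to a few synthetic points.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 2.7, 3.6, 4.9])
residual = lambda b: b[0] * np.exp(b[1] * t) - y
jacobian = lambda b: np.column_stack([np.exp(b[1] * t),
                                      b[0] * t * np.exp(b[1] * t)])
print(gauss_newton(residual, jacobian, [1.0, 0.1]))
```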
In cases where the direction of the shift vector is such that the optimal fraction $\alpha$ is close to zero, an alternative method for handling divergence is the use of the Levenberg–Marquardt algorithm, a trust region method. The normal equations are modified in such a way that the increment vector is rotated towards the direction of steepest descent,
$(J^T J + \lambda D)\,\Delta = -J^T r$
where $D$ is a positive diagonal matrix and $\lambda$ is a parameter that controls the trust-region size. Geometrically, this adds a paraboloid centered at $\Delta = 0$ to the quadratic form, resulting in a smaller step. Note that when $D$ is the identity matrix and $\lambda \to +\infty$, then $\lambda\Delta \to -J^T r$; therefore the direction of $\Delta$ approaches the direction of the negative gradient $-J^T r$.
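A minimal Levenberg–Marquardt sketch built on these modified normal equations (choosing $D = I$ and the simple doubling/halving rule for $\lambda$ are assumptions; practical implementations differ in how they adapt $\lambda$):

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, beta0, lam=1e-3, tol=1e-10, max_iter=100):
    """Levenberg-Marquardt: solve (J^T J + lam * I) delta = -J^T r,
    growing lam when a step fails and shrinking it when a step succeeds."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)
        J = jacobian(beta)
        A = J.T @ J
        g = J.T @ r
        delta = np.linalg.solve(A + lam * np.eye(A.shape[0]), -g)
        if np.linalg.norm(delta) < tol:
            break
        if np.sum(residual(beta + delta)**2) < np.sum(r**2):
            beta = beta + delta       # step accepted: trust the model more
            lam *= 0.5
        else:
            lam *= 2.0                # step rejected: move towards gradient descent
    return beta

# Reusing the exponential-fit example from the Gauss-Newton sketch:
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 2.7, 3.6, 4.9])
residual = lambda b: b[0] * np.exp(b[1] * t) - y
jacobian = lambda b: np.column_stack([np.exp(b[1] * t),
                                      b[0] * t * np.exp(b[1] * t)])
print(levenberg_marquardt(residual, jacobian, [1.0, 0.1]))
```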
The main difference between the GN and LM methods is the selection of the matrix $H$: one uses $H = J^T J$ and the other $H = J^T J + \lambda D$.
When the approximation to the problem is not good, or when the matrix $J^T J$ may not be invertible or is ill-conditioned, we use the Levenberg–Marquardt method.