Descent Methods
Consider the unconstrained minimization problem $\min_x f(x)$. The general iteration formula in Descent Methods is $x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}$, where $t^{(k)} > 0$ is the step size and $\Delta x^{(k)}$ is the step direction at iteration $k$. The outline of a general Descent Method is as follows.
given a starting point $x \in \operatorname{dom} f$.
repeat
1. Determine a descent direction $\Delta x$.
2. Line search. Choose a step size $t > 0$.
3. Update. $x := x + t\,\Delta x$.
until a stopping criterion is satisfied.
Different algorithms differ with respect to the choice of
Descent Direction:
Gradient Method
Steepest Descent Method
Newton Method
Step Size:
Fixed Step Size
Exact Line Search
Backtracking Line Search
To choose a Descent Direction $\Delta x$, a necessary condition is $\nabla f(x)^T \Delta x < 0$; that is, the direction must make an acute angle with the negative gradient.
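As a concrete illustration of the general loop above, here is a minimal Python sketch that picks the negative gradient as the descent direction and a backtracking line search for the step size; the test function, starting point, and backtracking parameters are arbitrary choices for the example.

```python
import numpy as np

def backtracking_descent(f, grad, x0, alpha=0.3, beta=0.8, tol=1e-8, max_iter=1000):
    """General descent loop: direction = -gradient, step size by backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # stopping criterion
            break
        dx = -g                                  # descent direction (gradient method)
        t = 1.0                                  # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t * dx                           # update
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 * y^2
f = lambda v: (v[0] - 1.0) ** 2 + 10.0 * v[1] ** 2
grad = lambda v: np.array([2.0 * (v[0] - 1.0), 20.0 * v[1]])
print(backtracking_descent(f, grad, [5.0, 3.0]))   # -> approximately [1, 0]
```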
Steepest Descent Method
The normalized steepest descent direction is $\Delta x_{\mathrm{nsd}} = \operatorname*{argmin}_v \{\nabla f(x)^T v : \|v\| \le 1\}$; by choosing a different norm (for example, the quadratic norm $\|v\|_P = (v^T P v)^{1/2}$ defined by a positive definite matrix $P$), we obtain different descent directions.
Gradient Method
The gradient direction $\Delta x = -\nabla f(x)$ is a special case of the steepest descent method when $P$ is the identity matrix.
The negative gradient is the steepest direction in the first-order approximation of $f$.
Newton Method
The Newton step $\Delta x_{\mathrm{nt}} = -\nabla^2 f(x)^{-1} \nabla f(x)$ is a special case of the steepest descent method when $P = \nabla^2 f(x)$ (the Hessian).
The Newton direction is the steepest in the second-order approximation of $f$.
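To make the two directions concrete, the following small sketch (the quadratic test function is an arbitrary choice) computes the gradient direction and the Newton step at the same point; for a quadratic, the Newton step lands exactly on the minimizer.

```python
import numpy as np

# Quadratic test function f(x) = 0.5 * x^T A x - b^T x (an arbitrary example)
A = np.array([[10.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b          # gradient of the quadratic

def hess(x):
    return A                  # Hessian is constant for a quadratic

x = np.array([2.0, 2.0])
dx_gradient = -grad(x)                              # gradient method direction
dx_newton = -np.linalg.solve(hess(x), grad(x))      # Newton step: -H^{-1} grad

print("gradient direction:", dx_gradient)
print("Newton step:       ", dx_newton)
# For a quadratic, x + dx_newton equals the exact minimizer A^{-1} b.
```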
References: EE230 Convex Optimization Lecture Slide Topic 6: Unconstrained Optimization; Convex Optimization notes provided by Dr. S. Boyd.
Application in Inverse Kinematics:
References: https://en.wikipedia.org/wiki/Newton%27s_method; https://brilliant.org/wiki/newton-raphson-method/; Chapter 6 in Modern Robotics textbook
Putting everything together, Newton's method performs the iteration $x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$.
Reference: https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization
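A minimal one-dimensional sketch of this iteration, assuming a hand-coded example function (the quartic below is an arbitrary choice):

```python
def newton_1d_minimize(df, d2f, x0, tol=1e-10, max_iter=100):
    """Find a stationary point of f by Newton's method: x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x^4 - 3x^2 + 2, so f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6
df = lambda x: 4 * x**3 - 6 * x
d2f = lambda x: 12 * x**2 - 6
print(newton_1d_minimize(df, d2f, x0=2.0))   # converges to the stationary point sqrt(1.5)
```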
The Gauss–Newton algorithm is used to solve non-linear least squares problems. It is a modification of Newton's method for finding a minimum of a function.
Unlike Newton's method, the Gauss–Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required.
Reference: https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
The Levenberg–Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent.
Conclusions:
When the function can be well approximated by a second-order expansion, we use the Gauss–Newton method.
References: https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm; https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
In numerical analysis, Newton's method is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. In each step, the update equation is $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$.
Limitations: It may not work if there are points of inflection, local maxima, or minima around the initial guess or the root. For example, if the function has an inflection point or a local extremum near the initial guess, the iterates can bounce back and forth or move farther away from the root.
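A minimal sketch of the root-finding iteration in Python (the example function $x^2 - 2$ is an arbitrary choice):

```python
def newton_root(f, df, x0, tol=1e-10, max_iter=100):
    """Newton's method for f(x) = 0: x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / df(x)
    return x

# Example: root of f(x) = x^2 - 2, i.e. sqrt(2)
print(newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))  # ~1.41421356
```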
Find $\theta_d$ such that $x_d - f(\theta_d) = 0$, where $x_d$ is the desired point in task space that we would like to reach, and $f(\theta)$ is the forward kinematics model of the manipulator.
In each iteration, $\theta_{k+1} = \theta_k + J^{-1}(\theta_k)\,\bigl(x_d - f(\theta_k)\bigr)$, where $J^{-1}(\theta_k)$ is the inverse of the Jacobian matrix $J(\theta_k) = \partial f / \partial \theta$. For non-square Jacobian matrices, use the pseudoinverse instead.
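As an illustrative sketch only, the following applies this iteration to a hypothetical planar 2-link arm with unit link lengths; the fk and jacobian functions are written for this toy model, not taken from the Modern Robotics library.

```python
import numpy as np

def fk(theta):
    """Forward kinematics of a planar 2R arm with unit links: angles -> (x, y)."""
    t1, t2 = theta
    return np.array([np.cos(t1) + np.cos(t1 + t2),
                     np.sin(t1) + np.sin(t1 + t2)])

def jacobian(theta):
    t1, t2 = theta
    return np.array([[-np.sin(t1) - np.sin(t1 + t2), -np.sin(t1 + t2)],
                     [ np.cos(t1) + np.cos(t1 + t2),  np.cos(t1 + t2)]])

def ik_newton(x_d, theta0, tol=1e-8, max_iter=100):
    """Newton-Raphson IK: theta <- theta + J^+(theta) (x_d - f(theta))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        err = x_d - fk(theta)
        if np.linalg.norm(err) < tol:
            break
        theta = theta + np.linalg.pinv(jacobian(theta)) @ err  # pseudoinverse handles non-square J
    return theta

theta_sol = ik_newton(x_d=np.array([1.0, 1.0]), theta0=np.array([0.1, 0.5]))
print(theta_sol, fk(theta_sol))   # fk(theta_sol) should be close to (1, 1)
```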
In optimization, Newton's method is applied to the derivative $f'$ of a twice-differentiable function $f$ to find the roots of the derivative (solutions to $f'(x) = 0$), also known as the stationary points of $f$. These solutions may be minima, maxima, or saddle points.
In higher dimensions the update is $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$, where $\nabla f(x_k)$ is the gradient and $\nabla^2 f(x_k)$ is the Hessian matrix.
Often Newton's method is modified to include a small step size $0 < \gamma \le 1$ instead of $\gamma = 1$: $x_{k+1} = x_k - \gamma\,[\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$. For step sizes other than 1, the method is often referred to as the relaxed or damped Newton's method.
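A minimal sketch of the damped Newton iteration; the step size $\gamma = 0.5$, the smooth convex test function, and the tolerances are arbitrary choices.

```python
import numpy as np

def damped_newton(grad, hess, x0, gamma=0.5, tol=1e-10, max_iter=100):
    """Damped Newton: x <- x - gamma * [hess(x)]^{-1} grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - gamma * np.linalg.solve(hess(x), g)
    return x

# Arbitrary smooth convex test function:
# f(x, y) = exp(x + y - 1) + exp(x - y - 1) + exp(-x - 1)
def grad(v):
    x, y = v
    a, b, c = np.exp(x + y - 1), np.exp(x - y - 1), np.exp(-x - 1)
    return np.array([a + b - c, a - b])

def hess(v):
    x, y = v
    a, b, c = np.exp(x + y - 1), np.exp(x - y - 1), np.exp(-x - 1)
    return np.array([[a + b + c, a - b],
                     [a - b,     a + b]])

print(damped_newton(grad, hess, x0=[-1.0, 1.0]))
```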
Starting with an initial guess $\beta^{(0)}$ for the minimum, the method proceeds by the iterations $\beta^{(s+1)} = \beta^{(s)} - \left(J_r^T J_r\right)^{-1} J_r^T\, r\bigl(\beta^{(s)}\bigr)$, where $r$ is the vector of residual functions and $J_r$ is its Jacobian with respect to $\beta$.
Note that $\left(J_r^T J_r\right)^{-1} J_r^T$ is the left pseudoinverse of $J_r$, and the assumption $m \ge n$ (at least as many residuals as parameters) in the algorithm statement is necessary, as otherwise the matrix $J_r^T J_r$ is not invertible and the normal equations cannot be solved uniquely.
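A minimal sketch of the plain Gauss–Newton iteration, applied to a hypothetical exponential-curve fit $y = a e^{bt}$; the data and starting guess are made up for the example (the guess is chosen close to the solution, in line with the divergence discussion below).

```python
import numpy as np

def gauss_newton(residual, jac, beta0, tol=1e-10, max_iter=100):
    """Gauss-Newton: beta <- beta - (J^T J)^{-1} J^T r(beta)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)
        J = jac(beta)
        step = np.linalg.solve(J.T @ J, J.T @ r)
        beta = beta - step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Hypothetical data generated from y = 2 * exp(0.5 * t)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * t)

residual = lambda beta: beta[0] * np.exp(beta[1] * t) - y
jac = lambda beta: np.column_stack([np.exp(beta[1] * t),
                                    beta[0] * t * np.exp(beta[1] * t)])
# Starting guess chosen near the solution to avoid the divergence discussed below
print(gauss_newton(residual, jac, beta0=[1.5, 0.4]))   # -> approximately [2, 0.5]
```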
With the Gauss–Newton method, the sum of squares of the residuals $S$ may not decrease at every iteration. However, since $\Delta$ is a descent direction, unless $S(\beta^{(s)})$ is a stationary point, it holds that $S(\beta^{(s)} + \alpha\Delta) < S(\beta^{(s)})$ for all sufficiently small $\alpha > 0$. Thus, if divergence occurs, one solution is to employ a fraction $\alpha$ of the increment vector $\Delta$ in the updating formula: $\beta^{(s+1)} = \beta^{(s)} + \alpha\Delta$. In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function. An optimal value for $\alpha$ can be found by using a line search algorithm.
In cases where the direction of the shift vector is such that the optimal fraction $\alpha$ is close to zero, an alternative method for handling divergence is the use of the Levenberg–Marquardt algorithm, a trust region method. The normal equations are modified in such a way that the increment vector is rotated towards the direction of steepest descent, $\left(J_r^T J_r + \lambda D\right)\Delta = -J_r^T\, r,$
where $D$ is a positive diagonal matrix and $\lambda$ is a parameter that controls the trust-region size. Geometrically, this adds a paraboloid centered at $\Delta = 0$ to the quadratic form, resulting in a smaller step. Note that when $D$ is the identity matrix $I$ and $\lambda \to +\infty$, then $\lambda\Delta = \lambda\left(J_r^T J_r + \lambda I\right)^{-1}\left(-J_r^T r\right) \to -J_r^T r$; therefore the direction of $\Delta$ approaches the direction of the negative gradient $-J_r^T r$.
The main difference between the GN and LM methods is the matrix used in the normal equations: GN uses $J_r^T J_r$, while LM uses $J_r^T J_r + \lambda D$.
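A minimal Levenberg–Marquardt sketch, reusing the hypothetical exponential-fit example; the simple rule of halving $\lambda$ after an accepted step and doubling it after a rejected one is just one common damping heuristic, not the full Marquardt update.

```python
import numpy as np

def levenberg_marquardt(residual, jac, beta0, lam=1e-2, tol=1e-10, max_iter=200):
    """LM sketch: solve (J^T J + lam * I) step = -J^T r, adapting lam heuristically."""
    beta = np.asarray(beta0, dtype=float)
    cost = lambda b: 0.5 * np.sum(residual(b) ** 2)
    for _ in range(max_iter):
        r, J = residual(beta), jac(beta)
        g = J.T @ r
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(J.T @ J + lam * np.eye(len(beta)), -g)
        if cost(beta + step) < cost(beta):
            beta, lam = beta + step, lam * 0.5     # good step: accept, trust GN more
        else:
            lam *= 2.0                             # bad step: increase damping (toward gradient descent)
    return beta

# Hypothetical data generated from y = 2 * exp(0.5 * t)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * t)
residual = lambda beta: beta[0] * np.exp(beta[1] * t) - y
jac = lambda beta: np.column_stack([np.exp(beta[1] * t),
                                    beta[0] * t * np.exp(beta[1] * t)])
# Unlike plain Gauss-Newton, a far-off starting guess is handled by the damping
print(levenberg_marquardt(residual, jac, beta0=[1.0, 0.1]))   # -> approximately [2, 0.5]
```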
When the second-order approximation of the problem is not good, so that the matrix $J_r^T J_r$ may not be invertible or is ill-conditioned, we use the Levenberg–Marquardt method.