Descent Methods
Consider the unconstrained minimization problem $\min_x f(x)$. The general iteration formula in Descent Methods is

$$x^{(k+1)} = x^{(k)} + t^{(k)}\,\Delta x^{(k)},$$

where $t^{(k)}$ is the step size and $\Delta x^{(k)}$ is the step direction at iteration $k$. The outline of a general Descent Method is as follows.
given a starting point $x \in \operatorname{dom} f$.
repeat
1. Determine a descent direction $\Delta x$.
2. Line search. Choose a step size $t > 0$.
3. Update. $x := x + t\,\Delta x$.
until stopping criterion is satisfied.
Different algorithms differ with respect to the choice of descent direction and step size (a sketch combining common choices appears after this list):
- Descent Direction:
  - Gradient Method
  - Steepest Descent Method
  - Newton Method
- Step Size:
  - Fixed Step Size
  - Exact Line Search
  - Backtracking Line Search
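To make the outline concrete, here is a minimal sketch of one combination from the list above: the gradient method with backtracking line search. The test function, its gradient, and the parameters `alpha` and `beta` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def gradient_descent_backtracking(f, grad_f, x0, alpha=0.3, beta=0.8,
                                  tol=1e-6, max_iter=1000):
    """Gradient method with backtracking line search (sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:          # stopping criterion
            break
        dx = -g                              # descent direction (gradient method)
        t = 1.0
        # Backtracking: shrink t until the sufficient-decrease condition holds
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx                       # update
    return x

# Assumed example: minimize the quadratic f(x) = 0.5 * x^T A x - b^T x
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b
print(gradient_descent_backtracking(f, grad_f, np.zeros(2)))
```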
To choose a descent direction, a necessary condition is $\nabla f(x)^{T} \Delta x < 0$.
- Steepest Descent Method
  - $\Delta x_{\text{sd}} = \operatorname*{argmin}\{\nabla f(x)^{T} v \mid \|v\|_{P} = 1\}$, where $\|v\|_{P} = (v^{T} P v)^{1/2}$; by choosing a different norm matrix $P \succ 0$ we obtain different descent directions.
- Gradient Method
  - $\Delta x = -\nabla f(x)$ is a special case of the steepest descent method where $P$ is the identity matrix (the Euclidean norm).
  - It is the steepest direction in the first-order approximation of $f$.
- Newton Method
  - $\Delta x_{\text{nt}} = -\nabla^{2} f(x)^{-1} \nabla f(x)$ is a special case of the steepest descent method where $P = \nabla^{2} f(x)$.
  - It is the steepest direction in the second-order approximation of $f$ (see the sketch after this list).
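As a rough illustration of how the choice of norm changes the direction, here is a small sketch comparing the negative gradient, the steepest descent direction under a quadratic $P$-norm, and the Newton direction; the quadratic test function, the point $x$, and the matrix $P$ are all made-up illustrative values.

```python
import numpy as np

# Assumed quadratic test function f(x) = 0.5 * x^T Q x with an ill-conditioned Q
Q = np.array([[10.0, 0.0], [0.0, 1.0]])
grad = lambda x: Q @ x          # gradient of f
hess = lambda x: Q              # Hessian of f (constant for a quadratic)

x = np.array([1.0, 1.0])
g = grad(x)

d_gradient = -g                                   # gradient method: P = I
P = np.array([[5.0, 0.0], [0.0, 2.0]])            # some positive definite norm matrix
d_steepest = -np.linalg.solve(P, g)               # steepest descent in the P-norm
d_newton = -np.linalg.solve(hess(x), g)           # Newton direction: P = Hessian

for name, d in [("gradient", d_gradient),
                ("steepest (P-norm)", d_steepest),
                ("newton", d_newton)]:
    print(name, d / np.linalg.norm(d))            # compare normalized directions
```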
References: EE230 Convex Optimization Lecture Slide Topic 6: Unconstrained Optimization; Convex Optimization notes provided by Dr. S. Boyd.
In numerical analysis, Newton's method is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. In each step, the update equation is

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

Limitations: It may not work if there are points of inflection, local maxima, or minima around the initial guess or the root. For example, suppose you need to find the root of $27x^{3} - 3x + 1 = 0$ starting from an initial guess near $x = 0$. Then Newton's method will take you back and forth, or move farther away from the root.

[Figure: plot of the function $27x^{3} - 3x + 1$]
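Below is a minimal sketch of the scalar Newton iteration applied to this example, so the back-and-forth behaviour can be reproduced; the starting point $x_0 = 0$ and the iteration cap are illustrative choices.

```python
def newton_root(f, df, x0, tol=1e-10, max_iter=20):
    """Scalar Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for n in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / df(x)
        print(f"iter {n + 1}: x = {x:.6f}")
    return x

f = lambda x: 27 * x**3 - 3 * x + 1
df = lambda x: 81 * x**2 - 3

# Starting near the local extrema of f, the iterates jump back and forth
newton_root(f, df, x0=0.0)
```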
Application in Inverse Kinematics:
- Find $\theta_d$ such that $x_d = f(\theta_d)$, where $x_d$ is the desired point in task space that we would like to reach, and $f(\theta)$ is the forward kinematics model of the manipulator.
- In each iteration, $\theta_{k+1} = \theta_k + J^{-1}(\theta_k)\,\bigl(x_d - f(\theta_k)\bigr)$, where $J^{-1}$ is the inverse of the Jacobian matrix. For non-square Jacobian matrices, use the pseudoinverse $J^{\dagger}$ instead (see the sketch below).
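A minimal sketch of this Newton–Raphson IK update for a planar 2-link arm; the link lengths, forward kinematics, Jacobian, and target point below are illustrative assumptions, not part of the original notes.

```python
import numpy as np

L1, L2 = 1.0, 1.0   # assumed link lengths of a planar 2R arm

def fk(theta):
    """Forward kinematics: end-effector (x, y) of the 2R arm."""
    t1, t12 = theta[0], theta[0] + theta[1]
    return np.array([L1 * np.cos(t1) + L2 * np.cos(t12),
                     L1 * np.sin(t1) + L2 * np.sin(t12)])

def jacobian(theta):
    """Analytic Jacobian of fk with respect to the joint angles."""
    t1, t12 = theta[0], theta[0] + theta[1]
    return np.array([[-L1 * np.sin(t1) - L2 * np.sin(t12), -L2 * np.sin(t12)],
                     [ L1 * np.cos(t1) + L2 * np.cos(t12),  L2 * np.cos(t12)]])

def ik_newton(x_d, theta0, tol=1e-8, max_iter=50):
    """theta_{k+1} = theta_k + pinv(J(theta_k)) @ (x_d - fk(theta_k))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        err = x_d - fk(theta)
        if np.linalg.norm(err) < tol:
            break
        theta = theta + np.linalg.pinv(jacobian(theta)) @ err
    return theta

print(ik_newton(x_d=np.array([1.2, 0.8]), theta0=np.array([0.3, 0.3])))
```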
References: https://en.wikipedia.org/wiki/Newton%27s_method; https://brilliant.org/wiki/newton-raphson-method/; Chapter 6 in Modern Robotics textbook
In optimization, Newton's method is applied to the derivative $f'$ of a twice-differentiable function $f$ to find the roots of the derivative (solutions to $f'(x) = 0$), also known as the stationary points of $f$. These solutions may be minima, maxima, or saddle points.
Putting everything together, Newton's method performs the iteration

$$x_{k+1} = x_k - \left[\nabla^{2} f(x_k)\right]^{-1} \nabla f(x_k),$$

where $\nabla f(x_k)$ is the gradient and $\nabla^{2} f(x_k)$ is the Hessian matrix.
Often Newton's method is modified to include a small step size $0 < \gamma \le 1$ instead of $\gamma = 1$:

$$x_{k+1} = x_k - \gamma \left[\nabla^{2} f(x_k)\right]^{-1} \nabla f(x_k).$$

For step sizes other than 1, the method is often referred to as the relaxed or damped Newton's method.
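A minimal sketch of the damped Newton iteration; the smooth convex test function below (in the style of Boyd's examples) and the fixed step size `gamma` are illustrative assumptions.

```python
import numpy as np

def damped_newton(grad, hess, x0, gamma=0.5, tol=1e-8, max_iter=100):
    """x_{k+1} = x_k - gamma * [hess(x_k)]^{-1} grad(x_k)  (damped Newton sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:           # stop when the gradient is small
            break
        x = x - gamma * np.linalg.solve(hess(x), g)
    return x

# Assumed smooth convex test function:
# f(x) = exp(x1 + 3*x2 - 0.1) + exp(x1 - 3*x2 - 0.1) + exp(-x1 - 0.1)
def grad(x):
    a, b, c = np.exp(x[0] + 3*x[1] - 0.1), np.exp(x[0] - 3*x[1] - 0.1), np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

def hess(x):
    a, b, c = np.exp(x[0] + 3*x[1] - 0.1), np.exp(x[0] - 3*x[1] - 0.1), np.exp(-x[0] - 0.1)
    return np.array([[a + b + c, 3*a - 3*b], [3*a - 3*b, 9*a + 9*b]])

print(damped_newton(grad, hess, x0=np.zeros(2)))
```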
The Gauss–Newton algorithm is used to solve non-linear least squares problems. It is a modification of Newton's method for finding a minimum of a function.
Unlike Newton's method, the Gauss–Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required.
Starting with an initial guess $\beta^{(0)}$ for the minimum, the method proceeds by the iterations

$$\beta^{(s+1)} = \beta^{(s)} - \left(J_{r}^{T} J_{r}\right)^{-1} J_{r}^{T}\, r\!\left(\beta^{(s)}\right),$$

where $r$ is the vector of residual functions and $J_{r}$ is its Jacobian. Note that $\left(J_{r}^{T} J_{r}\right)^{-1} J_{r}^{T}$ is the left pseudoinverse of $J_{r}$, and the assumption $m \ge n$ (at least as many residuals as parameters) in the algorithm statement is necessary, as otherwise the matrix $J_{r}^{T} J_{r}$ is not invertible and the normal equations cannot be solved uniquely.
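A minimal sketch of the Gauss–Newton iteration for a nonlinear least squares fit; the exponential-saturation model, the synthetic data, and the analytic Jacobian are illustrative assumptions.

```python
import numpy as np

def gauss_newton(residual, jac, beta0, tol=1e-10, max_iter=50):
    """beta <- beta - (J^T J)^{-1} J^T r(beta)   (Gauss-Newton sketch)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)
        J = jac(beta)
        step = np.linalg.solve(J.T @ J, J.T @ r)   # solve the normal equations
        beta = beta - step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Assumed example: fit y = b1 * (1 - exp(-b2 * t)) to synthetic data
t = np.linspace(0.0, 4.0, 20)
beta_true = np.array([2.5, 1.3])
y = beta_true[0] * (1 - np.exp(-beta_true[1] * t))

residual = lambda b: y - b[0] * (1 - np.exp(-b[1] * t))
def jac(b):
    e = np.exp(-b[1] * t)
    return np.column_stack([-(1 - e), -b[0] * t * e])

print(gauss_newton(residual, jac, beta0=np.array([1.0, 1.0])))   # ~ [2.5, 1.3]
```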
Levenberg–Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent.
With the Gauss–Newton method, the sum of squares of the residuals $S$ may not decrease at every iteration. However, since $\Delta\beta$ is a descent direction, unless $S\!\left(\beta^{(s)}\right)$ is a stationary point, it holds that $S\!\left(\beta^{(s)} + \alpha\,\Delta\beta\right) < S\!\left(\beta^{(s)}\right)$ for all sufficiently small $\alpha > 0$. Thus, if divergence occurs, one solution is to employ a fraction $\alpha$ of the increment vector $\Delta\beta$ in the updating formula: $\beta^{(s+1)} = \beta^{(s)} + \alpha\,\Delta\beta$. In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function $S$. An optimal value for $\alpha$ can be found by using a line search algorithm.
In cases where the direction of the shift vector is such that the optimal fraction $\alpha$ is close to zero, an alternative method for handling divergence is the Levenberg–Marquardt algorithm, a trust-region method. The normal equations are modified in such a way that the increment vector is rotated towards the direction of steepest descent,

$$\left(J_{r}^{T} J_{r} + \lambda D\right) \Delta\beta = -J_{r}^{T} r,$$

where $D$ is a positive diagonal matrix and $\lambda$ is a parameter that controls the trust-region size. Geometrically, this adds a paraboloid centered at $\Delta\beta = 0$ to the quadratic form, resulting in a smaller step. Note that when $D$ is the identity matrix and $\lambda \to +\infty$, then $\lambda\,\Delta\beta \to -J_{r}^{T} r$; therefore, the direction of $\Delta\beta$ approaches the direction of the negative gradient $-J_{r}^{T} r$.
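A minimal sketch of Levenberg–Marquardt with the common "increase/decrease $\lambda$" schedule and $D = I$; the schedule constants and the reuse of the model from the Gauss–Newton sketch above are illustrative assumptions.

```python
import numpy as np

def levenberg_marquardt(residual, jac, beta0, lam=1e-3, tol=1e-10, max_iter=100):
    """(J^T J + lam * I) delta = -J^T r, with lam adapted each iteration (sketch)."""
    beta = np.asarray(beta0, dtype=float)
    r = residual(beta)
    cost = r @ r
    for _ in range(max_iter):
        J = jac(beta)
        g = J.T @ r
        if np.linalg.norm(g) < tol:
            break
        # Damped normal equations; D is taken as the identity here
        delta = np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), -g)
        r_new = residual(beta + delta)
        cost_new = r_new @ r_new
        if cost_new < cost:          # step accepted: move and trust the model more
            beta, r, cost = beta + delta, r_new, cost_new
            lam *= 0.5
        else:                        # step rejected: shrink the trust region
            lam *= 2.0
    return beta

# Same assumed model as in the Gauss-Newton sketch: y = b1 * (1 - exp(-b2 * t))
t = np.linspace(0.0, 4.0, 20)
y = 2.5 * (1 - np.exp(-1.3 * t))
residual = lambda b: y - b[0] * (1 - np.exp(-b[1] * t))
jac = lambda b: np.column_stack([-(1 - np.exp(-b[1] * t)),
                                 -b[0] * t * np.exp(-b[1] * t)])

print(levenberg_marquardt(residual, jac, beta0=np.array([1.0, 1.0])))   # ~ [2.5, 1.3]
```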
Conclusions:
- The main difference between the GN and LM methods is the matrix used in the normal equations: one uses $J_{r}^{T} J_{r}$ and the other uses $J_{r}^{T} J_{r} + \lambda D$.
- When the function can be well approximated by a second-order expansion, we use the Gauss–Newton method.
- When the approximation to the problem is not good, so that the matrix $J_{r}^{T} J_{r}$ may not be invertible or is ill-conditioned, we use the Levenberg–Marquardt method.