
# Descent Methods

Consider the unconstrained minimization problem. The general iteration formula in Descent Methods is

$x_{k+1} = x_k + t_k \Delta x_k,$

where $t_k$ is the step size and $\Delta x_k$ is the step direction at iteration $k$. The outline of a general Descent Method is as follows.

**given** a starting point $x$.
**repeat**
1. Determine a descent direction $\Delta x$.
2. Line search. Choose a step size $t > 0$.
3. Update. $x := x + t \Delta x$.

**until** stopping criterion is satisfied.

Different algorithms differ with respect to the choice of

- Descent Direction $\Delta x_k$:
  - Gradient Method
  - Steepest Descent Method
  - Newton Method
- Step Size $t_k$:
  - Fixed Step Size
  - Exact Line Search
  - Backtracking Line Search

To choose a Descent Direction, a necessary condition is

$\nabla f(x)^T \Delta x < 0.$

- Steepest Descent Method: $\Delta x = -P^{-1} \nabla f(x)$
  - By choosing a different norm (defined by the matrix $P$) we obtain different directions.
- Gradient Method: $\Delta x = -\nabla f(x)$
  - is a special case of the steepest descent method when $P$ is the identity matrix.
  - is the steepest direction in the first-order approximation.
- Newton Method: $\Delta x = -\nabla^2 f(x)^{-1} \nabla f(x)$
  - is a special case of the steepest descent method when $P = \nabla^2 f(x)$.
  - is the steepest direction in the second-order approximation.
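As a concrete sketch, the gradient method with backtracking line search might look like this in Python (the test function, parameter values, and names are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(f, grad, x0, alpha=0.3, beta=0.8, tol=1e-8, max_iter=1000):
    """Gradient method with backtracking line search (Armijo condition)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stopping criterion
            break
        dx = -g                           # descent direction: negative gradient
        t = 1.0
        # Backtracking: shrink t until sufficient decrease is achieved
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx                    # update
    return x

# Illustrative problem: minimize f(x, y) = (x - 1)^2 + 10*(y + 2)^2
f = lambda v: (v[0] - 1)**2 + 10 * (v[1] + 2)**2
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
x_star = gradient_descent(f, grad, np.zeros(2))
```

Swapping `dx = -g` for a Newton or scaled direction changes only one line, which is exactly the sense in which the algorithms above differ.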

References: EE230 Convex Optimization Lecture Slide Topic 6: Unconstrained Optimization; Convex Optimization notes provided by Dr. S. Boyd.

In numerical analysis, Newton's method is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. In each step, the update equation is

$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$

Limitations: it may not work if there are points of inflection, local maxima or minima around the initial guess $x_0$ or the root. For example, suppose you need to find the root of $27x^3 - 3x + 1 = 0$, which is near $x = 0$. Then Newton's method will take you back and forth, or farther away from the root.

*(Figure: plot of the function $27x^3 - 3x + 1$.)*
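The update rule above can be sketched in a few lines of Python (function names and the demonstration problem are my own illustration):

```python
def newton_root(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton's method for f(x) = 0: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / fprime(x)   # Newton update
    return x                      # may not have converged

# Well-behaved case: the root of x^2 - 2 (i.e. sqrt(2)), starting at x0 = 1
root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, 1.0)

# Pathological case from the text: 27x^3 - 3x + 1, starting at x0 = 0.
# The iterates bounce around instead of converging to the root (~ -0.44).
bad = newton_root(lambda x: 27 * x**3 - 3 * x + 1,
                  lambda x: 81 * x**2 - 3, 0.0, max_iter=5)
```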

Application in Inverse Kinematics:

- Find $\theta_d$ such that $x_d - f(\theta_d) = 0$, where $x_d$ is the desired point in task space that we would like to reach, and $f(\theta)$ is the forward kinematics model of the manipulator.
- In each iteration, $\theta^{i+1} = \theta^i + J^{-1}(\theta^i)(x_d - f(\theta^i))$, where $J^{-1}(\theta^i)$ is the inverse of the Jacobian matrix. For non-square Jacobian matrices, use the pseudoinverse instead.
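A minimal sketch of this Newton–Raphson IK iteration, assuming a hypothetical two-link planar arm (link lengths, target point, and function names are illustrative):

```python
import numpy as np

L1, L2 = 1.0, 1.0   # link lengths of an assumed 2R planar arm

def fk(theta):
    """Forward kinematics: end-effector position for joint angles theta."""
    t1, t12 = theta[0], theta[0] + theta[1]
    return np.array([L1 * np.cos(t1) + L2 * np.cos(t12),
                     L1 * np.sin(t1) + L2 * np.sin(t12)])

def jacobian(theta):
    t1, t12 = theta[0], theta[0] + theta[1]
    return np.array([[-L1 * np.sin(t1) - L2 * np.sin(t12), -L2 * np.sin(t12)],
                     [ L1 * np.cos(t1) + L2 * np.cos(t12),  L2 * np.cos(t12)]])

def ik_newton(x_d, theta0, tol=1e-10, max_iter=100):
    """theta^{i+1} = theta^i + pinv(J) (x_d - f(theta^i)); pinv covers non-square J."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        e = x_d - fk(theta)               # task-space error
        if np.linalg.norm(e) < tol:
            break
        theta = theta + np.linalg.pinv(jacobian(theta)) @ e
    return theta

theta_d = ik_newton(np.array([1.2, 0.8]), np.array([0.1, 0.5]))
```

Using `np.linalg.pinv` instead of an explicit inverse makes the same loop work for redundant manipulators with non-square Jacobians, as noted above.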

References: https://en.wikipedia.org/wiki/Newton%27s_method; https://brilliant.org/wiki/newton-raphson-method/; Chapter 6 in Modern Robotics textbook

In optimization, Newton's method is applied to the derivative $f'$ of a twice-differentiable function $f$ to find the roots of the derivative (solutions to $f'(x) = 0$), also known as the stationary points of $f$. These solutions may be minima, maxima, or saddle points. Putting everything together, Newton's method performs the iteration

$x_{k+1} = x_{k} - \frac{f'(x_k)}{f''(x_k)} = x_{k} - [f''(x_{k})]^{-1} f'(x_{k}) = x_k - [\nabla^{2} f(x_k)]^{-1} \nabla f(x_k),$

where $\nabla f(x)$ is the gradient and $\nabla^{2} f(x) = H_f(x)$ is the Hessian matrix.

Often Newton's method is modified to include a small step size $0 < \gamma \leq 1$ instead of $\gamma = 1$:

$x_{k+1} = x_{k} - \gamma [f''(x_{k})]^{-1} f'(x_{k}).$

For step sizes other than 1, the method is often referred to as the relaxed or **damped Newton's method**.

The Gauss–Newton algorithm is used to solve non-linear least squares problems. It is a modification of Newton's method for finding a minimum of a function.
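The damped Newton minimization update above can be sketched as follows (the test function $f(x) = x^4 - 3x^2 + 2$ and all names are illustrative assumptions, not from the notes):

```python
def newton_minimize(fprime, fsecond, x0, gamma=1.0, tol=1e-10, max_iter=100):
    """Damped Newton iteration x_{k+1} = x_k - gamma * f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        g = fprime(x)
        if abs(g) < tol:              # stationary point of f reached
            break
        x = x - gamma * g / fsecond(x)
    return x

# Example: f(x) = x^4 - 3x^2 + 2, so f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6.
# Starting at x0 = 2 with full steps (gamma = 1), the iterates converge to the
# local minimum at sqrt(3/2).
x_min = newton_minimize(lambda x: 4 * x**3 - 6 * x,
                        lambda x: 12 * x**2 - 6, 2.0)
```

Note the loop only finds a stationary point; whether it is a minimum, maximum, or saddle depends on the sign of $f''$ there.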

Unlike Newton's method, the Gauss–Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required.

Starting with an initial guess $\boldsymbol{\beta}^{(0)}$ for the minimum, the method proceeds by the iterations

$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} - \left(\mathbf{J_r}^{\mathsf{T}} \mathbf{J_r}\right)^{-1} \mathbf{J_r}^{\mathsf{T}} \mathbf{r}\left(\boldsymbol{\beta}^{(k)}\right).$

Note that $\left(\mathbf{J_r}^{\mathsf{T}} \mathbf{J_r}\right)^{-1} \mathbf{J_r}^{\mathsf{T}}$ is the left pseudoinverse of $\mathbf{J_r}$, and the assumption $m \ge n$ in the algorithm statement is necessary, as otherwise the matrix $\mathbf{J_r}^{\mathsf{T}} \mathbf{J_r}$ is not invertible and the normal equations cannot be solved uniquely.

The Levenberg–Marquardt algorithm (LMA), also known as the **damped** least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent.

With the Gauss–Newton method, the sum of squares of the residuals $S$ may not decrease at every iteration. However, since $\Delta$ is a descent direction, unless $S(\boldsymbol{\beta}^{k})$ is a stationary point, it holds that $S(\boldsymbol{\beta}^{k} + \alpha \Delta) < S(\boldsymbol{\beta}^{k})$ for all sufficiently small $\alpha > 0$. Thus, **if divergence occurs**, one solution is to employ a fraction $\alpha$ of the increment vector $\Delta$ in the updating formula: $\boldsymbol{\beta}^{k+1} = \boldsymbol{\beta}^{k} + \alpha \Delta$. In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function $S$. An optimal value for $\alpha$ can be found by using a line search algorithm.

In cases where the direction of the shift vector is such that the optimal fraction $\alpha$ is close to zero, an alternative method for handling divergence is the use of the Levenberg–Marquardt algorithm, a trust region method. The normal equations are modified in such a way that the increment vector is **rotated towards the direction of steepest descent**,

$\left(\mathbf {J^{\mathsf {T}}J+\lambda D} \right)\Delta =-\mathbf {J} ^{\mathsf {T}}\mathbf {r}$

where $D$ is a positive diagonal matrix and $\lambda$ is a parameter that controls the trust-region size. Geometrically, this adds a paraboloid centered at $\Delta x = 0$ to the quadratic form, resulting in a smaller step. Note that when $D$ is the identity matrix and $\lambda \to +\infty$, then $\lambda \Delta \to -\mathbf{J}^{\mathsf{T}} \mathbf{r}$, therefore the direction of $\Delta$ approaches the direction of the negative gradient $-\mathbf{J}^{\mathsf{T}} \mathbf{r}$.

Conclusions:

- The main difference between the GN and LM methods is the selection of the matrix $H$: one uses $H = \mathbf{J}^{\mathsf{T}} \mathbf{J}$ and the other $H = \mathbf{J}^{\mathsf{T}} \mathbf{J} + \lambda D$.
- When the function can be well approximated by its second-order expansion, we use the Gauss–Newton method.
- When the approximation is not good, so that the matrix $\mathbf{J}^{\mathsf{T}} \mathbf{J}$ may be non-invertible or ill-conditioned, we use the Levenberg–Marquardt method.
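These conclusions can be illustrated with a minimal sketch: the two methods share one update loop and differ only in whether the damping term $\lambda D$ is added (here $D = I$ for simplicity; the residual function, a Rosenbrock-style sum of squares, and all names are illustrative assumptions):

```python
import numpy as np

def gauss_newton_lm(r, J, beta0, lam=0.0, tol=1e-10, max_iter=100):
    """Gauss-Newton (lam = 0) / Levenberg-Marquardt-style (lam > 0, D = I):
    solve (J^T J + lam * I) delta = -J^T r at each iteration."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        Jk, rk = J(beta), r(beta)
        g = Jk.T @ rk                              # gradient of 0.5 * ||r||^2
        if np.linalg.norm(g) < tol:
            break
        H = Jk.T @ Jk + lam * np.eye(len(beta))    # damped normal matrix
        beta = beta + np.linalg.solve(H, -g)
    return beta

# Illustrative test problem: Rosenbrock written as residuals, minimum at (1, 1)
r = lambda b: np.array([10 * (b[1] - b[0]**2), 1 - b[0]])
J = lambda b: np.array([[-20 * b[0], 10.0],
                        [-1.0,        0.0]])
beta_gn = gauss_newton_lm(r, J, [0.0, 0.0])              # Gauss-Newton
beta_lm = gauss_newton_lm(r, J, [0.0, 0.0], lam=1e-3)    # damped variant
```

With a small $\lambda$ the damped variant behaves almost like Gauss–Newton; increasing $\lambda$ shrinks and rotates the step toward $-\mathbf{J}^{\mathsf{T}}\mathbf{r}$, which is what buys robustness when $\mathbf{J}^{\mathsf{T}}\mathbf{J}$ is ill-conditioned.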
