# Descent Methods

## Descent Methods

Consider the unconstrained minimization problem. The general iteration formula in Descent Methods is  $$x\_{k+1} = x\_k + t\_k \Delta x\_k$$, where $$t\_k$$is the step size and $$\Delta x\_k$$is the step direction at iteration$$k$$. The outline of a general Descent Method is as follows.

**given** a starting point $$x$$. \
**repeat**\
&#x20;       1\. Determine a descent direction $$\Delta x$$. \
&#x20;       2\. Line search. Choose a step size $$t>0$$. \
&#x20;       3\. Update. $$x:=x+t \Delta x$$. \
**until** stopping criterion is satisfied.

Different algorithms differ with respect to the choice of&#x20;

* Descent Direction:$$\Delta x\_k$$
  * Gradient Method
  * Steepest Descent Method
  * Newton Method
* Step Size: $$t\_k$$
  * Fixed Step Size
  * Exact Line Search
  * Backtracking Line Search

To choose a Descent Direction, a necessary condition is$$\nabla f(x)^T \Delta x < 0$$.

* Steepest Descent Method $$\Delta x = - P^{-1} \nabla f(x)$$
  * by choosing different norm P we can have different directions.
* Gradient Method $$\Delta x = - \nabla f(x)$$
  * is a special case of the steepest descent method when$$P$$is identity matrix.&#x20;
  * is the steepest in the first order approximation.
* Newton Method $$\Delta x = - \nabla^2 f(x)^{-1} \nabla f(x)$$
  * is a special case of the steepest descent method when$$P = \nabla^2 f(x)$$
  * is the steepest in the second order approximation.

References: EE230 Convex Optimization Lecture Slide Topic 6: Unconstrained Optimization; Convex Optimization notes provided by Dr. S. Boyd.&#x20;

## Newton's Method

In numerical analysis, Newton's method is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. In each step, the update equation is $$x\_{n+1} = x\_n - \frac{f(x\_n)}{f'(x\_n)}$$.

![](/files/-ML6keQadrwQYQX7Nzgm)

Limitations: It may not work if there are points of inflection, local maxima or minima around the initial guess$$x\_0$$or the root. For example, suppose you need to find the root of $$27x^3 - 3x + 1 = 0$$ which is near$$x = 0$$. Then Newton's method will take you back and forth, or farther away from the root.

![function 27x^3 - 3x + 1 = 0](/files/-ML6ezWnO7G9-NTQFN_I)

Application in Inverse Kinematics:&#x20;

* Find $$\theta\_d$$ such that $$x\_d - f(\theta\_d) = 0$$where $$x\_d$$is the desired point in taskspace that we would like to reach, and$$f(\theta)$$is the forward kinematics model of the manipulator.
* In each iteration, $$\theta^{i+1} = \theta^i + J^{-1}(\theta^i)(x\_d - f(\theta^i))$$ where $$J^{-1}(\theta^i)$$is the inverse of the Jacobian matrix. For non-square Jacobian matrices, use pseudoinverse instead.

References: <https://en.wikipedia.org/wiki/Newton%27s_method>; <https://brilliant.org/wiki/newton-raphson-method/>; Chapter 6 in Modern Robotics textbook

## Newton's Method in Optimization

In optimization, Newton's method is applied to the derivative$$f'$$of a twice-differentiable function$$f$$to find the roots of the derivative (solutions to$$f'(x) = 0$$), also known as the stationary points of$$f$$. These solutions may be minima, maxima, or saddle points.

Putting everything together, Newton's method performs the iteration

&#x20;$$x\_{k+1} = x\_{k} - \frac{f'(x\_k)}{f''(x\_k)} = x\_{k}-\[f''(x\_{k})]^{-1}f'(x\_{k}) = x\_k - \[\nabla ^{2}f(x)]^{-1}\nabla f(x),$$

where$$\nabla f(x)$$is the gradient and $$\nabla ^{2}f(x) = H\_f(x)$$is the Hessian matrix.

Often Newton's method is modified to include a small step size$$0<\gamma \leq 1$$instead of $$\gamma=1$$:$$x\_{k+1}=x\_{k}-\gamma \[f''(x\_{k})]^{-1}f'(x\_{k})$$. For step sizes other than 1, the method is often referred to as the relaxed or **damped Newton's method**.

Reference: <https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization>

## Gauss-Newton Algorithm

The Gauss–Newton algorithm is used to solve non-linear least squares problems. It is a modification of Newton's method for finding a minimum of a function.&#x20;

Unlike Newton's method, the Gauss–Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required.

Starting with an initial guess $$\boldsymbol{\beta}^{(0)}$$for the minimum, the method proceeds by the iterations

$$\boldsymbol{\beta}^{(k+1)}= \boldsymbol{\beta}^{(k)}-\left(\mathbf {J\_{r}} ^{\mathsf {T}}\mathbf {J\_{r}} \right)^{-1}\mathbf {J\_{r}} ^{\mathsf {T}}\mathbf {r} \left(\boldsymbol{\beta}^{(k)}\right)$$

Note that$$\left(\mathbf {J\_{r}} ^{\mathsf {T}}\mathbf {J\_{r}} \right)^{-1}\mathbf {J\_{r}} ^{\mathsf {T}}$$is the left pseudoinverse of $$\mathbf {J\_{r}}$$, and assumption $$m \ge n$$ in the algorithm statement is necessary, as otherwise the matrix $$\mathbf {J\_{r}} ^{\mathsf {T}}\mathbf {J\_{r}}$$ is not invertible and the normal equations cannot be solved uniquely.

Reference: <https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm>

## Levenberg–Marquardt Algorithm

Levenberg–Marquardt algorithm (LMA), also known as the **damped** least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent.

With the Gauss–Newton method, the sum of squares of the residuals$$S$$may not decrease at every iteration. However, since$$\Delta$$is a descent direction, unless$$S (\boldsymbol {\beta }^{k})$$is a stationary point, it holds that $$S (\boldsymbol{\beta}^{k}+\alpha \Delta ) < S (\boldsymbol{\beta}^{k})$$ for all sufficiently small$$\alpha >0$$. Thus, **if divergence occurs**, one solution is to employ a fraction$$\alpha$$of the increment vector$$\Delta$$in the updating formula: $${\boldsymbol {\beta }}^{k+1}={\boldsymbol {\beta }}^{k}+\alpha \Delta$$. In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function$$S$$. An optimal value for$$\alpha$$can be found by using a line search algorithm.

In cases where the direction of the shift vector is such that the optimal fraction$$\alpha$$is close to zero, an alternative method for handling divergence is the use of the Levenberg–Marquardt algorithm, a trust region method. The normal equations are modified in such a way that the increment vector is **rotated towards the direction of steepest descent**,

$$\left(\mathbf {J^{\mathsf {T}}J+\lambda D} \right)\Delta =-\mathbf {J} ^{\mathsf {T}}\mathbf {r}$$

where$$D$$is a positive diagonal matrix and$$\lambda$$is a parameter that controls the trust-region size. Geometrically, this adds a paraboloid centered at$$\Delta x=0$$ to the quadratic form, resulting in a smaller step. Note that when$$D$$is the identity matrix and $$\lambda \to +\infty$$, then $$\lambda \Delta \to -\mathbf {J} ^{\mathsf {T}}\mathbf {r}$$, therefore the direction of$$\Delta$$approaches the direction of the negative gradient $$-\mathbf {J} ^{\mathsf {T}}\mathbf {r}$$.

Conclusions:

* The main difference between GN and LM methods is the selection of matrix $$H$$. One is $$H=\mathbf{J}^{\mathsf{T}} \mathbf{J}$$ and the other is $$H=\mathbf{J}^{\mathsf{T}} \mathbf{J} +\lambda D$$.
* When the function can be well approximated by second order expansion, we use the Gauss-Newton method.&#x20;
* When the approximation to the problem is not good, where the matrix $$\mathbf{J}^{\mathsf{T}} \mathbf{J}$$may not be invertible or under ill condition, we use the Levenberg–Marquardt method.

References:  <https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm>; <https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://wiki.hanzheteng.com/math/optimization/descent-methods.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
