Descent Methods

Descent Methods

Consider the unconstrained minimization problem. The general iteration formula in Descent Methods is $x_{k+1} = x_k + t_k \Delta x_k$, where $t_k$ is the step size and $\Delta x_k$ is the step direction at iteration $k$. The outline of a general Descent Method is as follows.

given a starting point $x$.
repeat
  1. Determine a descent direction $\Delta x$.
  2. Line search. Choose a step size $t > 0$.
  3. Update. $x := x + t\,\Delta x$.
until a stopping criterion is satisfied.

Different algorithms differ with respect to the choice of the following (a sketch combining a gradient direction with backtracking line search follows this list):

  • Descent Direction: $\Delta x_k$

    • Gradient Method

    • Steepest Descent Method

    • Newton Method

  • Step Size: $t_k$

    • Fixed Step Size

    • Exact Line Search

    • Backtracking Line Search

To choose a Descent Direction, a necessary condition is $\nabla f(x)^T \Delta x < 0$.

  • Steepest Descent Method: $\Delta x = -P^{-1} \nabla f(x)$

    • By choosing a different norm $P$ (a symmetric positive definite matrix), we obtain different descent directions.

  • Gradient Method: $\Delta x = -\nabla f(x)$

    • is a special case of the steepest descent method when $P$ is the identity matrix.

    • is the steepest direction under a first-order approximation of $f$.

  • Newton Method: $\Delta x = -\nabla^2 f(x)^{-1} \nabla f(x)$

    • is a special case of the steepest descent method when $P = \nabla^2 f(x)$.

    • is the steepest direction under a second-order approximation of $f$.
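
As a concrete illustration of the loop above, here is a minimal sketch combining the gradient direction with backtracking line search; the Armijo constants `alpha`, `beta` and the quadratic test function are illustrative assumptions, not taken from the lecture slides.

```python
import numpy as np

def backtracking_line_search(f, grad, x, dx, alpha=0.3, beta=0.8):
    """Shrink t until the Armijo condition f(x + t*dx) <= f(x) + alpha*t*grad(x)^T dx holds."""
    t = 1.0
    g_dot_dx = grad(x) @ dx
    while f(x + t * dx) > f(x) + alpha * t * g_dot_dx:
        t *= beta
    return t

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=1000):
    """General descent loop: direction = -grad f(x), step size from backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = -grad(x)                                   # 1. descent direction
        if np.linalg.norm(dx) < tol:                    # stopping criterion
            break
        t = backtracking_line_search(f, grad, x, dx)    # 2. line search
        x = x + t * dx                                  # 3. update
    return x

# Example: minimize f(x, y) = x^2 + 10 y^2, whose minimizer is the origin.
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
print(gradient_descent(f, grad, [3.0, -2.0]))           # close to [0, 0]
```

Swapping `dx = -grad(x)` for a Newton or scaled steepest-descent step changes only step 1; the line search and update stay the same.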

References: EE230 Convex Optimization Lecture Slide Topic 6: Unconstrained Optimization; Convex Optimization notes provided by Dr. S. Boyd.

Newton's Method

In numerical analysis, Newton's method is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. In each step, the update equation is $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$.

Limitations: It may not work if there are points of inflection, local maxima, or local minima around the initial guess $x_0$ or the root. For example, suppose you need to find the root of $27x^3 - 3x + 1 = 0$, which is near $x = 0$. Then Newton's method will take you back and forth, or farther away from the root.
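
A small standalone sketch of the iteration, run on this example (the helper name `newton_root` and the iteration limit are hypothetical choices):

```python
def newton_root(f, fprime, x0, tol=1e-12, max_iter=15):
    """Newton's method for f(x) = 0: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for n in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)
        print(f"iter {n}: x = {x:.6f}")
    return x

f = lambda x: 27 * x ** 3 - 3 * x + 1
fprime = lambda x: 81 * x ** 2 - 3

# Starting at x0 = 0 (between the local max and local min of f), the iterates
# jump back and forth and away from the actual root near x ≈ -0.44 instead of
# converging straight to it.
newton_root(f, fprime, 0.0)
```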

Application in Inverse Kinematics:

  • Find $\theta_d$ such that $x_d - f(\theta_d) = 0$, where $x_d$ is the desired point in task space that we would like to reach, and $f(\theta)$ is the forward kinematics model of the manipulator.

  • In each iteration, $\theta^{i+1} = \theta^i + J^{-1}(\theta^i)\,(x_d - f(\theta^i))$, where $J^{-1}(\theta^i)$ is the inverse of the Jacobian matrix. For non-square Jacobian matrices, use the pseudoinverse instead (see the sketch below).
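
As a sketch of this update, assuming a planar 2-link arm whose link lengths, forward kinematics, and Jacobian are made up for illustration (this is not code from the Modern Robotics library):

```python
import numpy as np

L1, L2 = 1.0, 0.8  # assumed link lengths of a planar 2R arm

def fk(theta):
    """Forward kinematics: end-effector (x, y) of the 2R arm."""
    t1, t2 = theta
    return np.array([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                     L1 * np.sin(t1) + L2 * np.sin(t1 + t2)])

def jacobian(theta):
    """Analytic Jacobian d(fk)/d(theta)."""
    t1, t2 = theta
    return np.array([[-L1 * np.sin(t1) - L2 * np.sin(t1 + t2), -L2 * np.sin(t1 + t2)],
                     [ L1 * np.cos(t1) + L2 * np.cos(t1 + t2),  L2 * np.cos(t1 + t2)]])

def ik_newton(x_d, theta0, tol=1e-8, max_iter=100):
    """theta^{i+1} = theta^i + pinv(J(theta^i)) (x_d - f(theta^i))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        err = x_d - fk(theta)
        if np.linalg.norm(err) < tol:
            break
        theta = theta + np.linalg.pinv(jacobian(theta)) @ err
    return theta

theta_sol = ik_newton(np.array([1.2, 0.6]), [0.1, 0.5])
print(theta_sol, fk(theta_sol))   # fk(theta_sol) should be close to [1.2, 0.6]
```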

Newton's Method in Optimization

In optimization, Newton's method is applied to the derivative $f'$ of a twice-differentiable function $f$ to find the roots of the derivative (solutions to $f'(x) = 0$), also known as the stationary points of $f$. These solutions may be minima, maxima, or saddle points.

Putting everything together, Newton's method performs the iteration

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} = x_k - [f''(x_k)]^{-1} f'(x_k) = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k),$$

where $\nabla f(x)$ is the gradient and $\nabla^2 f(x) = H_f(x)$ is the Hessian matrix.

Often Newton's method is modified to include a small step size $0 < \gamma \le 1$ instead of $\gamma = 1$: $x_{k+1} = x_k - \gamma\,[f''(x_k)]^{-1} f'(x_k)$. For step sizes other than 1, the method is often referred to as the relaxed or damped Newton's method.
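
A minimal sketch of the (optionally damped) Newton iteration in several variables, assuming the gradient and Hessian are available as callables; the Rosenbrock test function below is only an example:

```python
import numpy as np

def newton_minimize(grad, hess, x0, gamma=1.0, tol=1e-10, max_iter=50):
    """Damped Newton: x_{k+1} = x_k - gamma * H(x_k)^{-1} grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # stationary point reached
            break
        step = np.linalg.solve(hess(x), g)     # solve H step = grad instead of inverting H
        x = x - gamma * step
    return x

# Example: Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, minimum at (1, 1)
grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                           200 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                           [-400 * x[0],                      200.0]])
print(newton_minimize(grad, hess, [-1.2, 1.0], gamma=1.0))   # converges to [1, 1]
```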

Gauss-Newton Algorithm

The Gauss–Newton algorithm is used to solve non-linear least squares problems. It is a modification of Newton's method for finding a minimum of a function.

Unlike Newton's method, the Gauss–Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required.

Starting with an initial guess $\boldsymbol{\beta}^{(0)}$ for the minimum, the method proceeds by the iterations

$$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} - \left(\mathbf{J_r}^{\mathsf{T}} \mathbf{J_r}\right)^{-1} \mathbf{J_r}^{\mathsf{T}}\, \mathbf{r}\!\left(\boldsymbol{\beta}^{(k)}\right)$$

Here $\mathbf{r}(\boldsymbol{\beta})$ is the vector of $m$ residuals and $\mathbf{J_r}$ is its Jacobian with respect to the $n$ parameters $\boldsymbol{\beta}$. Note that $\left(\mathbf{J_r}^{\mathsf{T}} \mathbf{J_r}\right)^{-1} \mathbf{J_r}^{\mathsf{T}}$ is the left pseudoinverse of $\mathbf{J_r}$, and the assumption $m \ge n$ in the algorithm statement is necessary, as otherwise the matrix $\mathbf{J_r}^{\mathsf{T}} \mathbf{J_r}$ is not invertible and the normal equations cannot be solved uniquely.
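
A sketch of the iteration on a toy curve-fitting problem; the exponential model $y = \beta_0 e^{\beta_1 t}$, the synthetic noiseless data, and the analytic Jacobian are assumptions made for illustration:

```python
import numpy as np

def gauss_newton(residual, jac, beta0, tol=1e-10, max_iter=50):
    """beta_{k+1} = beta_k - (J^T J)^{-1} J^T r(beta_k)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)
        J = jac(beta)
        step = np.linalg.solve(J.T @ J, J.T @ r)   # solve the normal equations
        beta = beta - step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Toy problem: fit y = b0 * exp(b1 * t) to noiseless data generated with b = (2, -1)
t = np.linspace(0, 2, 20)
y = 2.0 * np.exp(-1.0 * t)
residual = lambda b: b[0] * np.exp(b[1] * t) - y               # r(beta), length m = 20
jac = lambda b: np.column_stack([np.exp(b[1] * t),             # dr/db0
                                 b[0] * t * np.exp(b[1] * t)]) # dr/db1
print(gauss_newton(residual, jac, [1.5, -0.5]))                # approaches [2, -1]
```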

Levenberg–Marquardt Algorithm

The Levenberg–Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent.

With the Gauss–Newton method, the sum of squared residuals $S$ may not decrease at every iteration. However, since $\Delta$ is a descent direction, unless $S(\boldsymbol{\beta}^k)$ is a stationary point, it holds that $S(\boldsymbol{\beta}^k + \alpha \Delta) < S(\boldsymbol{\beta}^k)$ for all sufficiently small $\alpha > 0$. Thus, if divergence occurs, one solution is to employ a fraction $\alpha$ of the increment vector $\Delta$ in the updating formula: $\boldsymbol{\beta}^{k+1} = \boldsymbol{\beta}^k + \alpha \Delta$. In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function $S$. An optimal value for $\alpha$ can be found by using a line search algorithm.

In cases where the direction of the shift vector is such that the optimal fraction $\alpha$ is close to zero, an alternative method for handling divergence is the Levenberg–Marquardt algorithm, a trust-region method. The normal equations are modified in such a way that the increment vector is rotated towards the direction of steepest descent,

$$\left(\mathbf{J}^{\mathsf{T}} \mathbf{J} + \lambda \mathbf{D}\right) \Delta = -\mathbf{J}^{\mathsf{T}} \mathbf{r}$$

where $\mathbf{D}$ is a positive diagonal matrix and $\lambda$ is a parameter that controls the trust-region size. Geometrically, this adds a paraboloid centered at $\Delta = 0$ to the quadratic form, resulting in a smaller step. Note that when $\mathbf{D}$ is the identity matrix and $\lambda \to +\infty$, then $\lambda \Delta \to -\mathbf{J}^{\mathsf{T}} \mathbf{r}$; therefore the direction of $\Delta$ approaches the direction of the negative gradient $-\mathbf{J}^{\mathsf{T}} \mathbf{r}$.
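
A hedged sketch of a basic LM loop with a simple damping schedule, increasing $\lambda$ when a step fails to reduce $S$ and decreasing it otherwise; the choice $\mathbf{D} = \operatorname{diag}(\mathbf{J}^{\mathsf{T}}\mathbf{J})$, the factors 0.5 and 2, and the reuse of the exponential-fit problem from the Gauss-Newton sketch are illustrative assumptions rather than the reference algorithm:

```python
import numpy as np

def levenberg_marquardt(residual, jac, beta0, lam=1e-3, tol=1e-10, max_iter=100):
    """Solve (J^T J + lambda * D) Delta = -J^T r with D = diag(J^T J)."""
    beta = np.asarray(beta0, dtype=float)
    S = np.sum(residual(beta) ** 2)              # current sum of squared residuals
    for _ in range(max_iter):
        r = residual(beta)
        J = jac(beta)
        JTJ = J.T @ J
        D = np.diag(np.diag(JTJ))                # diagonal scaling of the damping term
        delta = np.linalg.solve(JTJ + lam * D, -J.T @ r)
        if np.linalg.norm(delta) < tol:
            break
        S_new = np.sum(residual(beta + delta) ** 2)
        if S_new < S:                            # step accepted: behave more like Gauss-Newton
            beta, S = beta + delta, S_new
            lam *= 0.5
        else:                                    # step rejected: move towards gradient descent
            lam *= 2.0
    return beta

# Reusing the exponential-fit residual/jac from the Gauss-Newton example:
t = np.linspace(0, 2, 20)
y = 2.0 * np.exp(-1.0 * t)
residual = lambda b: b[0] * np.exp(b[1] * t) - y
jac = lambda b: np.column_stack([np.exp(b[1] * t), b[0] * t * np.exp(b[1] * t)])
print(levenberg_marquardt(residual, jac, [1.5, -0.5]))   # approaches [2, -1]
```

When $\lambda$ is small the step is essentially the Gauss-Newton step; when $\lambda$ is large it shrinks towards a scaled gradient-descent step, matching the interpolation described above.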

Conclusions:

  • The main difference between the GN and LM methods is the selection of the matrix $H$: one is $H = \mathbf{J}^{\mathsf{T}} \mathbf{J}$ and the other is $H = \mathbf{J}^{\mathsf{T}} \mathbf{J} + \lambda \mathbf{D}$.

  • When the function can be well approximated by a second-order expansion, we use the Gauss-Newton method.

  • When the approximation is poor, or when the matrix $\mathbf{J}^{\mathsf{T}} \mathbf{J}$ may not be invertible or is ill-conditioned, we use the Levenberg–Marquardt method.


References:

  • Newton's Method: https://en.wikipedia.org/wiki/Newton%27s_method ; https://brilliant.org/wiki/newton-raphson-method/ ; Chapter 6 in the Modern Robotics textbook
  • Newton's Method in Optimization: https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization
  • Gauss-Newton Algorithm: https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
  • Levenberg–Marquardt Algorithm: https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm ; https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm