
The Trust-Region Dogleg optimization method: a Python implementation example

The trust-region method (TRM) is one of the most important numerical methods for solving nonlinear programming problems. The method is based on defining a region around the current best solution in which a quadratic model approximates the objective function.
Line search methods and trust-region methods both generate steps by approximating the objective function with a quadratic model, but they use this model in different ways. Line search uses it to obtain a search direction and then to find the optimal step along that direction. The trust-region method defines a region around the current iterate in which the model approximates the objective function sufficiently well, and then chooses the direction and the step length simultaneously.
Trust-region methods are reliable and robust, can be applied to ill-conditioned problems, and have very good convergence properties. The good convergence is due to the fact that the size of the trust region (usually defined by its radius) is adjusted at each iteration based on how much improvement was achieved at previous iterations.
If the calculations show that the quadratic model approximates the objective function well, the trust region can be enlarged. Otherwise, if the model approximates the function poorly, the trust region should be reduced.
As a result, the iterates behave roughly as follows: at each iteration the model is minimized inside the current region, and the region grows or shrinks depending on how well the model predicted the actual decrease of the objective function.

Algorithm
Step 1
The trust-region method uses a quadratic model of the objective function. At each iteration, the step is calculated by solving the following quadratic problem (subproblem):
$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p, \qquad \text{s.t. } \|p\| \le \Delta_k,$$
where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, $B_k$ is the Hessian matrix $\nabla^2 f(x_k)$ (or an approximation of it), and $\Delta_k > 0$ is the trust-region radius.
Thus, the trust-region method requires the sequential solution of such subproblems, in which the quadratic model and the constraint (which can also be written as $p^T p \le \Delta_k^2$) change from iteration to iteration.
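To make the notation concrete, a minimal numpy sketch of evaluating the model $m_k(p)$ at a trial step might look as follows (the helper name quadratic_model and the sample numbers are illustrative assumptions, not taken from the article):

import numpy as np

def quadratic_model(f_k, g_k, B_k, p):
    # m_k(p) = f_k + g_k^T p + 0.5 * p^T B_k p
    return f_k + np.dot(g_k, p) + 0.5 * np.dot(p, np.dot(B_k, p))

# Trial evaluation with arbitrary illustrative values:
f_k = 3.0
g_k = np.array([1.0, -2.0])
B_k = np.array([[2.0, 0.0], [0.0, 4.0]])
p = np.array([-0.5, 0.5])
print(quadratic_model(f_k, g_k, B_k, p))  # model value at the trial step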
There are several strategies for approximately solving the quadratic subproblem, including the following:
Cauchy point algorithm
The idea of the method is similar to the logic of the steepest descent algorithm. The Cauchy point lies along the gradient (steepest-descent) direction and minimizes the quadratic model along that direction, subject to the step staying inside the trust region (a small sketch is given after this list). By sequentially taking Cauchy points, a local minimum can be found, but the convergence is as inefficient as that of the steepest descent method.
Steihaug Algorithm
The method is named after its author, Steihaug. It is a modification of the conjugate gradient method (a conjugate gradient approach) adapted to the trust-region constraint.
Dogleg Algorithm
This article describes in detail the dogleg method for approximately minimizing the quadratic model, which is one of the most common and effective approaches. It is used when the Hessian matrix (or its approximation) is positive definite.
The simplest and most interesting variant of the method works with a polygonal path consisting of only two segments, which to some people resembles a dog's leg.
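As referenced in the Cauchy point item above, here is a small sketch of how the Cauchy point can be computed with numpy (the helper name cauchy_point and the sample numbers are assumptions for illustration; the formula is the textbook one: the steepest-descent minimizer of the model, clipped to the trust region):

import numpy as np

def cauchy_point(g, B, delta):
    # Minimize the quadratic model along the steepest-descent direction,
    # keeping the step inside the trust region of radius delta.
    g_norm = np.linalg.norm(g)
    gBg = np.dot(g, np.dot(B, g))
    if gBg <= 0:
        tau = 1.0                                   # the model decreases all the way to the boundary
    else:
        tau = min(g_norm**3 / (delta * gBg), 1.0)   # interior minimizer along -g, if it fits
    return -tau * (delta / g_norm) * g

# Illustrative values (not from the article):
g = np.array([4.0, -2.0])
B = np.array([[3.0, 1.0], [1.0, 2.0]])
print(cauchy_point(g, B, 0.5))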
Step 2
The first question that arises in a trust-region algorithm is the strategy for choosing the trust-region radius $\Delta_k$ at each iteration. The choice is based on the ratio of the actual reduction of the objective function to the reduction predicted by the model:
$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.$$
The denominator (the predicted reduction) should always be non-negative, since the step $p_k$ is obtained by minimizing the model $m_k$ over a region that contains the point $p = 0$.
If $\rho_k < \tfrac{1}{4}$, the model predicts the behaviour of the function poorly, and the trust region is shrunk: $\Delta_{k+1} = \tfrac{1}{4}\|p_k\|$.
If $\rho_k > \tfrac{3}{4}$ and $\|p_k\| = \Delta_k$ (the step reached the boundary of the region), the model works well and the trust region is expanded: $\Delta_{k+1} = \min(2\Delta_k, \Delta_{max})$.
In other cases, the trust region remains unchanged.
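These update rules translate almost literally into code. The sketch below (with the hypothetical helper name update_radius) follows the rules above; unlike the full implementation at the end of the article, it compares the step length with the radius using a small tolerance rather than exact floating-point equality:

def update_radius(radius, rho, step_norm, max_radius):
    # Shrink when the model over-predicted the decrease, expand when it
    # predicted well and the step hit the boundary, otherwise keep as is.
    if rho < 0.25:
        return 0.25 * step_norm
    if rho > 0.75 and abs(step_norm - radius) < 1e-12:
        return min(2.0 * radius, max_radius)
    return radius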
Step 3
The following algorithm describes the whole process:
Determine the starting point $x_0$, a maximum radius $\Delta_{max} > 0$, an initial radius $\Delta_0 \in (0, \Delta_{max})$, and an acceptance threshold $\eta \in [0, \tfrac{1}{4})$.
for k = 0, 1, 2, ... until convergence:
We solve (approximately) the subproblem: $p_k = \arg\min_{\|p\| \le \Delta_k} m_k(p)$,
where $m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p$.
We calculate the ratio: $\rho_k = \dfrac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$.
Update the current point: $x_{k+1} = x_k + p_k$ if $\rho_k > \eta$, otherwise $x_{k+1} = x_k$.
Update the trust-region radius: $\Delta_{k+1} = \tfrac{1}{4}\|p_k\|$ if $\rho_k < \tfrac{1}{4}$; $\Delta_{k+1} = \min(2\Delta_k, \Delta_{max})$ if $\rho_k > \tfrac{3}{4}$ and $\|p_k\| = \Delta_k$; otherwise $\Delta_{k+1} = \Delta_k$.
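Put together, the outer loop might look like the following sketch, which reuses the hypothetical cauchy_point and update_radius helpers from the sketches above as a stand-in subproblem solver; the dogleg-based version of this loop is given in full at the end of the article:

import numpy as np

def trust_region(f, grad, hess, x0, radius=1.0, max_radius=100.0,
                 eta=0.15, tol=1e-4, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:          # stop when the gradient is small
            break
        p = cauchy_point(g, B, radius)       # subproblem solver; dogleg could be used instead
        predicted = -(np.dot(g, p) + 0.5 * np.dot(p, np.dot(B, p)))
        actual = f(x) - f(x + p)
        rho = actual / predicted if predicted != 0 else 1e99
        radius = update_radius(radius, rho, np.linalg.norm(p), max_radius)
        if rho > eta:                        # accept the step only if it paid off
            x = x + p
    return x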
Expanded algorithm

Note that the radius is increased only if $\rho_k > \tfrac{3}{4}$ and $\|p_k\|$ actually reaches the boundary of the trust region. If the step remains strictly inside the region, the current radius does not restrict the progress of the algorithm, and its value is kept for the next iteration.
Dogleg Algorithm
To understand the idea of the method, it is useful to look at how the exact solution $p^*(\Delta)$ of the subproblem behaves as the trust-region radius $\Delta$ changes.
When $B$ is positive definite and $\Delta$ is large enough, $\Delta \ge \|p^B\|$, the unconstrained minimizer of the model, the Newton point $p^B = -B^{-1} g$, lies inside the region and is the solution: $p^*(\Delta) = p^B$.
When $\Delta$ is small, the quadratic term of the model has little influence, and the solution is approximately a step of length $\Delta$ along the steepest-descent direction: $p^*(\Delta) \approx -\Delta \, g / \|g\|$.
For intermediate values of $\Delta$, the solution $p^*(\Delta)$ follows a curved trajectory between these two extremes.

The dogleg method approximates this curved path by a polygonal path of two segments. The first segment runs from the origin to the minimizer of the model along the steepest-descent direction,
$$p^U = -\frac{g^T g}{g^T B g}\, g.$$
The second segment begins at $p^U$ and runs to the Newton point $p^B = -B^{-1} g$.
Formally, we denote the trajectory by $\tilde{p}(\tau)$, $\tau \in [0, 2]$:
$$\tilde{p}(\tau) = \begin{cases} \tau \, p^U, & 0 \le \tau \le 1, \\ p^U + (\tau - 1)(p^B - p^U), & 1 \le \tau \le 2. \end{cases}$$
To find the point where the second segment crosses the trust-region boundary, we solve the scalar quadratic equation $\|p^U + (\tau - 1)(p^B - p^U)\|^2 = \Delta^2$, which, with $s = \tau - 1$, becomes
$$\|p^B - p^U\|^2 s^2 + 2\,(p^U)^T (p^B - p^U)\, s + \|p^U\|^2 - \Delta^2 = 0.$$
We find the discriminant of the equation (divided by four):
$$\mathrm{fact} = \bigl((p^U)^T(p^B - p^U)\bigr)^2 - \|p^B - p^U\|^2 \bigl(\|p^U\|^2 - \Delta^2\bigr).$$
The required positive root of the equation is:
$$s = \frac{-(p^U)^T(p^B - p^U) + \sqrt{\mathrm{fact}}}{\|p^B - p^U\|^2}.$$
The dogleg method then selects the step as follows: if $\|p^B\| \le \Delta$, the full Newton step $p = p^B$ is taken; if $p^U$ already lies outside the region, $\|p^U\| \ge \Delta$, the steepest-descent step truncated to the boundary, $p = \Delta\, p^U / \|p^U\|$, is taken; otherwise the step is the boundary point of the second segment, $p = p^U + s\,(p^B - p^U)$, found above (a small numerical check of this last case is given below).
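The boundary-crossing case on the second segment can be verified numerically with a short sketch (the helper name dogleg_tau and the sample vectors are illustrative assumptions):

import numpy as np

def dogleg_tau(pU, pB, delta):
    # Positive root of ||pU + (tau - 1)(pB - pU)||^2 = delta^2 on the second segment.
    d = pB - pU
    a = np.dot(d, d)
    b = 2.0 * np.dot(pU, d)
    c = np.dot(pU, pU) - delta**2
    s = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)   # value of (tau - 1)
    return 1.0 + s

# Illustrative values: the first point is inside the region, the Newton point is outside.
pU = np.array([-0.5, 0.1])
pB = np.array([0.0, 3.8])
tau = dogleg_tau(pU, pB, 2.0)
p = pU + (tau - 1.0) * (pB - pU)
print(tau, np.linalg.norm(p))   # the step norm should equal the radius (2.0)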
Example
Using the trust-region (dogleg) algorithm, let us optimize the following function (the Rosenbrock function):
$$f(x) = (1 - x_1)^2 + 100\,(x_2 - x_1^2)^2.$$
Find the gradient and the Hessian of the function:
$$\nabla f(x) = \begin{pmatrix} -400\,x_1 (x_2 - x_1^2) - 2(1 - x_1) \\ 200\,(x_2 - x_1^2) \end{pmatrix}, \qquad \nabla^2 f(x) = \begin{pmatrix} 1200\,x_1^2 - 400\,x_2 + 2 & -400\,x_1 \\ -400\,x_1 & 200 \end{pmatrix}.$$
We initialize the variables required by the algorithm: the starting point $x_0 = (5, 5)$, the initial radius $\Delta_0 = 1$, the maximum radius $\Delta_{max} = 100$, and the acceptance threshold $\eta = 0.15$.
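As an optional sanity check that is not part of the original derivation, the hand-derived gradient and Hessian can be compared with SciPy's built-in rosen_der and rosen_hess functions:

import numpy as np
from scipy.optimize import rosen_der, rosen_hess

def jac(x):
    return np.array([-400*(x[1] - x[0]**2)*x[0] - 2 + 2*x[0],
                     200*x[1] - 200*x[0]**2])

def hess(x):
    return np.array([[1200*x[0]**2 - 400*x[1] + 2, -400*x[0]],
                     [-400*x[0], 200]])

x = np.array([5.0, 5.0])
print(np.allclose(jac(x), rosen_der(x)))    # True
print(np.allclose(hess(x), rosen_hess(x)))  # True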
Iteration 1
We find the approximate solution of the quadratic subproblem at $x_0 = (5, 5)$ with $\Delta_0 = 1$.
Since both the Newton point $p^B = -B_0^{-1} g_0$ and the point $p^U$ lie outside the trust region, the dogleg step is the steepest-descent direction truncated to the boundary.
Consequently: $p_0 = -\Delta_0\, g_0 / \|g_0\| \approx (-0.9950,\ 0.0994)$.
We calculate the ratio of
actual reduction = $f(x_0) - f(x_0 + p_0)$ to
predicted reduction = $m_0(0) - m_0(p_0)$,
which gives $\rho_0 \approx 1.072$.
Update: since $\rho_0 > \eta$, the step is accepted and $x_1 = x_0 + p_0 \approx (4.0049,\ 5.0994)$.
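If desired, the numbers of this first iteration can be reproduced with a few lines of numpy, using SciPy's rosen_der and rosen_hess (verified above to match the hand-derived formulas); the printed values should agree with the table below up to rounding:

import numpy as np
from scipy.optimize import rosen_der, rosen_hess

x0 = np.array([5.0, 5.0])
delta0 = 1.0
g0 = rosen_der(x0)      # gradient at the starting point
B0 = rosen_hess(x0)     # Hessian at the starting point

pB = -np.linalg.solve(B0, g0)                          # Newton point
pU = -(np.dot(g0, g0) / np.dot(g0, B0 @ g0)) * g0      # steepest-descent minimizer of the model
print(np.linalg.norm(pB), np.linalg.norm(pU))          # both are larger than delta0 = 1

p0 = -delta0 * g0 / np.linalg.norm(g0)                 # step truncated to the boundary
print(p0)                                              # roughly [-0.995, 0.0994]
print(x0 + p0)                                         # roughly [4.0049, 5.0994]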
Iteration 2
We find the approximate solution of the quadratic subproblem at $x_1 \approx (4.0049,\ 5.0994)$.
Since the Newton point again lies far outside the trust region and the point $p^U$ also does not fit inside it, the step is once more the steepest-descent direction truncated to the boundary.
Consequently: $p_1 \approx (-0.9923,\ 0.1238)$.
We calculate the ratio of
actual reduction = $f(x_1) - f(x_1 + p_1)$ to
predicted reduction = $m_1(0) - m_1(p_1)$,
which gives $\rho_1 \approx 1.1659$.
Because $\rho_1 > \tfrac{3}{4}$ and the step reached the boundary of the region, the trust region is expanded: $\Delta = 2$.
Update: since $\rho_1 > \eta$, the step is accepted and $x_2 = x_1 + p_1 \approx (3.0126,\ 5.2233)$.
Iteration 3
We find the approximate solution of the quadratic subproblem at $x_2 \approx (3.0126,\ 5.2233)$ with $\Delta = 2$.
Here the point $p^U$ lies inside the trust region while the Newton point $p^B$ lies outside it, so the step is taken where the segment connecting $p^U$ and $p^B$ crosses the boundary: $p_2 = p^U + s\,(p^B - p^U) \approx (-0.2575,\ 1.9833)$, where $s$ is the positive root of the quadratic equation derived above.
We calculate the ratio of
actual reduction = $f(x_2) - f(x_2 + p_2)$ to
predicted reduction = $m_2(0) - m_2(p_2)$,
which gives $\rho_2 \approx 1.0326$.
Update: since $\rho_2 > \eta$, the step is accepted and $x_3 = x_2 + p_2 \approx (2.7551,\ 7.2066)$.
The algorithm continues in the same way until the norm of the gradient becomes small enough, $\|g_k\| < 10^{-4}$, or the iteration limit is reached.
The table of the results of the algorithm for the Rosenbrock function:
k | p_k | ρ_k | Δ_k | x_k |
---|---|---|---|---|
0 | - | - | 1.0 | [5, 5] |
1 | [-0.9950, 0.0994] | 1.072 | 1.0 | [4.0049, 5.0994] |
2 | [-0.9923, 0.1238] | 1.1659 | 2.0 | [3.0126, 5.2233] |
3 | [-0.2575, 1.9833] | 1.0326 | 2.0 | [2.7551, 7.2066] |
4 | [-0.0225, 0.2597] | 1.0026 | 2.0 | [2.7325, 7.4663] |
5 | [-0.3605, -1.9672] | -0.4587 | 0.5 | [2.7325, 7.4663] |
6 | [-0.0906, -0.4917] | 0.9966 | 1.0 | [2.6419, 6.9746] |
7 | [-0.1873, -0.9822] | 0.8715 | 2.0 | [2.4546, 5.9923] |
8 | [-0.1925, -0.9126] | 1.2722 | 2.0 | [2.2620, 5.0796] |
9 | [-0.1499, -0.6411] | 1.3556 | 2.0 | [2.1121, 4.4385] |
10 | [-0.2023, -0.8323] | 1.0594 | 2.0 | [1.9097, 3.6061] |
11 | [-0.0989, -0.3370] | 1.2740 | 2.0 | [1.8107, 3.2690] |
12 | [-0.2739, -0.9823] | -0.7963 | 0.25495 | [1.8107, 3.2690] |
13 | [-0.0707, -0.2449] | 1.0811 | 0.5099 | [1.7399, 3.0240] |
14 | [-0.1421, -0.4897] | 0.8795 | 1.0198 | [1.5978, 2.5343] |
15 | [-0.1254, -0.3821] | 1.3122 | 1.0198 | [1.4724, 2.1522] |
16 | [-0.1138, -0.3196] | 1.3055 | 1.0198 | [1.3585, 1.8326] |
17 | [-0.0997, -0.2580] | 1.3025 | 1.0198 | [1.2587, 1.5745] |
18 | [-0.0865, -0.2079] | 1.2878 | 1.0198 | [1.1722, 1.3666] |
19 | [-0.0689, -0.1541] | 1.2780 | 1.0198 | [1.1032, 1.2124] |
20 | [-0.0529, -0.1120] | 1.2432 | 1.0198 | [1.0503, 1.1004] |
21 | [-0.0322, -0.0649] | 1.1971 | 1.0198 | [1.0180, 1.0354] |
22 | [-0.0149, -0.0294] | 1.1097 | 1.0198 | [1.0031, 1.0060] |
23 | [-0.0001, -0.0002] | 1.0012 | 1.0198 | [1.00000024, 1.00000046] |
24 | [-2.37065e-07, -4.56344e-07] | 1.0000 | 1.0198 | [1.0, 1.0] |
Analytically, the minimum of the Rosenbrock function is reached at the point $(1, 1)$, which matches the result obtained by the algorithm.
Python implementation example
The algorithm is implemented using the numpy library. In the example, a limit on the number of iterations is imposed.
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Pure Python/Numpy implementation of the Trust-Region Dogleg algorithm.
Reference: https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods
'''
import numpy as np
import numpy.linalg as ln
from math import sqrt
# Objective function (Rosenbrock)
def f(x):
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2

# Gradient
def jac(x):
    return np.array([-400*(x[1] - x[0]**2)*x[0] - 2 + 2*x[0],
                     200*x[1] - 200*x[0]**2])

# Hessian
def hess(x):
    return np.array([[1200*x[0]**2 - 400*x[1] + 2, -400*x[0]],
                     [-400*x[0], 200]])
def dogleg_method(Hk, gk, Bk, trust_radius):
    # Compute the Newton point pB = -B^{-1} g (Hk is the inverse Hessian).
    # This is the optimum for the quadratic model function.
    # If it is inside the trust radius then return this point.
    pB = -np.dot(Hk, gk)
    norm_pB = sqrt(np.dot(pB, pB))

    # Test if the full step is within the trust region.
    if norm_pB <= trust_radius:
        return pB

    # Compute the Cauchy point pU.
    # This is the predicted optimum along the direction of steepest descent.
    pU = -(np.dot(gk, gk) / np.dot(gk, np.dot(Bk, gk))) * gk
    dot_pU = np.dot(pU, pU)
    norm_pU = sqrt(dot_pU)

    # If the Cauchy point is outside the trust region,
    # then return the point where the path intersects the boundary.
    if norm_pU >= trust_radius:
        return trust_radius * pU / norm_pU

    # Find the solution to the scalar quadratic equation
    #   ||pU + tau*(pB - pU)||**2 == trust_radius**2,
    # i.e. the intersection of the trust-region boundary and the line segment
    # connecting the Cauchy and Newton points. Take the positive root tau.
    pB_pU = pB - pU
    dot_pB_pU = np.dot(pB_pU, pB_pU)
    dot_pU_pB_pU = np.dot(pU, pB_pU)
    fact = dot_pU_pB_pU**2 - dot_pB_pU * (dot_pU - trust_radius**2)
    tau = (-dot_pU_pB_pU + sqrt(fact)) / dot_pB_pU

    # Decide on which part of the trajectory to take.
    return pU + tau * pB_pU
def trust_region_dogleg(func, jac, hess, x0, initial_trust_radius=1.0,
                        max_trust_radius=100.0, eta=0.15, gtol=1e-4,
                        maxiter=100):
    xk = x0
    trust_radius = initial_trust_radius
    k = 0
    while True:
        gk = jac(xk)
        Bk = hess(xk)
        Hk = np.linalg.inv(Bk)

        pk = dogleg_method(Hk, gk, Bk, trust_radius)

        # Actual reduction.
        act_red = func(xk) - func(xk + pk)

        # Predicted reduction.
        pred_red = -(np.dot(gk, pk) + 0.5 * np.dot(pk, np.dot(Bk, pk)))

        # Rho (guard against division by zero).
        if pred_red == 0.0:
            rhok = 1e99
        else:
            rhok = act_red / pred_red

        # Euclidean norm of pk.
        norm_pk = sqrt(np.dot(pk, pk))

        # Rho is close to zero or negative, therefore the trust region is shrunk.
        if rhok < 0.25:
            trust_radius = 0.25 * norm_pk
        # Rho is close to one and pk has reached the boundary of the trust region,
        # therefore the trust region is expanded.
        elif rhok > 0.75 and norm_pk == trust_radius:
            trust_radius = min(2.0 * trust_radius, max_trust_radius)

        # Choose the position for the next iteration: accept the step only if rho > eta.
        if rhok > eta:
            xk = xk + pk

        # Check if the gradient is small enough to stop.
        if ln.norm(gk) < gtol:
            break

        # Check if we have looked at enough iterations.
        if k >= maxiter:
            break
        k = k + 1
    return xk
result = trust_region_dogleg(f, jac, hess, [5, 5])
print("Result of trust region dogleg method: {}".format(result))
print("Value of function at a point: {}".format(f(result)))
Thank you for your interest in my article. I hope it was useful to you and that you learned something new.