The Trust-Region Dogleg Optimization Method. A Python Implementation Example
The trust-region method (TRM) is one of the most important numerical methods for solving nonlinear programming problems. It is based on defining a region around the current best solution in which a quadratic model approximates the objective function well.
Both line-search and trust-region methods generate steps by approximating the objective function with a quadratic model, but they use the model in different ways. Line search uses it to obtain a search direction and then finds the optimal step length along that direction. The trust-region method instead defines a region around the current iterate in which the model is trusted to approximate the objective function sufficiently well, and then chooses the direction and the step length simultaneously, which improves efficiency.
Trust-region methods are reliable and stable, can be applied to ill-conditioned problems, and have very good convergence properties. The good convergence is due to the fact that the size of the trust region (usually given by the norm of its radius) is adjusted at each iteration based on the improvement achieved at previous iterations.
If the calculations show that the model approximates the objective function well, the trust region can be enlarged. Otherwise, if the model performs poorly, the trust region should be reduced.
As a result, the trust region grows and shrinks around the successive iterates as the algorithm progresses.
Algorithm
Step 1
The trust-region method uses a quadratic model. At each iteration, the step is calculated by solving the following quadratic problem (subproblem):

$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p, \qquad \text{subject to } \|p\| \le \Delta_k,$$

where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$ is the gradient, and $B_k$ is the Hessian $\nabla^2 f(x_k)$ or an approximation of it;

$\Delta_k > 0$ is the trust-region radius, and $\|\cdot\|$ is the Euclidean norm.

Thus, the trust-region approach requires solving a sequence of subproblems in which both the objective (the quadratic model) and the constraint $\|p\| \le \Delta_k$ are quadratic. When the Hessian $B_k$ is positive definite and $\|B_k^{-1} g_k\| \le \Delta_k$, the solution is easy to determine: it is the unconstrained minimizer $p_k^B = -B_k^{-1} g_k$, called the full step. In other cases the solution is not so obvious, but finding it does not have to be expensive: an approximate solution is enough to obtain sufficient convergence and good behaviour of the algorithm in practice.
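As a minimal sketch, the quadratic model and the full-step check can be written with NumPy as follows (the helper names `model` and `try_full_step` are illustrative; `g`, `B`, `delta`, and `f_k` stand for $g_k$, $B_k$, $\Delta_k$, and $f_k$):

```python
import numpy as np

def model(p, f_k, g, B):
    # Quadratic model m_k(p) = f_k + g^T p + 0.5 * p^T B p
    return f_k + g @ p + 0.5 * p @ B @ p

def try_full_step(g, B, delta):
    # Unconstrained minimizer of the model ("full step"): p^B = -B^{-1} g.
    # It solves the subproblem whenever it lies inside the trust region.
    pB = -np.linalg.solve(B, g)
    return pB if np.linalg.norm(pB) <= delta else None
```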
There are several strategies for approximately minimizing the quadratic model, including the following:
Cauchy point algorithm
The idea is similar to the steepest descent algorithm. The Cauchy point is the minimizer of the quadratic model along the steepest-descent direction (the negative gradient), subject to the step staying inside the trust region. By sequentially computing Cauchy points, a local minimum can be found, but convergence is slow, just as for steepest descent.
Steihaug Algorithm
The method is named after its author, Steihaug, and is a modification of the conjugate gradient approach.
Dogleg Algorithm
This article describes in detail the dogleg method for approximately minimizing the quadratic model, one of the most common and effective approaches. It is used when the Hessian matrix (or its approximation) is positive definite.
The simplest and most interesting version of the method works with a polygonal path consisting of only two segments, whose shape reminds some people of a dog's leg.
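For illustration, the Cauchy point described above can be computed in a few lines of NumPy (a sketch; the function name `cauchy_point` is illustrative, and `g`, `B`, and `delta` stand for the gradient, the Hessian approximation, and the trust-region radius):

```python
import numpy as np

def cauchy_point(g, B, delta):
    # Minimizer of the quadratic model along the steepest-descent
    # direction -g, clipped to the trust-region boundary.
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                      # non-convex along -g: go to the boundary
    else:
        tau = min(1.0, np.linalg.norm(g) ** 3 / (delta * gBg))
    return -tau * (delta / np.linalg.norm(g)) * g
```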
Step 2
The first question in a trust-region algorithm is how to choose the trust-region radius $\Delta_k$ at each iteration. The choice is based on how well the model $m_k$ agreed with the objective function at the previous iteration. After computing the step $p_k$, we define the following ratio:

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.$$

The numerator is called the actual reduction and the denominator the predicted reduction.
The denominator is always non-negative, because the step $p_k$ is obtained by minimizing the model $m_k$ over a region that includes the zero step $p = 0$. The ratio $\rho_k$ is used to decide whether to accept the step and how to update the trust-region radius.
If $\rho_k$ is negative or close to zero (for example, $\rho_k < \tfrac{1}{4}$), we reduce the size of the trust region.
If $\rho_k$ is close to 1 (for example, $\rho_k > \tfrac{3}{4}$), the model agrees well with the objective function, and the trust region can be expanded at the next iteration.
In other cases, the trust-region remains unchanged.
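In code, the acceptance test and the radius update described in this step might look like the following sketch (the name `accept_and_update` is illustrative; `m` denotes a function evaluating the model $m_k$, and `eta` is the acceptance threshold used later in the full implementation):

```python
import numpy as np

def accept_and_update(f, m, xk, pk, delta, delta_max, eta=0.15):
    # rho = actual reduction / predicted reduction
    rho = (f(xk) - f(xk + pk)) / (m(np.zeros_like(pk)) - m(pk))
    if rho < 0.25:
        delta = 0.25 * np.linalg.norm(pk)             # poor agreement: shrink
    elif rho > 0.75 and np.isclose(np.linalg.norm(pk), delta):
        delta = min(2.0 * delta, delta_max)           # good agreement at the boundary: expand
    xk_next = xk + pk if rho > eta else xk            # accept the step only if rho > eta
    return xk_next, delta
```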
Step 3
The following algorithm describes the process.

Choose a starting point $x_0$, a maximum trust-region radius $\Delta_{\max} > 0$, an initial trust-region radius $\Delta_0 \in (0, \Delta_{\max})$, and a constant $\eta \in [0, \tfrac{1}{4})$.

For $k = 0, 1, 2, \dots$, while the optimum has not been reached:

Solve the subproblem:
$$\min_{p} \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p, \qquad \|p\| \le \Delta_k,$$
where $p_k$ is the (approximate) solution.

Compute the ratio:
$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.$$

Update the current point:
$$x_{k+1} = \begin{cases} x_k + p_k, & \text{if } \rho_k > \eta, \\ x_k, & \text{otherwise.} \end{cases}$$

Update the trust-region radius:
$$\Delta_{k+1} = \begin{cases} \tfrac{1}{4}\,\|p_k\|, & \text{if } \rho_k < \tfrac{1}{4}, \\ \min(2\Delta_k,\ \Delta_{\max}), & \text{if } \rho_k > \tfrac{3}{4} \text{ and } \|p_k\| = \Delta_k, \\ \Delta_k, & \text{otherwise.} \end{cases}$$
Note that the radius is increased only if $\|p_k\|$ reaches the trust-region boundary. If the step stays strictly inside the region, the current radius does not restrict the algorithm, so there is no need to change it at the next iteration.
Dogleg Algorithm
The method starts by examining how the solution of the quadratic model depends on the trust-region radius $\Delta$. When $B$ is positive definite, as already noted, the unconstrained minimizer of the model is the full step $p^B = -B^{-1} g$. Whenever $\|p^B\| \le \Delta$, this point is feasible, so it is obviously the solution: $p^*(\Delta) = p^B$.
When $\Delta$ is very small, the constraint $\|p\| \le \Delta$ ensures that the quadratic term of the model has only a slight effect on the solution. The true solution is then approximately the same as if we minimized the linear function $f + g^T p$ subject to $\|p\| \le \Delta$:
$$p^*(\Delta) \approx -\Delta\, \frac{g}{\|g\|}$$
when $\Delta$ is sufficiently small.
For intermediate values of $\Delta$, the solution $p^*(\Delta)$ typically follows a curved trajectory, as shown in the picture. The dogleg method approximates this curved trajectory with a path consisting of two straight-line segments. The first segment starts at the origin, runs along the steepest-descent direction, and ends at the unconstrained minimizer of the model along that direction:
$$p^U = -\frac{g^T g}{g^T B g}\, g.$$
The second segment runs from $p^U$ to the full step $p^B = -B^{-1} g$.
Formally, we denote the trajectory $\tilde{p}(\tau)$, where $\tau \in [0, 2]$:
$$\tilde{p}(\tau) = \begin{cases} \tau\, p^U, & 0 \le \tau \le 1, \\ p^U + (\tau - 1)\,(p^B - p^U), & 1 \le \tau \le 2. \end{cases}$$
To find the point where the path crosses the trust-region boundary, it is necessary to solve a scalar quadratic equation of the following form:
$$\|p^U + (\tau - 1)\,(p^B - p^U)\|^2 = \Delta^2.$$
Writing $t = \tau - 1$ and $d = p^B - p^U$, this becomes
$$\|d\|^2\, t^2 + 2\,(p^U)^T d\; t + \|p^U\|^2 - \Delta^2 = 0.$$
We find the (reduced) discriminant of the equation:
$$D = \left((p^U)^T d\right)^2 - \|d\|^2 \left(\|p^U\|^2 - \Delta^2\right).$$
The relevant (positive) root of the equation is:
$$t = \frac{-(p^U)^T d + \sqrt{D}}{\|d\|^2}, \qquad \tau = 1 + t.$$
The dogleg method selects $p$ to minimize the model along this path subject to the trust-region constraint. In fact, no search along the path is needed: when $B$ is positive definite the model decreases monotonically along the path, so the dogleg path crosses the trust-region boundary at most once, and the intersection point can be found analytically.
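Putting the formulas above together, the boundary crossing can be sketched as follows (the function name `dogleg_step` is illustrative; it assumes `pU` and `pB` have already been computed and that the full step does not fit inside the region):

```python
import numpy as np

def dogleg_step(pU, pB, delta):
    # Point where the dogleg path leaves the trust region
    # (assumes ||pB|| > delta, i.e. the full step is infeasible).
    if np.linalg.norm(pU) >= delta:
        # Already the first segment leaves the region: stop on the boundary.
        return delta * pU / np.linalg.norm(pU)
    # Solve ||pU + t*(pB - pU)||^2 = delta^2 for t in [0, 1].
    d = pB - pU
    a, b, c = d @ d, 2.0 * (pU @ d), pU @ pU - delta ** 2
    t = (-b + np.sqrt(b ** 2 - 4.0 * a * c)) / (2.0 * a)
    return pU + t * d
```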
Example
Using the trust-region (dogleg) algorithm, minimize the following function (the Rosenbrock function):
$$f(x) = (1 - x_1)^2 + 100\,(x_2 - x_1^2)^2.$$
Find the gradient and the Hessian of the function:
$$\nabla f(x) = \begin{pmatrix} -2(1 - x_1) - 400\,x_1 (x_2 - x_1^2) \\ 200\,(x_2 - x_1^2) \end{pmatrix}, \qquad \nabla^2 f(x) = \begin{pmatrix} 1200\,x_1^2 - 400\,x_2 + 2 & -400\,x_1 \\ -400\,x_1 & 200 \end{pmatrix}.$$
We initialize the variables needed for the algorithm to work:
$\Delta_0 = 1.0$,
$\Delta_{\max} = 100.0$,
$\eta = 0.15$,
$\text{gtol} = 10^{-4}$ (the required accuracy of the gradient norm),
$x_0 = (5,\ 5)^T$.
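These starting values can be checked quickly with NumPy, using the `f`, `jac`, and `hess` functions defined in the implementation at the end of the article:

```python
import numpy as np

x0 = np.array([5.0, 5.0])
print(f(x0))     # 40016.0
print(jac(x0))   # [40008. -4000.]
print(hess(x0))  # [[28002. -2000.]
                 #  [-2000.   200.]]
```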
Iteration 1
Find the (approximate) minimizer of the quadratic model. At $x_0 = (5,\ 5)^T$ we have $g_0 = (40008,\ -4000)^T$ and $B_0 = \begin{pmatrix} 28002 & -2000 \\ -2000 & 200 \end{pmatrix}$.
Since $\|p^B\| \approx 19.99 > \Delta = 1$ and $\|p^U\| \approx 1.43 > \Delta = 1$, the dogleg step is the steepest-descent step scaled to the trust-region boundary.
Consequently: $p_1 = \Delta\, \dfrac{p^U}{\|p^U\|} \approx (-0.9950,\ 0.0994)^T$.
We calculate $\rho_1$:
actual reduction $= f(x_0) - f(x_0 + p_1)$,
predicted reduction $= m_0(0) - m_0(p_1)$,
$\rho_1 \approx 1.072 > \eta = 0.15$, so the step is accepted; $\Delta$ remains unchanged.
Update: $x_1 = x_0 + p_1 = (4.0049,\ 5.0994)^T$.
Iteration 2
Find the (approximate) minimizer of the quadratic model at $x_1 = (4.0049,\ 5.0994)^T$ with $\Delta = 1$.
Since $\|p^B\| > \Delta$ and $\|p^U\| \approx 1.02 > \Delta$, the step again lies on the trust-region boundary.
Consequently: $p_2 = \Delta\, \dfrac{p^U}{\|p^U\|} \approx (-0.9923,\ 0.1238)^T$.
We calculate $\rho_2$:
actual reduction $= f(x_1) - f(x_1 + p_2)$,
predicted reduction $= m_1(0) - m_1(p_2)$,
$\rho_2 \approx 1.1659$.
Because $\rho_2 > \tfrac{3}{4}$ and $\|p_2\| = \Delta$, the trust region is expanded: $\Delta = \min(2 \cdot 1,\ \Delta_{\max}) = 2$.
Update: $x_2 = x_1 + p_2 = (3.0126,\ 5.2233)^T$.
Iteration 3
Find the (approximate) minimizer of the quadratic model at $x_2 = (3.0126,\ 5.2233)^T$ with $\Delta = 2$,
where $p_3 \approx (-0.2575,\ 1.9833)^T$ and $\|p_3\| \approx \Delta = 2$.
We calculate $\rho_3$:
actual reduction $= f(x_2) - f(x_2 + p_3)$,
predicted reduction $= m_2(0) - m_2(p_3)$,
$\rho_3 \approx 1.0326$; the radius $\Delta = 2$ remains the same.
Update: $x_3 = x_2 + p_3 = (2.7551,\ 7.2066)^T$.
The algorithm continues until $\|\nabla f(x_k)\| < \text{gtol}$ or until the given maximum number of iterations is exceeded.
The table below shows the results of the algorithm for the Rosenbrock function (here $\Delta$ is the trust-region radius after the update at iteration $k$):
| k | $p_k$ | $\rho_k$ | $\Delta$ | $x_k$ |
|---|---|---|---|---|
| 0 | - | - | 1.0 | [5, 5] |
| 1 | [-0.9950, 0.0994] | 1.072 | 1.0 | [4.0049, 5.0994] |
| 2 | [-0.9923, 0.1238] | 1.1659 | 2.0 | [3.0126, 5.2233] |
| 3 | [-0.2575, 1.9833] | 1.0326 | 2.0 | [2.7551, 7.2066] |
| 4 | [-0.0225, 0.2597] | 1.0026 | 2.0 | [2.7325, 7.4663] |
| 5 | [-0.3605, -1.9672] | -0.4587 | 0.5 | [2.7325, 7.4663] |
| 6 | [-0.0906, -0.4917] | 0.9966 | 1.0 | [2.6419, 6.9746] |
| 7 | [-0.1873, -0.9822] | 0.8715 | 2.0 | [2.4546, 5.9923] |
| 8 | [-0.1925, -0.9126] | 1.2722 | 2.0 | [2.2620, 5.0796] |
| 9 | [-0.1499, -0.6411] | 1.3556 | 2.0 | [2.1121, 4.4385] |
| 10 | [-0.2023, -0.8323] | 1.0594 | 2.0 | [1.9097, 3.6061] |
| 11 | [-0.0989, -0.3370] | 1.2740 | 2.0 | [1.8107, 3.2690] |
| 12 | [-0.2739, -0.9823] | -0.7963 | 0.25495 | [1.8107, 3.2690] |
| 13 | [-0.0707, -0.2449] | 1.0811 | 0.5099 | [1.7399, 3.0240] |
| 14 | [-0.1421, -0.4897] | 0.8795 | 1.0198 | [1.5978, 2.5343] |
| 15 | [-0.1254, -0.3821] | 1.3122 | 1.0198 | [1.4724, 2.1522] |
| 16 | [-0.1138, -0.3196] | 1.3055 | 1.0198 | [1.3585, 1.8326] |
| 17 | [-0.0997, -0.2580] | 1.3025 | 1.0198 | [1.2587, 1.5745] |
| 18 | [-0.0865, -0.2079] | 1.2878 | 1.0198 | [1.1722, 1.3666] |
| 19 | [-0.0689, -0.1541] | 1.2780 | 1.0198 | [1.1032, 1.2124] |
| 20 | [-0.0529, -0.1120] | 1.2432 | 1.0198 | [1.0503, 1.1004] |
| 21 | [-0.0322, -0.0649] | 1.1971 | 1.0198 | [1.0180, 1.0354] |
| 22 | [-0.0149, -0.0294] | 1.1097 | 1.0198 | [1.0031, 1.0060] |
| 23 | [-0.0001, -0.0002] | 1.0012 | 1.0198 | [1.00000024, 1.00000046] |
| 24 | [-2.37065e-07, -4.56344e-07] | 1.0000 | 1.0198 | [1.0, 1.0] |
The minimum of the Rosenbrock function can be found analytically: it is attained at the point $(1,\ 1)$. This allows us to verify that the algorithm works correctly.
Python implementation example
The algorithm is implemented using the NumPy library. The example also imposes a limit on the number of iterations.
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Pure Python/NumPy implementation of the Trust-Region Dogleg algorithm.
Reference: https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods
'''
import numpy as np
import numpy.linalg as ln
from math import sqrt
# Objective function (Rosenbrock)
def f(x):
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2

# Gradient
def jac(x):
    return np.array([-400*(x[1] - x[0]**2)*x[0] - 2 + 2*x[0],
                     200*x[1] - 200*x[0]**2])

# Hessian
def hess(x):
    return np.array([[1200*x[0]**2 - 400*x[1] + 2, -400*x[0]],
                     [-400*x[0], 200]])
def dogleg_method(Hk, gk, Bk, trust_radius):
    # Compute the Newton point pB = -B^{-1} g (Hk is the inverse Hessian).
    # This is the unconstrained optimum of the quadratic model.
    pB = -np.dot(Hk, gk)
    norm_pB = sqrt(np.dot(pB, pB))

    # If the full step is inside the trust region, return it.
    if norm_pB <= trust_radius:
        return pB

    # Compute the Cauchy point pU: the minimizer of the model
    # along the steepest-descent direction.
    pU = -(np.dot(gk, gk) / np.dot(gk, np.dot(Bk, gk))) * gk
    dot_pU = np.dot(pU, pU)
    norm_pU = sqrt(dot_pU)

    # If the Cauchy point is outside the trust region,
    # return the point where the first segment crosses the boundary.
    if norm_pU >= trust_radius:
        return trust_radius * pU / norm_pU

    # Otherwise, find where the second segment (from pU to pB) intersects
    # the trust-region boundary by solving the scalar quadratic equation
    # ||pU + tau*(pB - pU)||**2 == trust_radius**2
    # for its positive root tau.
    pB_pU = pB - pU
    dot_pB_pU = np.dot(pB_pU, pB_pU)
    dot_pU_pB_pU = np.dot(pU, pB_pU)
    fact = dot_pU_pB_pU**2 - dot_pB_pU * (dot_pU - trust_radius**2)
    tau = (-dot_pU_pB_pU + sqrt(fact)) / dot_pB_pU
    return pU + tau * pB_pU
def trust_region_dogleg(func, jac, hess, x0, initial_trust_radius=1.0,
                        max_trust_radius=100.0, eta=0.15, gtol=1e-4,
                        maxiter=100):
    xk = np.asarray(x0, dtype=float)
    trust_radius = initial_trust_radius
    k = 0
    while True:
        gk = jac(xk)
        Bk = hess(xk)
        Hk = np.linalg.inv(Bk)

        pk = dogleg_method(Hk, gk, Bk, trust_radius)

        # Actual reduction.
        act_red = func(xk) - func(xk + pk)

        # Predicted reduction.
        pred_red = -(np.dot(gk, pk) + 0.5 * np.dot(pk, np.dot(Bk, pk)))

        # Rho: agreement between the model and the objective function.
        if pred_red == 0.0:
            rhok = 1e99
        else:
            rhok = act_red / pred_red

        # Euclidean norm of the step pk.
        norm_pk = sqrt(np.dot(pk, pk))

        # Rho is close to zero or negative: shrink the trust region.
        if rhok < 0.25:
            trust_radius = 0.25 * norm_pk
        # Rho is close to one and pk has reached the boundary: expand the trust region.
        elif rhok > 0.75 and norm_pk == trust_radius:
            trust_radius = min(2.0 * trust_radius, max_trust_radius)
        # Otherwise the trust region is left unchanged.

        # Accept the step only if the model agreement is good enough.
        if rhok > eta:
            xk = xk + pk

        # Stop if the gradient is small enough.
        if ln.norm(gk) < gtol:
            break
        # Stop if the iteration limit is reached.
        if k >= maxiter:
            break
        k = k + 1
    return xk
result = trust_region_dogleg(f, jac, hess, [5, 5])
print("Result of trust region dogleg method: {}".format(result))
print("Value of function at a point: {}".format(f(result)))
Thank you for your interest in my article. I hope it was useful to you and that you learned something new.