Julia and the method of coordinate descent
- Tutorial
Coordinate descent is one of the simplest methods of multidimensional optimization. It copes well with finding a local minimum of functions with a relatively smooth relief, so it is a good method with which to start one's acquaintance with optimization.
The search for the extremum is conducted along the coordinate axes, i.e., only one coordinate is varied at a time. The multidimensional problem is thus reduced to a sequence of one-dimensional ones.
Algorithm
First cycle:
- Fix x₂ = x₂⁰, x₃ = x₃⁰, ..., xₙ = xₙ⁰. We look for the extremum of the function f(x₁, x₂⁰, ..., xₙ⁰) of the single variable x₁. Let the extremum be at x₁ = x₁¹.
- Fix x₁ = x₁¹, x₃ = x₃⁰, ..., xₙ = xₙ⁰. The extremum of the function f(x₁¹, x₂, x₃⁰, ..., xₙ⁰) is attained at x₂ = x₂¹.
- ...
- Fix x₁ = x₁¹, x₂ = x₂¹, ..., xₙ₋₁ = xₙ₋₁¹ and find the extremum with respect to xₙ, obtaining xₙ = xₙ¹.
As a result of performing n such steps, a new approximation to the extremum x¹ = (x₁¹, x₂¹, ..., xₙ¹) is found. Next we check the stopping criterion: if it is satisfied, a solution is found; otherwise we perform another cycle.
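To make the cycle concrete, here is a minimal Julia sketch (the helper argmin_1d is hypothetical — any of the one-dimensional methods implemented below could play its role):

# One full cycle of coordinate descent: minimize along each axis in turn.
# argmin_1d(g, t0) is assumed to return the minimizer of the 1-D function g,
# starting the search from t0.
function coordinate_cycle!(x, f, argmin_1d)
    for i in eachindex(x)
        x[i] = argmin_1d(t -> f([x[1:i-1]; t; x[i+1:end]]), x[i])
    end
    return x
end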
Preparation
Check package versions:
(v1.1) pkg> status
Status `C:\Users\User\.julia\environments\v1.1\Project.toml`
[336ed68f] CSV v0.4.3
[a93c6f00] DataFrames v0.17.1
[7073ff75] IJulia v1.16.0
[47be7bcc] ORCA v0.2.1
[58dd65bb] Plotly v0.2.0
[f0f68f2c] PlotlyJS v0.12.3
[91a5bcdd] Plots v0.23.0
[ce6b1742] RDatasets v0.6.1
[90137ffa] StaticArrays v0.10.2
[8bb1440f] DelimitedFiles
[10745b16] Statistics
Let us define a function for drawing the surface or the level lines, in which it is convenient to adjust the boundaries of the plot:
using Plots
plotly() # interactive plots

function plotter(plot_fun; low, up)
    Xs = range(low[1], stop = up[1], length = 80)
    Ys = range(low[2], stop = up[2], length = 80)
    Zs = [ fun([x y]) for x in Xs, y in Ys ] # fun is a global defined below
    plot_fun(Xs, Ys, Zs)
    xaxis!( (low[1], up[1]), low[1]:(up[1]-low[1])/5:up[1] ) # axis ticks
    yaxis!( (low[2], up[2]), low[2]:(up[2]-low[2])/5:up[2] )
end
As a model function, let us choose an elliptic paraboloid:
parabol(x) = sum(u->u*u, x)
fun = parabol
plotter(surface, low = [-1 -1], up = [1 1])
Coordinate descent
The method is implemented as a host function that receives the name of the one-dimensional optimization method, the dimension of the problem, the desired tolerance, the initial approximation, and the bounds for drawing the level lines. Default values are set for all of the parameters.
function ofDescent(odm; ndimes = 2, ε = 1e-4, fit = [.5 .5], low = [-1 -1], up = [1 1])
    k = 1 # counter to limit the number of steps
    sumx = 0
    sumz = 1
    plotter(contour, low = low, up = up) # draw the contour lines of the relief
    x = [ fit[1] ] # arrays of one element each —
    y = [ fit[2] ] # the coordinates of the route will be pushed into them
    while abs(sumz - sumx) > ε && k < 80
        fitz = copy(fit)
        for i = 1:ndimes
            odm(i, fit, ε) # the one-dimensional optimization does its job
        end
        push!(x, fit[1])
        push!(y, fit[2])
        sumx = sum(abs, fit )
        sumz = sum(abs, fitz)
        #println("$k $fit")
        k += 1
    end
    # draw the descent route with circles
    scatter!(x, y, legend = false, marker = (10, 0.5, :orange) ) # size, opacity, color
    p = title!("Epochs = $(length(x)) f(x,y) = $(fun(fit))")
end
Next, let us try out various methods of one-dimensional optimization.
Newton's method
The idea of the method is as simple as its implementation: each coordinate in turn is refined by Newton's root-finding step.
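Written out (a standard formula, notation mine), the update along coordinate i is

xᵢ ← xᵢ − f(x) / f′ᵢ(x),

where f′ᵢ is the derivative along the i-th coordinate. Note that this step drives f itself toward zero; it lands on the minimum of our paraboloid only because the minimum value there happens to be zero.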
# derivative
dfun = (x, i) -> i == 1 ? 2x[1] + x[2]*x[2] : 2x[2] + x[1]*x[1]
function newton2(i, fit, ϵ)
    k = 1
    oldfit = Inf
    while abs(fit[i] - oldfit) > ϵ && k < 50
        oldfit = fit[i]
        tryfit = fun(fit) / dfun(fit, i)
        fit[i] -= tryfit
        println(" $k $fit")
        k += 1
    end
end
ofDescent(newton2, fit = [9.1 9.1], low = [-4 -4], up = [13 13])
Newton's method is quite demanding of the initial approximation and, without a cap on the step, may well jump off into the unknown. Computing the derivative analytically is desirable, but one can get by with a small finite-difference variation. Let us modify our function:
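The modified step replaces the derivative by a forward difference with a fixed increment h = 0.01 (notation mine, matching the code below):

xᵢ ← xᵢ − f(x)·h / (f(x + h·eᵢ) − f(x)),

where eᵢ is the unit vector along the i-th coordinate.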
function newton(i, fit, ϵ)
    k = 1
    oldfit = Inf
    x = []
    y = []
    push!(x, fit[1])
    push!(y, fit[2])
    while abs(oldfit - fit[i]) > ϵ && k < 50
        fx = fun(fit)
        oldfit = fit[i]
        fit[i] += 0.01
        dfx = fun(fit)
        fit[i] -= 0.01
        tryfit = fx*0.01 / (dfx - fx)
        # cut off when the step overshoots by an order of magnitude
        if abs(tryfit) > abs(fit[i])*10
            push!(x, fit[1])
            push!(y, fit[2])
            break
        end
        fit[i] -= tryfit
        #println(" $k $fit")
        push!(x, fit[1])
        push!(y, fit[2])
        k += 1
    end
    # draw the trajectory of the one-dimensional optimization with squares
    plot!(x, y, w = 3, legend = false, marker = :rect )
end
ofDescent(newton, fit = [9.1 9.1], low = [-4 -4], up = [13 13])
Inverse Parabolic Interpolation
The method does not require knowledge of the derivative and converges well.
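The next point is a weighted combination of the three previous ones (the standard inverse-quadratic-interpolation formula, computed in the code below via the coefficients a, b, c):

xₙ₊₁ = a·xₙ₋₂ + b·xₙ₋₁ + c·xₙ,
a = fₙ₋₁·fₙ / ((fₙ₋₂ − fₙ₋₁)(fₙ₋₂ − fₙ)),
b = fₙ₋₂·fₙ / ((fₙ₋₁ − fₙ₋₂)(fₙ₋₁ − fₙ)),
c = fₙ₋₂·fₙ₋₁ / ((fₙ − fₙ₋₂)(fₙ − fₙ₋₁)),

where fₖ denotes f(xₖ).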
function ipi(i, fit, ϵ) # inverse parabolic interpolation
    n = 0
    xn2 = copy(fit)
    xn1 = copy(fit)
    xn = copy(fit)
    xnp = zeros( length(fit) )
    xy = copy(fit)
    xn2[i] = fit[i] - 0.15
    xn[i] = fit[i] + 0.15
    fn2 = fun(xn2)
    fn1 = fun(xn1)
    while abs(xn[i] - xn1[i]) > ϵ && n < 80
        fn = fun(xn)
        a = fn1*fn / ( (fn2-fn1)*(fn2-fn ) )
        b = fn2*fn / ( (fn1-fn2)*(fn1-fn ) )
        c = fn2*fn1 / ( (fn -fn2)*(fn -fn1) )
        xnp[i] = a*xn2[i] + b*xn1[i] + c*xn[i]
        xn2[i] = xn1[i]
        xn1[i] = xn[i]
        xn[i] = xnp[i]
        fn2 = fn1
        fn1 = fn
        n += 1
        println(" $n $xn $xn1")
        xy = [xy; xn]
    end
    fit[i] = xnp[i]
    plot!(xy[:,1], xy[:,2], w = 3, legend = false, marker = :rect )
end
ofDescent(ipi, fit = [0.1 0.1], low = [-.1 -.1], up = [.4 .4])
If we take a worse initial approximation, the method starts to require an inordinate number of steps for each epoch of the coordinate descent. In this respect its sibling wins.
Sequential Parabolic Interpolation
It also requires three starting points, but on many test functions it shows better results.
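Here the next point is the vertex of the parabola drawn through the three previous points (a standard formula; the code below assembles it from the differences v and s):

xₙ₊₁ = ½ · (fₙ₋₂·s₂ − fₙ₋₁·s₁ + fₙ·s₀) / (fₙ₋₂·v₂ − fₙ₋₁·v₁ + fₙ·v₀),

where vₖ and sₖ are pairwise differences of the abscissas and of their squares: v₂ = xₙ − xₙ₋₁, s₂ = xₙ² − xₙ₋₁², and so on, as in the code.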
function spi(i, fit, ϵ) # sequential parabolic interpolation
    n = 0
    xn2 = copy(fit)
    xn1 = copy(fit)
    xn = copy(fit)
    xnp = zeros( length(fit) )
    xy = copy(fit)
    xn2[i] = fit[i] - 0.01
    xn[i] = fit[i] + 0.01
    fn2 = fun(xn2)
    fn1 = fun(xn1)
    while abs(xn[i] - xn1[i]) > ϵ && n < 200
        fn = fun(xn)
        v0 = xn1[i] - xn2[i]
        v1 = xn[i] - xn2[i]
        v2 = xn[i] - xn1[i]
        s0 = xn1[i]*xn1[i] - xn2[i]*xn2[i]
        s1 = xn[i] *xn[i] - xn2[i]*xn2[i]
        s2 = xn[i] *xn[i] - xn1[i]*xn1[i]
        xnp[i] = 0.5(fn2*s2 - fn1*s1 + fn*s0) / (fn2*v2 - fn1*v1 + fn*v0)
        xn2[i] = xn1[i]
        xn1[i] = xn[i]
        xn[i] = xnp[i]
        fn2 = fn1
        fn1 = fn
        n += 1
        println(" $n $xn $xn1")
        xy = [xy; xn]
    end
    fit[i] = xnp[i]
    plot!(xy[:,1], xy[:,2], w = 3, legend = false, marker = :rect )
end
ofDescent(spi, fit = [16.1 16.1], low = [-.1 -.1], up = [.4 .4])
Starting from a very poor initial point, it converged in three steps! Good... But all these methods share a drawback: they converge to a local minimum. Now let us take the Ackley function for study.
ekly(x) = -20exp(-0.2sqrt(0.5(x[1]*x[1]+x[2]*x[2]))) - exp(0.5(cospi(2x[1])+cospi(2x[2]))) + 20 + ℯ
# f(0,0) = 0, x_i ∈ [-5,5]
fun = ekly
plotter(surface, low = [-5 -5], up = [5 5])
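Before descending, a quick sanity check of the definition (a small test of my own, not part of the original listing):

# the global minimum of the Ackley function is f(0,0) = 0
@assert abs(ekly([0 0])) < 1e-12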
ofDescent(spi, fit = [1.4 1.4], low = [-.1 -.1], up = [2.4 2.4])
Golden section method
Although the implementation is more involved, the method sometimes shows itself well by jumping over local minima.
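Each iteration shrinks the bracket [a, b] by the factor 1/φ ≈ 0.618 using two interior probe points placed symmetrically (the standard golden-section rule; in the code below ϕ = φ − 1 ≈ 0.618):

α = ϕ·a + (1 − ϕ)·b,
β = ϕ·b + (1 − ϕ)·a.

Whichever probe gives the larger function value has its side of the bracket cut off, and one of the old probes is reused on the next iteration.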
function interval(i, fit, st) # bracket the minimum along coordinate i with step st
    ab = zeros(2)
    fitc = copy(fit)
    ab[1] = fitc[i]
    Fa = fun(fitc)
    fitc[i] -= st
    Fdx = fun(fitc)
    fitc[i] += st
    if Fdx < Fa # the function decreases backwards — flip the step direction
        st = -st
    end
    d = ab[1] - st # left end in case the very first step already brackets the minimum
    fitc[i] += st
    ab[2] = fitc[i]
    Fb = fun(fitc)
    while Fb < Fa # march forward until the function starts to grow
        d = ab[1]
        ab[1] = ab[2]
        Fa = Fb
        fitc[i] += st
        ab[2] = fitc[i]
        Fb = fun(fitc)
        # println("----", Fb, " ", Fa)
    end
    if st < 0 # make sure the ends are ordered
        c = ab[2]
        ab[2] = d
        d = c
    end
    ab[1] = d
    ab
end
function goldsection(i, fit, ϵ)
    ϕ = Base.MathConstants.golden - 1 # 1/φ ≈ 0.618
    ab = interval(i, fit, 0.01)
    α = ϕ*ab[1] + (1-ϕ)*ab[2]
    β = ϕ*ab[2] + (1-ϕ)*ab[1]
    fit[i] = α
    Fa = fun(fit)
    fit[i] = β
    Fb = fun(fit)
    while abs(ab[2] - ab[1]) > ϵ
        if Fa < Fb # the minimum is in [ab[1], β]
            ab[2] = β
            β = α
            Fb = Fa
            α = ϕ*ab[1] + (1-ϕ)*ab[2]
            fit[i] = α
            Fa = fun(fit)
        else # the minimum is in [α, ab[2]]
            ab[1] = α
            α = β
            Fa = Fb
            β = ϕ*ab[2] + (1-ϕ)*ab[1]
            fit[i] = β
            Fb = fun(fit)
        end
        println(">>>", i, ab)
        plot!(ab, w = 1, legend = false, marker = :rect )
    end
    fit[i] = 0.5(ab[1] + ab[2])
end
ofDescent(goldsection, fit = [1.4 1.4], low = [-.1 -.1], up = [1. 1.])
That is all for coordinate descent. The algorithms of the presented methods are quite simple, so implementing them in your language of choice is not difficult. Later on we can look at the built-in tools of the Julia language, but for now I want to touch everything by hand, so to speak: next we will consider more complicated and more efficient methods, and then move on to global optimization, comparing along the way with an implementation in another language.
Literature
- Zaitsev V. V. Numerical Methods for Physicists. Nonlinear Equations and Optimization: a tutorial. Samara, 2005. 86 p.
- Ivanov A. V. Computer-Aided Methods for Optimizing Optical Systems: a tutorial. St. Petersburg: SPbSU ITMO, 2010. 114 p.
- Popova T. M. Methods of Multidimensional Optimization: guidelines and tasks for laboratory work in the course "Optimization Methods" for students of "Applied Mathematics". Khabarovsk: Pacific State University, 2012. 44 p.