A little about conic duality

When taking theoretical courses in machine learning, mathematical economics, optimization, finance, and so on, one often encounters the concept of a “dual problem”.

Dual problems are often used to obtain lower (or upper) bounds on the objective functional in optimization problems. Moreover, for almost any meaningful formulation of an optimization problem, the dual problem has a meaningful interpretation of its own. In other words, if you are facing an important optimization problem, its dual is most likely important as well.

In this article I will talk about conic duality. In my opinion, this way of constructing dual problems is unfairly deprived of attention...

Math ahead...

How are dual problems usually constructed?


Let some optimization problem be given:

$\min_{x \in R^n} f(x) \\ f_i(x) \leq 0, \quad 1 \leq i \leq k \\ h_i(x) = 0, \quad 1 \leq i \leq m$



The dual problem is constructed as follows (a small worked example is given right after the three steps):

  • Construct the Lagrangian

$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^k \lambda_i f_i(x) + \sum_{i=1}^m \mu_i h_i(x)$


  • Construct the dual function

$g(\lambda, \mu) = \inf_x L(x, \lambda, \mu)$


  • Obtain the dual problem

$\max_{\lambda, \mu} g(\lambda, \mu) \\ \lambda \geq 0$
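
As a small worked illustration (a standard textbook example, added here for concreteness rather than taken from the derivation above), take the linear program $\min_{x \in R^n} c^T x$ subject to $Ax \geq b$, so that $f_i(x) = b_i - [Ax]_i \leq 0$. Then

$L(x, \lambda) = c^T x + \lambda^T (b - Ax), \quad g(\lambda) = b^T \lambda + \inf_x (c - A^T \lambda)^T x = \begin{cases} b^T \lambda, \quad A^T \lambda = c \\ -\infty, \quad A^T \lambda \neq c \end{cases}$

and the dual problem is $\max_{\lambda} b^T \lambda$ subject to $A^T \lambda = c$, $\lambda \geq 0$ — the familiar LP duality.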



The main difficulty in this scheme is hidden in the step of computing $\inf_x L(x, \lambda, \mu)$.

If the problem is not convex, then things are pretty much hopeless: in general it cannot be solved in polynomial time (assuming $P \neq NP$), and we will not discuss such problems further in this article.

Now suppose the problem is convex. What then?

If the problem is smooth, we can use the first-order optimality condition $\nabla_x L(x, \lambda, \mu) = 0$. From this condition, if all goes well, one can derive either $x(\lambda, \mu) = \arg\min_x L(x, \lambda, \mu)$ together with $g(\lambda, \mu) = L(x(\lambda, \mu), \lambda, \mu)$, or the function $g(\lambda, \mu)$ directly.
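
For example (an illustration added for concreteness), for the problem $\min_x \frac{1}{2} x^T x$ subject to $Ax = b$ the Lagrangian is $L(x, \mu) = \frac{1}{2} x^T x + \mu^T (Ax - b)$; the condition $\nabla_x L = x + A^T \mu = 0$ gives $x(\mu) = -A^T \mu$, and therefore $g(\mu) = -\frac{1}{2} \mu^T A A^T \mu - b^T \mu$.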

If the problem is not smooth, we could use the analogue of the first-order condition, $0 \in \partial_x L(x, \lambda, \mu)$ (here $\partial_x L(x, \lambda, \mu)$ denotes the subdifferential of $L(x, \lambda, \mu)$ in $x$); however, this procedure is usually much more complicated.

Sometimes there is an equivalent “smooth” optimization problem, and one can construct the dual for it instead. But as a rule, this improvement in structure (from non-smooth to smooth) has to be paid for with an increase in dimension.
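
A standard illustration of this trade-off: minimizing the non-smooth function $\|x\|_1$ over $x \in R^n$ is equivalent to the smooth (in fact, linear) problem $\min_{x, t} \sum_{i=1}^n t_i$ subject to $-t_i \leq x_i \leq t_i$, at the cost of going from $n$ variables to $2n$.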

Conic duality


There are quite a lot of optimization problems (examples below) that admit the following form:

$\min_{x \in R^n} c^T x \\ Ax + b \in K$


where $A$ is a matrix, $b$ is a vector, and $K$ is a nondegenerate convex cone.
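
For example (a standard observation, added here for concreteness), a linear program $\min_x c^T x$ with constraints $Ax \geq b$ fits this template with $K = R_+^k$, since $Ax \geq b$ is the same as $Ax - b \in R_+^k$; second-order cone and semidefinite programs correspond to other choices of $K$.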

In this case, the dual problem can be constructed according to the following scheme:


  • Construct the Lagrangian

$L(x, \lambda) = c^T x + \lambda^T (Ax + b)$


  • Construct the dual function

$g(\lambda) = \inf_x L(x, \lambda) = \begin{cases} \lambda^T b, \quad c + A^T \lambda = 0 \\ -\infty, \quad c + A^T \lambda \neq 0 \end{cases}$


  • Obtain the dual problem

$\max_{\lambda} b^T \lambda \\ c + A^T \lambda = 0 \\ -\lambda \in K^*$


where the dual (conjugate) cone $K^*$ of the cone $K$ is defined as $K^* = \left\{ y \in R^k \mid z^T y \geq 0 \quad \forall z \in K \right\}$.

As we can see, all the difficulty of constructing the dual problem has been shifted to constructing the dual cone. The good news is that there is a well-developed calculus for dual cones, and very often the dual cone can be written down right away.
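
A few facts from this calculus that we will rely on below (standard results, listed here for reference): the nonnegative orthant is self-dual, $(R_+^n)^* = R_+^n$; the second-order cone is self-dual, $K_2^* = K_2$; the $\ell_1$- and $\ell_\infty$-norm cones are dual to each other, $K_1^* = K_{\infty}$ and $K_{\infty}^* = K_1$; and for a direct product of cones, $(K' \times K'')^* = (K')^* \times (K'')^*$, which is what lets us handle several conic constraints at once. (The cones $K_1$, $K_2$, $K_{\infty}$ are defined in the example below.)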

Example


Suppose we need to construct the dual of the following optimization problem:

$\min_{x \in R^n} \|x\|_2 + \|x\|_1 \\ Ax \geq b$



Here $\|x\|_1 = \sum_{i=1}^n |x_i|$ and $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$.

The first thing to notice: the objective function can always be made linear!

More precisely, there is always an equivalent problem with a linear objective:

$\min_{x \in R^n, y \in R, z \in R} y + z \\ \|x\|_2 \leq y \\ \|x\|_1 \leq z \\ Ax \geq b$



Now we need a bit of secret knowledge: the sets

$K_1 = \{(x, t) \in R^n \times R \mid \|x\|_1 \leq t\}$

and

$K_2 = \{(x, t) \in R^n \times R \mid \|x\|_2 \leq t\}$

are convex cones.
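
A quick sanity check for $K_2$ (the argument for $K_1$ is identical): if $\|x\|_2 \leq t$, then $\|\alpha x\|_2 = \alpha \|x\|_2 \leq \alpha t$ for any $\alpha \geq 0$, so $K_2$ is a cone; and if $\|x\|_2 \leq t$ and $\|x'\|_2 \leq t'$, then by the triangle inequality $\|x + x'\|_2 \leq t + t'$, so $K_2$ is closed under addition and hence convex.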

Thus we arrive at an equivalent formulation of the problem:

$\min_{x \in R^n, y \in R, z \in R} y + z \\ I_{n+1} \begin{pmatrix} x \\ y \end{pmatrix} + 0_{n+1} \in K_2 \\ I_{n+1} \begin{pmatrix} x \\ z \end{pmatrix} + 0_{n+1} \in K_1 \\ Ax - b \in R_+^k$
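
To spell out one intermediate step (added here to make the next formula easier to check): in the notation of the conic template, the variable is $(x, y, z) \in R^{n+2}$, the objective vector is $c = (0_n, 1, 1)$, and there are three conic constraints with cones $K_2$, $K_1$ and $R_+^k$; the corresponding matrices select $(x, y)$, select $(x, z)$, and compute $Ax$, while the shift vectors are $0_{n+1}$, $0_{n+1}$ and $-b$. We introduce one multiplier per conic constraint: $\lambda, \mu \in R^{n+1}$ and $\nu \in R^k$.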



Now we can immediately write down the dual problem:

$\max_{\lambda, \mu, \nu} -b^T \nu \\ \lambda_i + \mu_i + [A^T \nu]_i = 0, \quad 1 \leq i \leq n \\ \lambda_{n+1} + 1 = 0 \\ \mu_{n+1} + 1 = 0 \\ -\lambda \in K_2^* (= K_2) \\ -\mu \in K_1^* (= K_{\infty}) \\ -\nu \in R_+^k$


or, after a little simplification,

$\max_{\lambda, \mu, \nu} -b^T \nu \\ \lambda + \mu + A^T \nu = 0 \\ \|\lambda\|_2 \leq 1 \\ \|\mu\|_{\infty} \leq 1 \\ -\nu \in R_+^k$



where $\|\mu\|_{\infty} = \max_{i} |\mu_i|$.
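
As a sanity check, the derived dual can also be verified numerically. The sketch below is my own illustration (it is not part of the original derivation) and assumes the CVXPY modeling package; it solves the primal problem and the simplified dual above on random data and compares the optimal values.

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
n, k = 5, 8
A = np.random.randn(k, n)
b = A @ np.random.randn(n) - 1.0  # shift b so the primal is strictly feasible

# Primal: min ||x||_2 + ||x||_1  s.t.  Ax >= b
x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.norm(x, 2) + cp.norm(x, 1)), [A @ x >= b])
primal.solve()

# Simplified dual: max -b^T nu  s.t.  lambda + mu + A^T nu = 0,
#                  ||lambda||_2 <= 1,  ||mu||_inf <= 1,  -nu >= 0
lam, mu, nu = cp.Variable(n), cp.Variable(n), cp.Variable(k)
dual = cp.Problem(cp.Maximize(-b @ nu),
                  [lam + mu + A.T @ nu == 0,
                   cp.norm(lam, 2) <= 1,
                   cp.norm(mu, "inf") <= 1,
                   -nu >= 0])
dual.solve()

# The problem is convex and strictly feasible, so strong duality holds and
# the two optimal values should coincide up to solver tolerance.
print(primal.value, dual.value)
```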


