TILT algorithm or non-standard use of matrix rank

Today we look at the Transform Invariant Low-rank Texture (TILT) algorithm and its many application methods in Computer Vision. The article will be of a somewhat overview character, without a dense deepening into mathematical wilds.


The idea of ​​the algorithm can be said to arise from the Netflix Prize contest , which, in fact, is a collaborative filtering task, where we have a matrix of users and their ratings: the matrix is ​​sparse, there are no elements in it, and noise may also be present, but we need to each movie to predict what rating each user will give it, i.e. we fill in the matrix.

Image is also a matrix!
We will use the matrix rank definition
The rank of a system of rows (columns) of a matrix A with m rows and n columns is the maximum number of linearly independent rows (columns). Several rows (columns) are called linearly independent if none of them is expressed linearly through the others. The rank of the row system is always equal to the rank of the column system, and this number is called the rank of the matrix.
And what image will have a low rank? For example, an image where there are regular structures:



As shown in the picture above the post, we model our original matrix as a low rank matrix + a sparse matrix with noise, i.e. in fact, it may look something like this:



Now, about the TILT algorithm itself:
We add distortion to the initial formulation of the problem, i.e. we restore not only the low-rank matrix and the noise matrix, but also geometric distortion (for example, rotation, affine transformation, projective transformation) at which the matrix rank is minimized.

We will define our transformation by a homography matrix, although it is possible to specify a more complex transformation.
But in practice, they do not minimize the rank of the matrix directly, but takenuclear norm , and for the matrix with noise, L1 norm is taken, as I understand it, this is done for regularization, i.e. so that the matrix is ​​sparse.

Statement of the minimization problem:




Then the algorithm iteratively converges to the optimal solution using optimization methods, as shown in the video:



A small bonus example: reducing the dimension of the matrix via SVD: An
image with a regular structure was taken and a rectangular hole was made in it (the color of the pixels was set to 0)
rank 5
image
rank 15
image
rank 100
image
The code
clc, clear

k = 100; % rank

% A is the mxn matrix we want to decompose

A = im2double (rgb2gray (imread ('low_rank_img.jpg'))) ';

sz = size (A)

% make black hole

A (100: 100 + sz (1) / 8,100: 100 + sz (2) / 10) = 0;

kmax = min (size (A));

if (k> kmax)

k = kmax;

end

tic

% Compute SVD of A directly

[u0, S0, V0] = svd (A, 'econ');

A0 = U0 (:, 1: k) * S0 (1: k, 1: k) * V0 (:, 1: k) ';

rank (A0)% test if rank = k

toc

display (['SVD Error:' num2str (norm (A (:) - A0 (:)) / norm (A (:)))])

clear U0 S0 V0


Now we turn to the most interesting, namely to practical applications:

Auto-calibration of lens distortion:



Augmented reality:



Auto-rotation of text, signage and barcodes:



And yes, not only pictures with a repeating structure have a low rank, but symmetry also reduces the rank! It is also worth noting that if a person has a blindfold in one eye (a pirate man) or a bangs on one side, this will not interfere with the “find symmetry” algorithm, because we remember that the algorithm also simulates noise.

Auto-rotate faces, goats and any objects that have symmetry:



Auto-rotate car number (a little hello to recent posts about recognizing a car number from ZlodeiBaal images are taken from here ):





Another area in which this algorithm can be used is the Inpainting task , but the benefit is doubtful, because a regular structure is required and the details will be deleted.


Is the algorithm really so good ?
Not really. The convergence of the algorithm to the correct solution depends on the initialization, here are a couple of negative examples:



What next?
Link to the original Matlab code: TILT code
Link to C ++ code from Smorodov : TILT-Cpp-port and SelfCalibration Publication
links: TILT: Transform Invariant Low-rank Textures
Lecture links: Low-rank modeling |The Pursuit of Low-dimensional Structures in High-dimensional Data, Yi Ma, Microsoft

Also popular now: