AV1, the next-generation video codec: the Constrained Directional Enhancement Filter (CDEF)

Posted by: Monty (monty@xiph.org, cmontgomery@mozilla.com). Published June 28, 2018.

In case you have not read the previous article: AV1 is a new general-purpose video codec developed by the Alliance for Open Media. The Alliance took Google's VPx codecs, Thor from Cisco, and Daala from Mozilla/Xiph.Org as its starting point. AV1 outperforms VP9 and HEVC, which makes it a codec not for tomorrow but for the day after tomorrow. The AV1 format is royalty-free and will always remain so, under a permissive license.

This article is the second in a series that describes in detail the functionality of AV1 and the new technologies that underlie it, some of them used in production for the first time. The previous article on Xiph.org explained the Chroma from Luma (CfL) prediction tool. Today we will talk about the Constrained Directional Enhancement Filter (CDEF). If you have always wondered what it takes to write a codec, fasten your seat belts and get ready to learn!

Filters in AV1

Virtually all video codecs apply enhancement filters to improve the subjective quality of their output.

By “enhancement filters” I mean methods that do not necessarily encode image information or improve objective coding efficiency, but that improve the result in some way. Enhancement filters must be used carefully because they usually lose some information, and for that reason they are sometimes dismissed as an unacceptable form of cheating that makes the result look better than it really is.

But that is not fair. Enhancement filters are designed to avoid or remove specific artifacts that objective metrics do not notice but that are obvious to the human eye. And even if you do consider the filters a form of cheating, a good video codec still needs every practical and effective cheat it can use.

Filters fall into several categories. First, they can be normative or non-normative. A normative filter is a required part of the codec; without it, the video cannot be decoded correctly. A non-normative filter is optional.

Second, filters differ in where they are applied. There are pre-processing filters, applied to the input before encoding begins; post-processing filters, applied to the output after decoding is complete; and in-loop filters (or simply loop filters), which are integrated into the encoding process. Pre-processing and post-processing filters are usually non-normative and are not part of the codec. Loop filters are normative almost by definition: they are part of the codec, they are used in the encoder's optimization process, and they are applied to the reference frames used for inter-frame coding.

AV1 uses three normative enhancement filters. The first is the deblocking filter, which removes blockiness: obvious artifacts along the edges of coded blocks. Although the DCT is relatively good at compacting energy in natural images, it still tends to concentrate errors at block edges. If you remember, eliminating these artifacts was the main reason Daala used a lapped transform. But AV1 is a more traditional codec with hard block boundaries, so it needs a traditional deblocking filter to smooth out artifacts at block edges.

An example of artifacts at the block boundaries in the traditional DCT block codec. These errors are especially noticeable.

The last of the three filters is the loop restoration filter. It consists of two configurable, switchable filters: a Wiener filter and a self-guided filter. Both are convolutional filters that try to build a kernel that partially restores the lost quality of the original input image. Filters of this kind are usually used for noise reduction and/or edge enhancement. In AV1 they perform a general denoising task, removing DCT basis noise through an adjustable blur.

Between them sits the Constrained Directional Enhancement Filter (CDEF), the subject of this article. Like the loop restoration filter, it removes artifacts and basis noise around edges, but unlike the loop restoration filter, it is directional. Unlike most filters, it is not applied blindly in every direction; it finds edges and follows them. That is what makes CDEF especially interesting: it is the first practical and useful directional filter used in video coding.

Long and winding road

The history of CDEF has not been easy. It is a long road with turns, side paths, and dead ends. CDEF brings together several lines of research, each of which contributed an idea or inspiration to the final filter in AV1.

The whole point of transforming blocks of pixel data with the DCT and DCT-like transforms is to represent a block of pixels with as few numbers as possible. The DCT compacts energy fairly well in most images; that is, it tends to gather spread-out pixel patterns into just a few significant output coefficients.

But there are exceptions to the DCT's compaction efficiency. For example, the DCT does not handle directional edges or patterns very well. If you look at the DCT output for a sharp diagonal edge, the output coefficients also form... a sharp diagonal! The edge looks different after the transform, but it is still present, and usually in a more complex form than it started. Compaction defeated!

Sharp edges are a traditional problem for DCT-based codecs because they do not compact well, if at all. Here we see a sharp edge (left) and its DCT coefficients (right). The energy of the original edge spreads through the DCT output in a pattern of directional ripples.
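The compaction problem is easy to reproduce numerically. The following is a minimal Python sketch, not taken from any codec: it runs a textbook orthonormal 8×8 DCT-II over a horizontal edge and over a diagonal edge and counts how many coefficients remain significant. The block contents, the helper names, and the threshold of 1.0 are all assumptions chosen for illustration.

```python
import math

N = 8

def dct_1d(v):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for u in range(N):
        scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(v[n] * math.cos(math.pi * (2 * n + 1) * u / (2 * N))
                               for n in range(N)))
    return out

def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    # cols[j][u] holds vertical frequency u of column j; transpose back.
    return [[cols[j][u] for j in range(N)] for u in range(N)]

def significant(coeffs, threshold=1.0):
    """Count coefficients whose magnitude exceeds the threshold."""
    return sum(1 for row in coeffs for c in row if abs(c) > threshold)

# A horizontal edge: top half dark, bottom half bright.
horizontal_edge = [[255 if i >= 4 else 0 for j in range(N)] for i in range(N)]
# A diagonal edge: bright below the main diagonal.
diagonal_edge = [[255 if i > j else 0 for j in range(N)] for i in range(N)]

n_horizontal = significant(dct_2d(horizontal_edge))
n_diagonal = significant(dct_2d(diagonal_edge))
print("significant coefficients:", n_horizontal, "vs", n_diagonal)
```

The axis-aligned edge collapses into a single column of coefficients, while the diagonal edge spreads its energy over far more of the block, which is the behavior the figure illustrates.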

Over the past two decades, video codec research has increasingly looked at transforms, filters, and prediction methods that are inherently directional. Researchers were looking for a way to handle such edges and patterns better and so correct this fundamental limitation of the DCT.

Classic Directional Predictors

Directional intra-prediction is probably one of the best-known directional techniques in modern video codecs. Everyone is familiar with the directional prediction modes of H.264 and VP9, where the codec predicts a particular pattern into a new block from the surrounding pixels of already-decoded blocks. The goal is to remove (or greatly reduce) the energy in hard, directional edges before transforming the block. By predicting and removing features that cannot be compacted, we increase the overall efficiency of the codec.
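To make the idea concrete, here is a small Python sketch in the spirit of these modes. It implements only the two simplest directional predictors (vertical and horizontal) plus DC averaging, and it picks a mode by residual energy alone; real encoders have more directions and choose by rate-distortion cost. The function names and the sample block are hypothetical.

```python
def predict_vertical(above, left):
    """Copy the row of pixels above the block straight down."""
    return [list(above) for _ in range(len(left))]

def predict_horizontal(above, left):
    """Copy the column of pixels left of the block straight across."""
    return [[left[i]] * len(above) for i in range(len(left))]

def predict_dc(above, left):
    """Fill the block with the rounded mean of all neighboring pixels."""
    total = sum(above) + sum(left)
    count = len(above) + len(left)
    dc = (total + count // 2) // count
    return [[dc] * len(above) for _ in range(len(left))]

def residual_energy(block, prediction):
    """Sum of squared differences left over after prediction."""
    return sum((b - p) ** 2
               for row_b, row_p in zip(block, prediction)
               for b, p in zip(row_b, row_p))

def best_mode(block, above, left):
    """Return the mode with the least residual energy, plus all scores."""
    modes = {"vertical": predict_vertical,
             "horizontal": predict_horizontal,
             "dc": predict_dc}
    scores = {name: residual_energy(block, fn(above, left))
              for name, fn in modes.items()}
    return min(scores, key=scores.get), scores

# A 4x4 block of vertical stripes that continue the row above it.
above = [10, 200, 10, 200]
left = [10, 10, 10, 10]
block = [[10, 200, 10, 200] for _ in range(4)]
mode, scores = best_mode(block, above, left)
print(mode, scores)
```

For this block the vertical predictor leaves no residual at all, so nothing with a hard edge is left for the transform to compact.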

Illustration of the directional prediction modes in AVC/H.264 for 4×4 blocks. The predictor extends values from single-pixel strips of neighboring pixels into the predicted block in one of eight directions, plus an averaging mode for simple DC prediction.

An even older idea is motion compensation. It is also a form of directional prediction, although we rarely think of it that way. It shifts blocks in particular directions, again to predict and extract energy before running the DCT. This block displacement is directional and filtered. Like directional intra-prediction, it applies carefully constructed resampling filters when the displacement is not a whole number of pixels.

Directional Filters

As noted earlier, video codecs make heavy use of filters to remove blocking artifacts and noise. Although the filters are applied to a 2D plane, the filters themselves usually work in 1D; that is, they are run horizontally and vertically as separate passes.
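As an illustration of running a filter "horizontally and vertically as separate passes", here is a minimal Python sketch of separable filtering. The 3-tap kernel and the edge-clamping rule are assumptions made for the example, not the behavior of any particular codec filter.

```python
def filter_1d(samples, kernel):
    """Apply an odd-length kernel to a 1D list, clamping at the ends."""
    half = len(kernel) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(i + k - half, 0), last)
            acc += w * samples[idx]
        out.append(acc)
    return out

def filter_separable(image, kernel):
    """Run the 1D kernel over every row, then over every column."""
    rows = [filter_1d(row, kernel) for row in image]
    cols = [filter_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

kernel = [0.25, 0.5, 0.25]            # hypothetical smoothing kernel
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 16.0                  # a single bright pixel
smoothed = filter_separable(impulse, kernel)
for row in smoothed:
    print(row)
```

The two 1D passes together behave like one 2D kernel, but only ever look along rows and columns, which is exactly the limitation directional filters try to lift.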

Directional filtering runs filters in directions other than horizontal and vertical. The technique is already common in still-image processing, where denoising and special-effects filters often take edges and direction into account. But these directional filters are often based on filtering the output of directional transforms. For example, the [slightly dated] denoising filters I have written about are based on the dual-tree complex wavelet transform.

But for video coding we are most interested in directional filters that are applied directly to pixels along a given direction, rather than filtering in the frequency domain at the output of a directional transform. As soon as you try to build such a beast, the Big Question quickly comes up: how do you run a filter in a direction other than horizontal or vertical when the filter tap positions no longer land on pixels of the grid?

One option is the classic approach used in high-quality image processing: transform the filter kernel and resample the pixel space as needed. It can even be argued that this is the only “correct” or “complete” answer. It is used in sub-pixel motion compensation, where a good result is impossible without good resampling, and in directional prediction, where a fast approximation is usually used.

However, even a fast approximation is computationally expensive if it is applied everywhere, so it is worth avoiding resampling where possible. This speed penalty is one of the reasons directional filters had not yet been used in video coding.

Directional Transforms

Directional transforms attempt to fix the DCT's edge-compaction problems at the level of the transform itself.

Experiments in this area fall into two categories. First, there are transforms that use inherently directional bases, such as directional wavelets. These tend to be overcomplete; that is, they produce more output data than they take as input, usually much more. That sounds like working in the wrong direction, because we want to reduce the amount of data, not increase it! But these transforms still compact energy, and the encoder still selects a small subset of the outputs to encode, so in practice there is little difference from conventional lossy DCT coding. However, overcomplete transforms usually demand excessive memory and computation, and so they are not used in popular video codecs.

The second category of directional transforms takes a regular non-directional transform such as the DCT and modifies it by altering its input or output. The alteration can take the form of resampling, a matrix multiplication (which can be viewed as a specialized form of resampling), or juggling the order of the input data.

This last idea is the most powerful because it is fast. Simply permuting numbers requires no arithmetic at all.

Two examples of transforms made directional by permuting pixels and coefficients rather than by a resampling filter. The example is taken from “An Overview of Directional Transforms in Image Coding”, Jizheng Xu, Bing Zeng, Feng Wu.

The implementation is complicated by several practical difficulties. Reorienting a square block so that a diagonal edge becomes mostly vertical or horizontal lines yields a non-square matrix of numbers as input. Conceptually, this is not a problem. Since the row and column transforms can be run independently of each other, we simply use a different size of 1D DCT for each row and column, as shown in the figure above. But in practice this means we need a different DCT factorization for each possible column length, and as soon as the hardware design department realizes this, you will be thrown out of a window.

There are other ways of handling the non-square regions after the permutation. You can come up with resampling schemes that preserve the input square or that operate only on the output. Most of the directional transform papers listed at the end of this article propose different schemes for this.

And this is essentially where the story of directional transforms ends. Once you work around the various complications of directional transforms and build something that actually works, it does not work well in a modern codec for an unexpected reason: competition with variable block size. That is, in a codec with a fixed block size, adding directional transforms gives an impressive efficiency gain. Variable block size on its own gives an even larger gain. But combining variable block size with directional transforms gives a worse result than using either technique alone. Variable block size has already eliminated the redundancies that directional transforms exploit, and has done so more efficiently.

While developing Daala, Nathan Egge and I experimented a lot with directional transforms. I looked at the problem from both the input and the output side, using sparse matrix multiplications to transform diagonal edges in the output into a vertical/horizontal position. Nathan tested the known approaches to directional transforms that rearrange the data at the input. We came to the same conclusion: the additional complexity provides no objective or subjective benefit.

Applying directional transforms to Daala (and other codecs) may have been a dead end, but the research did address a question raised earlier: how do you filter quickly along edges without expensive resampling? The answer: do not resample. Approximate the angle by stepping to the nearest whole pixel. Approximate the transformed kernel by literally or conceptually permuting pixels. This approach introduces some distortion (aliasing), but it works well enough and fast enough.
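A minimal Python sketch of "stepping to the nearest whole pixel" follows. It assumes a direction given as a slope of at most 1 in magnitude and snaps each tap to the closest pixel row instead of interpolating; the function names are hypothetical and the resulting offsets are not the tap tables of any real filter.

```python
import math

def nearest_pixel_taps(slope, count):
    """Tap offsets (dy, dx) along a line with the given slope (|slope| <= 1).

    Each tap advances one whole pixel horizontally; the vertical position
    is snapped to the nearest pixel row instead of being resampled.
    """
    taps = []
    for k in range(1, count + 1):
        dy = math.floor(k * slope + 0.5)  # round half up, deterministic
        taps.append((dy, k))
    return taps

def symmetric_taps(slope, count):
    """Mirror the offsets through the center pixel so the kernel stays symmetric."""
    forward = nearest_pixel_taps(slope, count)
    return [(-dy, -dx) for dy, dx in reversed(forward)] + forward

print(nearest_pixel_taps(0.5, 3))
print(symmetric_taps(1, 2))
```

No pixel values are interpolated at any point: the cost of a shallow diagonal is the same as that of a horizontal filter, at the price of the aliasing mentioned above.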

Directional Predictors, Part 2: The Daala Chronicles

The history of CDEF in the Daala codec began with an attempt to do something completely different: ordinary, boring directional intra-prediction. Or at least what passes for ordinary in the Daala codec.

I wrote about Daala's frequency-domain intra-prediction scheme when we first started working on it. The math works; there is no cause for concern there. However, the naive implementation required a huge number of matrix multiplications, which turned out to be too expensive for a production codec. We hoped to reduce the computational load by an order of magnitude through sparsification: eliminating the matrix elements that contribute little to the prediction.

But sparsification did not work as hoped. At least as we implemented it, it lost too much information, making the technique unusable in practice.

Of course, Daala still needed some form of intra-prediction, and Jean-Marc Valin had an idea: a standalone prediction codec that works in the spatial domain in parallel with the frequency-domain Daala codec. As a kind of symbiont working in tandem with Daala but not dependent on it, it was not bound by Daala's frequency-domain requirements. This is how Intra Paint came about.

An example of the Intra Paint prediction algorithm on a photograph of Sydney Harbor. The visual output is clearly directional and follows the block boundaries and features of the original image well, creating a pleasing (if perhaps slightly odd) result with crisp edges.

The Intra Paint filter worked in a novel way: it coded one-dimensional vectors only along block boundaries and then swept the pattern across the block in the chosen direction. It is like splattering paint and then smearing it in different directions across the open areas.

Intra Paint looked promising and on its own produced strikingly beautiful results, but once again it was not efficient enough to work as a standard intra-predictor. It simply did not save enough bits to pay for the bits needed to encode its own information.

The difference between the original photograph of Sydney Harbor and the Intra Paint result. Despite the visually pleasing output of Intra Paint, objectively it cannot be called a highly accurate predictor. The difference is quite significant even along many edges that appeared to be reproduced well.

The “failure” of Intra Paint gave us a different idea. Although this “painting” is objectively not a very good predictor, subjectively it mostly looks good. Could the “paint smearing” technique be used as a post-processing filter to improve subjective visual quality? Intra Paint follows sharp edges very well, so it should be good at removing the noise that accumulates along the sharpest edges. From this idea came the original Paint-Dering filter in Daala, which ultimately led to CDEF itself.

There is one more interesting thing in directional prediction, although it is currently a dead end in video coding. David Schleef implemented an interesting edge- and direction-aware resampling filter called Edge-Directed Interpolation (EDI). Other codecs (such as the VPx series and, for a time, AV1) experimented with coding reference frames at reduced resolution to save bits and then upsampling them. We hoped that the much-improved interpolation of EDI would improve that upsampling enough to make the technique useful. We also hoped to use EDI as an improved sub-pixel interpolation filter for motion compensation. Unfortunately, these ideas remained an unrealized dream.

Filling holes, merging branches

At this point I have described all the background needed to approach CDEF, but in reality we kept wandering in the desert. Intra Paint spawned the original Daala Paint-Dering filter, which used the Intra Paint algorithm as a post-filter to remove artifacts. It was too slow to use in a real codec.

As a result, we took the lessons of Intra Paint on board and abandoned experiments in that direction. Daala borrowed CLPF from Thor for a while, and then Jean-Marc created another, much faster deringing filter for Daala, based on the Intra Paint edge direction search (which worked quickly and well) and on a Conditional Replacement Filter (CRF). The CRF builds to some degree on the ideas of the median filter and produces results similar to a bilateral filter, but it is inherently vectorizable and therefore much faster.

Comparison of a 7-tap linear filter with a conditional replacement filter on a noisy one-dimensional signal, where the noise simulates the effects of quantization on the original signal.

The final Daala deringing filter uses two one-dimensional CRF filters: a 7-tap filter run in the direction of the edge and a weaker 5-tap filter. Both filters operate on whole pixels only, without resampling. At this point the Daala deringing filter looks very much like what we now know as CDEF.
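The comparison in the figure can be approximated with a short Python sketch. It assumes the conditional replacement rule works as follows: any tap that differs from the center pixel by at least a threshold is replaced by the center value before averaging. The flat 7-tap kernel, the threshold of 20, and the synthetic "quantization noise" are hypothetical values chosen for the example, not Daala's actual parameters.

```python
def _window(signal, i, half):
    """Centered window around index i, clamping indices at the signal ends."""
    last = len(signal) - 1
    return [signal[min(max(i + k, 0), last)] for k in range(-half, half + 1)]

def linear_filter(signal, weights):
    """Plain weighted average over a centered window."""
    half = len(weights) // 2
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, _window(signal, i, half))) / total
            for i in range(len(signal))]

def conditional_replacement_filter(signal, weights, threshold):
    """Weighted average in which any tap that differs from the center pixel
    by `threshold` or more is replaced by the center value first."""
    half = len(weights) // 2
    total = sum(weights)
    out = []
    for i, center in enumerate(signal):
        taps = [x if abs(x - center) < threshold else center
                for x in _window(signal, i, half)]
        out.append(sum(w * x for w, x in zip(weights, taps)) / total)
    return out

weights = [1, 1, 1, 1, 1, 1, 1]          # hypothetical flat 7-tap kernel
step = [0] * 6 + [100] * 6               # a clean, sharp edge
noisy = [x + (3 if n % 2 else -3) for n, x in enumerate(step)]

blurred = linear_filter(noisy, weights)
preserved = conditional_replacement_filter(noisy, weights, threshold=20)
print([round(v, 1) for v in blurred])
print([round(v, 1) for v in preserved])
```

The linear filter smears the step across several samples, while the conditional replacement version smooths the small ripples on each side and leaves the edge where it was. Every tap is handled with the same compare-and-select operation, which is why the filter vectorizes so easily.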

By then we had submitted Daala to AOM as an input codec, and this intermediate filter became the AV1 daala_dering experiment. Cisco also submitted its own deringing filter, the Constrained Low-Pass Filter (CLPF) from the Thor codec. For a time both filters existed side by side in the experimental AV1 build: they could be enabled separately or even together. This made it possible to notice useful synergies between them, as well as further similarities between the filters at various stages of their operation.

And so we finally arrive at CDEF: the CLPF filter from Cisco and the second version of the Daala deringing filter were merged into a single high-performance, edge-direction-aware deringing filter.

Modern CDEF

The CDEF filter is simple and very similar to our earlier filters. It is built from three pieces (the direction search, the constrained replacement/low-pass filter, and whole-pixel tap placement) that we had used before. Given the long backstory, you might look at the finished CDEF and ask, “Is that all? Where is the rest?” CDEF is an example of getting a useful effect by implementing the pieces correctly rather than by adding complexity. A simple, effective filter is exactly what it should be.

Direction Search

CDEF works along a particular direction, so that direction has to be determined first. The algorithm is the same one used by Intra Paint and Paint-Dering. There are eight possible directions.

The eight possible directions of the CDEF filter. The numbered lines in each direction block correspond to the parameter k in the direction search.

We determine the filter direction by making “directional” variants of the input block, one for each direction, in which all the pixels along a line in the chosen direction are forced to the same value. Then we pick the direction whose result most closely matches the original block. That is, for each direction d, we first find the mean value of the pixels in each line k, and then along each line we sum the squared error between each pixel value and the mean value of that pixel's line.

An example of choosing the direction d that best matches the input block. First we determine the mean pixel value of each line k in each direction. This is shown above by setting every pixel of a given line k to that mean value. Then we sum the error for a given direction, pixel by pixel, by subtracting the input value from the mean. The direction with the lowest error/variance is chosen as the best direction.

This gives us the total squared error, and the direction with the lowest total squared error is the one we choose. Although the example above does so, in practice there is no need to convert the squared error into a variance: every direction covers the same number of pixels, so both measures pick the same answer. Less computation!
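The long-way-around search just described can be written down directly. The Python sketch below assumes an 8×8 block and a hypothetical assignment of pixels to numbered lines k for each of the eight directions; the `LINE_INDEX` rules are chosen so that every direction partitions the block into lines, and they should be read as illustrative rather than as the exact layout in the direction diagram.

```python
# Hypothetical line index k for pixel (row i, column j), one rule per direction d.
LINE_INDEX = [
    lambda i, j: i + j,            # d = 0: 45-degree diagonals
    lambda i, j: i + j // 2,       # d = 1: shallow diagonals
    lambda i, j: i,                # d = 2: rows (horizontal lines)
    lambda i, j: 3 + i - j // 2,   # d = 3: shallow diagonals, other sign
    lambda i, j: 7 + i - j,        # d = 4: the other 45-degree diagonals
    lambda i, j: 3 - i // 2 + j,   # d = 5: steep diagonals
    lambda i, j: j,                # d = 6: columns (vertical lines)
    lambda i, j: i // 2 + j,       # d = 7: steep diagonals, other sign
]

def direction_error(block, d):
    """Total squared error after flattening every line of direction d to its mean."""
    lines = {}
    for i in range(8):
        for j in range(8):
            lines.setdefault(LINE_INDEX[d](i, j), []).append(block[i][j])
    error = 0.0
    for pixels in lines.values():
        mean = sum(pixels) / len(pixels)
        error += sum((x - mean) ** 2 for x in pixels)
    return error

def find_direction(block):
    """Return the direction with the smallest total squared error, plus all errors."""
    errors = [direction_error(block, d) for d in range(8)]
    return min(range(8), key=errors.__getitem__), errors

# A block of horizontal stripes: every row is constant.
stripes = [[10 * i for j in range(8)] for i in range(8)]
best, errors = find_direction(stripes)
print(best, errors)
```

For the striped block, the direction whose lines coincide with the rows reproduces the input exactly, so its error is zero and it wins the search.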

That is the intuitive, long way around to computing the directional errors. The mechanics can be simplified to the following equation:

$$E_d^2 = \sum_{p} x_p^2 - \sum_{k} \frac{1}{N_{d,k}} \left( \sum_{p \in P_{d,k}} x_p \right)^2$$

In this formula, $E$ is the error, $p$ is a pixel, $x_p$ is the value of a pixel, $k$ is one of the numbered lines in the direction diagram above, $P_{d,k}$ is the set of pixels in line $k$ for direction $d$, and $N_{d,k}$ is the cardinality of that set (the number of pixels in it). This equation can be simplified further in practice. For example, the first term is the same for every $d$. In the end, the AV1 implementation of CDEF currently requires 5.875 additions and 1.9375 multiplications per pixel and is deeply vectorized, for a total cost lower than that of an 8×8 DCT.

Filter taps

The CDEF filter runs pixel by pixel over the full block. The direction d selects one of the directional filters, each of which consists of a set of filter taps (that is, the locations of the input pixels) and tap weights.

Conceptually, CDEF builds a two-dimensional directional filter. The primary filter runs along the chosen direction, as in the Daala deringing filter. The secondary filter runs twice, at 45° angles to the primary filter, as in Thor's CLPF.

The directions of the primary and secondary one-dimensional filters relative to the chosen direction d. The primary filter runs along the chosen direction, and the secondary filters run at 45° to it. Every pixel in the block is filtered identically.

The filters run at angles such that the ideal tap positions often fall between pixels. Instead of resampling, we pick the exact location of the nearest pixel, taking care to keep the filter kernel symmetric.

Each tap in a filter also has a fixed weight. During filtering, the input value at each tap is taken, the constraint function is applied to it, the result is multiplied by the tap's fixed weight, and this output value is added to the pixel being filtered.

The locations of the primary and secondary taps and their fixed weights (w) for each filter direction. For the primary taps, a = 2 and b = 4 for even strength values, whereas a = 3 and b = 3 for odd ones. The pixel being filtered is shown in gray.

In practice, the primary and secondary filters are not run separately; they are combined into a single filter kernel that is run in one step.

Constraint function

CDEF uses a constrained low-pass filter in which the value of each tap is first processed by a constraint function parameterized by the difference d between the tap value and the pixel being filtered, the filter strength S, and the damping parameter D.

The constraint function is designed to de-emphasize or completely reject pixels that differ too much from the pixel being filtered. Tap differences within a certain range of the center pixel (set by the strength parameter S) are taken fully into account. Differences that fall between the ranges set by the S and D parameters are attenuated. Finally, tap differences beyond the range set by D are ignored entirely.

Illustration of the constraint function. In both figures, the difference (d) between the center pixel and the tap pixel is plotted along the x axis, and the output of the constraint function along the y axis. The figure on the left shows the effect of changing the strength parameter (S). The figure on the right shows the effect of changing the damping parameter (D).

The output of the constraint function is then multiplied by the fixed weight associated with each tap position relative to the center pixel. Finally, the resulting values (one per tap) are added to the center pixel, which gives the final filtered pixel value. All of this is expressed in the following formula:

$$y(i,j) = x(i,j) + \mathrm{round}\left( \sum_{m,n} w^{(p)}_{d,m,n}\, f\left(x(m,n) - x(i,j),\, S^{(p)},\, D\right) + \sum_{m,n} w^{(s)}_{d,m,n}\, f\left(x(m,n) - x(i,j),\, S^{(s)},\, D\right) \right)$$

where the superscripts $(p)$ and $(s)$ mark the values for the primary and secondary sets of taps.
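The per-tap processing can be sketched in a few lines of Python. The article does not spell out the constraint function in closed form, so the `constrain` below is an assumption: it follows the form I recall from the published CDEF description, where small differences pass through, larger ones shrink, and very large ones become zero. The tap values, the weights in sixteenths, and the rounding rule in `filter_pixel` are likewise hypothetical; the real rounding and clipping details are in the full CDEF description.

```python
def constrain(diff, strength, damping):
    """Assumed constraint function f(d, S, D).

    Returns diff unchanged while it is small, a reduced value as it grows,
    and 0 once it is far larger than the strength allows.
    """
    if strength == 0:
        return 0
    # D - floor(log2(S)), never negative.
    shift = max(0, damping - (strength.bit_length() - 1))
    magnitude = min(abs(diff), max(0, strength - (abs(diff) >> shift)))
    return magnitude if diff >= 0 else -magnitude

def filter_pixel(center, taps, strength, damping):
    """Filter one pixel from (tap_value, weight_in_sixteenths) pairs.

    A single strength is used for all taps here for brevity; the formula in
    the text uses separate primary and secondary strengths.
    """
    acc = sum(w * constrain(v - center, strength, damping) for v, w in taps)
    # Assumed rounding: nearest integer, ties away from zero.
    rounded = (abs(acc) + 8) // 16
    return center + (rounded if acc >= 0 else -rounded)

# Taps close to the center pull it toward them; an outlier is ignored.
print(filter_pixel(100, [(104, 4), (104, 4), (104, 4), (104, 4)], 4, 5))
print(filter_pixel(100, [(200, 4), (200, 4), (200, 4), (200, 4)], 4, 5))
```

In the first call every tap is within the strength range, so the pixel moves all the way to the tap value. In the second call every tap lies across a strong edge, the constraint function returns zero for each, and the pixel is left untouched.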

There are a few additional implementation details regarding rounding and clipping, but they are not essential for understanding. If you are going to implement CDEF, you can of course find them in the full CDEF description.

Results

CDEF is designed to remove or reduce basis noise and ringing artifacts near sharp edges in the image without blurring or damaging them. The filter is used in AV1 now and gives a subtle but consistent effect. In the future it may be possible to rely on CDEF even more.

An example of noise and artifact reduction when encoding the Fruits image. The first inset shows the area without CDEF; the second shows the same area after CDEF.

The quantitative value of any enhancement filter should be determined by subjective testing. Objective metrics should not be dismissed, but CDEF is designed to produce results beyond the reach of primitive objective testing tools such as PSNR and SSIM.

So we ran several rounds of subjective testing, first during the development of CDEF (when Daala dering and Thor CLPF were still technically competitors) and then a more thorough test of the merged CDEF filter. Because this is a new filter with no counterpart in previous generations of codecs, testing consisted mainly of comparing AV1 with CDEF enabled against AV1 without CDEF.

Subjective A / B testing of AV1 with CDEF and AV1 without CDEF for high latency configuration

Subjective results show a statistically significant (p < .05) improvement in three of the six clips. This typically corresponds to a coding efficiency gain of 5–10%. That is a pretty big win for a single tool added to a fairly mature codec.

As expected, objective testing showed a more modest increase of about 1%. However, objective testing is useful only to the extent that it is consistent with subjective results. Subjective testing is the gold standard, and subjective results are quite unambiguous.

Testing also showed that CDEF performs better when the codec's additional tools are disabled during encoding. Like directional transforms, CDEF competes for gains with other, more complex AV1 techniques. Because CDEF is a simple, small, and fast filter, it may be useful in the future for reducing the complexity of AV1 encoders. In terms of decoder complexity, CDEF accounts for between 3% and 10% of the AV1 decoder, depending on the configuration.

Additional resources

Standard test sets from Xiph.Org, at media.xiph.org
Automated testing system and metrics used in the development of Daala and AV1: “Are We Compressed Yet?”
The AV1 Constrained Directional Enhancement Filter (CDEF). Steinar Midtskogen, Jean-Marc Valin, October 2017
CDEF presentation slides for ICASSP 2018. Steinar Midtskogen, Jean-Marc Valin
A Deringing Filter for Daala and Beyond. Jean-Marc Valin. The earlier deringing filter created during the development of the Daala codec, which was used in creating CDEF in AV1.
Daala: Painting Images for Fun and Profit. Jean-Marc Valin. An even earlier enhancement filter based on Intra Paint, which led to the deringing filter in Daala, which in turn led to CDEF.
Intra Paint Deringing Filter. Jean-Marc Valin, 2015. Notes on the deringing filter that grew out of the Intra Paint experiment in Daala.
Guided Image Filtering. Kaiming He, Jian Sun, Xiaoou Tang, 2013
Direction-Adaptive Discrete Wavelet Transform for Image Compression. Chuo-Ling Chang, Bernd Girod, IEEE Transactions on Image Processing, Volume 16, Number 5, May 2007
Direction-adaptive transforms for image compression. Chuo-Ling Chang, Ph.D. thesis at Stanford University, 2009. This thesis provides a good overview of the field of directional transforms as of 2009; unfortunately, no copies were available online.
Direction-adaptive partitioned block transform for color image coding. Chuo-Ling Chang, Mina Makar, Sam S. Tsai, Bernd Girod, IEEE Transactions on Image Processing, Volume 19, Number 7, July 2010
Pattern-based DCT scheme with DC prediction and adaptive mode coding. Zhibo Chen, Xiaozhong Xu. Behind the IEEE paywall.
Direction-adaptive transforms for coding prediction residuals. Robert Cohen, Sven Klomp, Anthony Vetro, Huifang Sun. Proceedings of 2010 IEEE 17th International Conference on Image Processing, September 26-29, 2010, Hong Kong
Orientation-selective orthogonal lapped transforms. Dietmar Kunz, 2008. Behind the IEEE paywall.
Rate-distortion analysis of directional wavelets. Arian Maleki, Boshra Rajaei, Hamid Reza Pourreza. IEEE Transactions on Image Processing, Volume 21, Number 2, February 2012
Theoretical analysis of trend vanishing moments for directional orthogonal transforms. Shogo Muramatsu, Dandan Han, Tomoya Kobayashi, Hisakazu Kikuchi. Behind the IEEE paywall. However, a short version is freely available.
An Overview of Directional Transforms in Image Coding. Jizheng Xu, Bing Zeng, Feng Wu.
Directional filtering transform for image/intra-frame compression. Xiulian Peng, Jizheng Xu, Feng Wu, IEEE Transactions on Image Processing, Volume 19, Number 11, November 2010. Behind the IEEE paywall.
Approximation and compression with sparse orthonormal transforms. O. G. Sezer, O. G. Guleryuz, Yucel Altunbasak, 2008
Robust learning of 2-D separable transforms for next-generation video coding. O. G. Sezer, R. Cohen, A. Vetro, March 2011
Joint sparsity-based optimization of a set of orthonormal 2-D separable block transforms. Joel Sole, Peng Yin, Yunfei Zheng, Cristina Gomila, 2009. Behind the IEEE paywall.
Directional lapped transforms for image coding. Jizheng Xu, Feng Wu, Jie Liang, Wenjun Zhang, IEEE Transactions on Image Processing, April 2008
Directional Discrete Cosine Transforms: A New Framework for Image Coding. Bing Zeng, Jingjing Fu, IEEE Transactions on Circuits and Systems for Video Technology, April 2008
The Dual-Tree Complex Wavelet Transform. Ivan Selesnick, Richard Baraniuk, Nick Kingsbury, IEEE Signal Processing Magazine, November 2005
