Unity: compressing compressed

Published on December 02, 2016


    Result: the color information occupies 1/64 of the original area while the quality stays fairly high. Test image taken from this site.

    Textures are almost always the biggest consumer of space, both on disk and in RAM. Compressing them into one of the supported formats helps to some extent, but what if there are still a lot of textures even after that, and you want to fit in even more?

    The story began about a year and a half ago, when a game designer (let's call him Akkelman), while experimenting with layer blending modes in Photoshop, discovered the following: if you desaturate a texture and overlay the same texture in color, scaled down 2-4 times, with the layer blending mode set to “Color”, the picture looks pretty much like the original.

    Data Storage Features


    What is the point of this separation? The black-and-white images, which essentially contain the brightness of the original image (hereinafter “grayscales”), hold only intensity, so each one can be stored in a single color channel. An ordinary picture without transparency has 3 color channels (R, G, B), so it can hold 3 such grayscales at no extra cost. A fourth channel, A (transparency), could be used as well, but it causes big problems on mobile devices (on Android with GLES2 there is no universal format that supports compressed RGBA textures, compression quality degrades badly, and so on), so for the sake of universality only the 3-channel solution is considered. If this works, we get almost 3x compression (plus a disproportionately small “color” texture) on top of already compressed textures.
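    The packing idea can be illustrated with a minimal Python sketch. Images are plain nested lists of floats here, and the function names `pack_grayscales` and `unpack_channel` are hypothetical, chosen only for this example; a real pipeline would do this in an editor script on texture data.

```python
def pack_grayscales(gray_r, gray_g, gray_b):
    """Pack three equally sized grayscale images into one RGB image.

    Each input is a list of rows of intensity values; the result is a list
    of rows of (r, g, b) tuples, one grayscale per channel.
    """
    if not (len(gray_r) == len(gray_g) == len(gray_b)):
        raise ValueError("grayscale images must have the same height")
    packed = []
    for row_r, row_g, row_b in zip(gray_r, gray_g, gray_b):
        if not (len(row_r) == len(row_g) == len(row_b)):
            raise ValueError("grayscale images must have the same width")
        packed.append(list(zip(row_r, row_g, row_b)))
    return packed


def unpack_channel(packed, channel):
    """Read one grayscale back; channel is 0 (R), 1 (G) or 2 (B)."""
    return [[texel[channel] for texel in row] for row in packed]
```

    Three full-size grayscales then share the storage of a single RGB texture, which is where the "almost 3x" figure comes from.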

    Feasibility assessment


    You can roughly estimate the benefit of such a solution. Suppose we have a 3x3 grid of textures at 2048x2048 without transparency, each compressed to DXT1 / ETC1 / PVRTC4 and taking 2.7MB (16MB uncompressed). The total memory used is 9 * 2.7MB = 24.3MB. If we extract the color from each texture, shrink that “color” map to 256x256 at 0.043MB (it looks quite tolerable, so storing 1/64 of the full texture area is enough), and pack the full-size grayscales three at a time into new textures, we get approximately 0.043MB * 9 + 3 * 2.7MB = 8.5MB (rounded up). That is compression by a factor of about 2.8, which sounds pretty good given the limited hardware of mobile devices and the unlimited appetites of designers and content creators. You can either greatly reduce resource consumption and load time, or ship more content.
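    The same arithmetic as a small Python sketch. The sizes are the figures quoted above; the function names and the `channels` parameter are hypothetical, introduced only for this illustration.

```python
import math

FULL_MB = 2.7     # one compressed 2048x2048 texture (figure from the text)
COLOR_MB = 0.043  # one compressed 256x256 "color" map (figure from the text)


def original_size_mb(count, full_mb=FULL_MB):
    """Memory used when every texture is stored as-is."""
    return count * full_mb


def packed_size_mb(count, full_mb=FULL_MB, color_mb=COLOR_MB, channels=3):
    """Memory used with one small color map per texture plus
    grayscale atlases holding `channels` grayscales each."""
    atlases = math.ceil(count / channels)
    return count * color_mb + atlases * full_mb


before = original_size_mb(9)   # about 24.3 MB
after = packed_size_mb(9)      # about 8.5 MB
ratio = before / after         # a bit over 2.8
```

    Note that the saving depends on the texture count being a multiple of three; a leftover texture still needs a whole atlas in this model.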

    First attempt


    Well, let's try. A quick search turned up a ready-made implementation of the “Color” blending mode. Studying its source made my hair stand on end: about 40 branches (conditional jumps that hurt performance on less-than-top-end hardware), 160 ALU instructions and 2 texture samples. That is a lot not only for mobile devices but for desktop as well, so it is completely unsuitable for real time. The designer was told as much, and the topic was safely closed and forgotten.

    Second attempt


    A couple of days ago the topic surfaced again, and it was decided to give it a second chance. We do not need 100% compatibility with Photoshop's implementation (blending several textures in several layers is not the goal); we need a faster solution with a visually similar result. The basic implementation looked like a double round-trip conversion between the RGB and HSL spaces with calculations in between. Refactoring brought the shader down to 50 ALU instructions and 9 branches, which was already at least 2 times faster, but still not enough. After asking the audience for help, comrade wowaaa suggested how to rewrite the piece that generated the branching so that it needed no conditions, for which many thanks to him. Part of the conditional calculations was moved into a lookup texture, which was generated by an editor script and then simply sampled in the shader. After all the optimizations, the complexity dropped to 17 ALU instructions and 3 texture samples, with no branching at all.
    It looks like a victory, but not quite. Firstly, this is still too expensive for mobile devices; it needs to be at least 2 times cheaper. Secondly, all of this was tested on high-contrast pictures filled with solid colors. Example of artifacts: the erroneous result is on the left, the reference on the right.




    Tests on real pictures with gradients and other delights (nature photos) showed that this implementation is very sensitive to how the resolution of the “color” map is combined with mipmaps and filtering settings: obvious artifacts appeared because of the way the textures were mixed in the shader and because of rounding errors and the compression of the textures themselves. Yes, it was possible to use uncompressed textures with POINT filtering and without drastically shrinking the “color” map, but then the experiment lost all meaning.
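    The RGB/HSL round trip that this attempt started from can be sketched roughly as follows. This is a minimal Python illustration, not the shader code: it uses the standard `colorsys` module, the function name `color_blend` is hypothetical, and it is only a visual approximation that is not meant to match Photoshop's “Color” mode exactly.

```python
import colorsys


def color_blend(gray, color):
    """Keep the lightness of the grayscale layer and take hue and
    saturation from the color layer.

    gray  -- intensity in [0, 1]
    color -- (r, g, b) tuple with components in [0, 1]
    """
    # First conversion: RGB -> HLS for both layers.
    _, lightness, _ = colorsys.rgb_to_hls(gray, gray, gray)
    hue, _, saturation = colorsys.rgb_to_hls(*color)
    # Second conversion: HLS -> RGB with the mixed components.
    return colorsys.hls_to_rgb(hue, lightness, saturation)
```

    The hue and saturation conversions are piecewise by nature, which is where the branches in a shader version come from.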

    Third attempt


    Here the audience helped once more. Comrade Belfegnar, who loves “fancy graphics, next-gen and all that” and reads all the available research on the subject, suggested a different color space, YCbCr, and sent fixes to my testbed that added support for it. As a result, the shader complexity immediately fell to 8 ALU instructions, with no branching and no lookup textures. He also dropped links to research with formulas from brainy mathematicians who had evaluated various color spaces for their suitability. From these, variants for RDgDb, LDgEb and YCoCg were put together (only the last one is easy to find with a search engine; the first two are described at sun.aei.polsl.pl/~rstaros/index.html and sun.aei.polsl.pl/~rstaros/papers/s2014-jvcir-AAM.pdf). RDgDb and LDgEb are built on one base channel (used as the full-size grayscale) and the relation of the two remaining channels to it. People perceive differences in color poorly but differences in brightness quite well. With these two spaces, heavy compression of the “color” map lost not only color but also contrast, so the quality suffered badly.

    In the end YCoCg “won”: its data is based on brightness, it tolerates compression of the “color” map well (under heavy compression the colors wash out more than with YCbCr, but the picture retains contrast better), and the shader is cheaper than the YCbCr one.


    The “color” map: the packed data (CoCg) is stored in the R and G channels; the B channel is empty (and can be used for custom data).
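    Splitting a pixel into a brightness value and a packed CoCg texel can be sketched in Python using the standard YCoCg transform. The function names are hypothetical, and the +0.5 bias used to store the signed Co and Cg values in a [0, 1] channel is an assumption of this sketch; the article does not say how the actual project encodes them.

```python
def rgb_to_ycocg(r, g, b):
    """Standard YCoCg transform for components in [0, 1].

    Y ends up in [0, 1], Co and Cg in [-0.5, 0.5].
    """
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg


def split_pixel(r, g, b):
    """Return (grayscale value, color-map texel) for one RGB pixel."""
    y, co, cg = rgb_to_ycocg(r, g, b)
    # Assumption: bias Co/Cg into [0, 1] so they fit an unsigned channel.
    # Co goes to R, Cg goes to G, B is left empty.
    color_texel = (co + 0.5, cg + 0.5, 0.0)
    return y, color_texel
```

    The grayscale value goes into one channel of a full-size atlas, while the color texel goes into the heavily downscaled “color” map.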

    After the basic implementation came another round of fiddling for the sake of optimization, but I did not get very far with it.

    Summary



    Once again, the picture with the result: the resolution of the color texture can be changed over a wide range without noticeable loss in quality.

    The experiment was quite successful: a shader (without transparency support) costing 6 ALU instructions and 2 texture samples, and 2.8x memory compression. Each material lets you specify which color channel of the grayscale atlas is used as brightness. Likewise, for the shader with transparency support you select which channel of the grayscale atlas is used as alpha.
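    The per-material channel selection and the color reconstruction can be sketched in Python as follows. This is an illustration under stated assumptions rather than the shader from the repository: the function names are hypothetical, the channel is picked with a one-hot mask, and Co/Cg are assumed to be stored with a +0.5 bias in the R and G channels of the color map.

```python
def select_channel(atlas_texel, mask):
    """Pick one grayscale from an atlas texel with a one-hot mask,
    e.g. (1, 0, 0) for R. A dot product avoids any branching."""
    return sum(c * m for c, m in zip(atlas_texel, mask))


def decode_pixel(atlas_texel, color_texel, mask):
    """Rebuild an RGB pixel from a grayscale atlas texel and a color texel."""
    y = select_channel(atlas_texel, mask)
    # Assumption: Co and Cg were stored with a +0.5 bias in R and G.
    co = color_texel[0] - 0.5
    cg = color_texel[1] - 0.5
    # Inverse YCoCg transform: additions and subtractions only.
    r = y + co - cg
    g = y + cg
    b = y - co - cg
    return r, g, b
```

    For example, `decode_pixel((0.25, 0.9, 0.1), (1.0, 0.25, 0.0), (1, 0, 0))` reads brightness from the R channel of the atlas and reconstructs pure red.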

    Sources: GitHub
    License: MIT license.

    All characters are fictitious, and any resemblance to persons living or dead is purely coincidental. No designers were harmed during this experiment.