Birth of a new algorithm named Broo and comparison with Brotli and others

Good day to the residents and visitors of the site, we will talk about the lossless compression algorithm , which is a joint “child”. This article will present the intermediate results which were achieved, in the form of comparison tables with popular algorithms.

Briefly about the algorithm


The main ideology for the algorithm was composed in several characteristics:

  • Not attached to file types
  • Good compression ratio
  • Fast unpacking speed

The packing (compression) speed was not initially included here, but will gradually improve like the whole algorithm as a whole. Entropy compression and dictionary methods are not used.

Training


For the purity and simplicity of measurements, the algorithm was integrated into the lzbench utility (link to GitHub ), since it already has a sufficient number of other algorithms and it is easy to integrate your own.

Next, files were selected from an existing package for testing algorithms called Silesia. Short file description:
File nameDescriptionA typeSize Byte
dickensCollection of Charles Dickenstxt (eng)10 192 446
mozillatar archive with Mozilla 1.0 executablesexe51 220 480
mrMRI imageimage9 970 564
nciBase chemical structuresdatabase33 553 445
oofficeDll files from Open Office.org 1.01exe6 152 192
osdbExample MySQL database format from database10 085 684
reymontBook text Polish, pdf6 627 202
sambatar source archive src21 606 400
saoStar Catalog of the Smithsonian Astrophysical Observatorybin data7 251 944
websterWebster's American English Dictionaryhtml41 458 703
xmlXml files collectionxml5 345 280
X rayX-ray imageimage8 474 240
Source

List of algorithms that participate in comparisons


  • Brotli
  • Broo
  • Csc
  • Gipfeli
  • Libdeflate
  • Lz4
  • Lz5
  • Lzlib
  • Lzma
  • Lzsse
  • Quicklz
  • Snappy
  • Xpack
  • Yalz77
  • Yappy
  • Zlib
  • Zstd

The list is full of algorithms that solve different problems, but everything is known in comparison.

A short description of some of them


Brotli - based on the modern version of the LZ77 algorithm , Huffman entropy coding and modeling of the 2nd order context.
Designed to speed up the loading of web pages, it is supported in Chrome browsers based on Chromium and in Firefox.

Deflate is a lossless compression algorithm using a combination of LZ77 and Huffman algorithms.

Zstandard (Zstd) is a lossless data compression algorithm developed since 2015 by Yann Collet with the support of Facebook. It combines a vocabulary data compression algorithm of the LZ77 type and efficient entropy coding of the tANS type (FSE - Finite State Entropy), a modification of the Huffman code that implements a non-integer number of bits for storing characters.

LZMA- used in the 7-Zip archiver to create compressed archives in 7z format. The algorithm is based on a dictionary data compression scheme similar to that used in LZ77, and provides a high compression ratio (usually higher than the compression ratio obtained using bzip2), and also allows the use of dictionaries of various sizes (up to 4 GB).

Snappy - another google lossless compression algorithm designed to achieve high speeds without reaching maximum compression ratios.

PC specifications


Processor DualCore Intel Core i3 550, 3200 MHz
Memory GoodRam 8119 Mb DDR3-1333 DDR3 SDRAM
OS Ubuntu 16.10 x64

results


memcpy - a function that copies data, taken as a benchmark for the speed of packing and unpacking.

Tables are sorted by compression ratio (“% of the original” lzbench displays, did not convert) from small to large.

Test 1. Collection of works by Charles Dickens, text
Algorithm NamePacking speedDecompression speedCompressed File Size, Bytes% of the originalFile Name Type
memcpy4029 MB / s4034 MB / s10192446100.00dickens txt
csc 2016-10-13 -118 MB / s31 MB / s402091639.45dickens txt
lzlib 1.7 -07.76 MB / s34 MB / s381533537.43dickens txt
lzma 9.38 -013 MB / s38 MB / s404485039.68dickens txt
libdeflate 0.6 -185 MB / s435 MB / s423154341.52dickens txt
zstd 1.1.3 -1143 MB / s486 MB / s427927341.98dickens txt
xpack 2016-06-02 -183 MB / s359 MB / s428224542.01dickens txt
brotli 0.5.2 -0168 MB / s178 MB / s4,401,26943.18dickens txt
zlib 1.2.8 -150 MB / s195 MB / s458561844.99dickens txt
broo 1.06.03 MB / s265 MB / s475093646.61dickens txt
gipfeli 2016-07-13178 MB / s254 MB / s495563248.62dickens txt
yalz77 2015-09-19 -162 MB / s304 MB / s563410955.28dickens txt
quicklz 1.5.0 -1250 MB / s326 MB / s583135357.21dickens txt
lzsse2 2016-05-14 -018 MB / s1481 MB / s586570557.55dickens txt
yappy 2014-03-22 -091 MB / s1122 MB / s614185360.26dickens txt
snappy 1.1.3179 MB / s648 MB / s633783462.18dickens txt
lz4 1.7.5264 MB / s1652 MB / s642874263.07dickens txt
lz5 2.0 -10216 MB / s1855 MB / s643186963.10dickens txt

Test 2. Tar archive with executable files Mozilla 1.0, exe
Algorithm NamePacking speedDecompression speedCompressed File Size, Bytes% of the originalFile Name Type
memcpy3986 MB / s4042 MB / s51220480100.00mozilla exe
csc 2016-10-13 -111 MB / s41 MB / s1533119129.93mozilla exe
lzma 9.38 -017 MB / s43 MB / s1642527207/32mozilla exe
lzlib 1.7 -018 MB / s33 MB / s1647048432.16mozilla exe
xpack 2016-06-02 -176 MB / s368 MB / s1839187435.91mozilla exe
libdeflate 0.6 -192 MB / s396 MB / s1978012438.62mozilla exe
zstd 1.1.3 -1209 MB / s542 MB / s2012045939.28mozilla exe
zlib 1.2.8 -1 53 MB / s209 MB / s2057722640.17mozilla exe
brotli 0.5.2 -0217 MB / s186 MB / s2174012842.44mozilla exe
broo 1.05.11 MB / s350 MB / s2317722045.25mozilla exe
gipfeli 2016-07-13236 MB / s436 MB / s2438055847.60mozilla exe
quicklz 1.5.0 -1315 MB / s368 MB / s2475681948.33mozilla exe
yalz77 2015-09-19 -149 MB / s436 MB / s2545453249.70mozilla exe
lzsse2 2016-05-14 -013 MB / s1493 MB / s2582664850.42mozilla exe
lz4 1.7.5437 MB / s1876 ​​MB / s2643566751.61mozilla exe
snappy 1.1.3303 MB / s1013 MB / s2646192451.66mozilla exe
lz5 2.0 -10334 MB / s2097 MB / s2701624252.74mozilla exe
yappy 2014-03-22 -0107 MB / s1749 MB / s2772821854.14mozilla exe

Test 3. Image MRI, image
Algorithm NamePacking speedDecompression speedCompressed File Size, Bytes% of the originalFile Name Type
lzlib 1.7 -020 MB / s34 MB / s3130897 31.40mr, image
lzma 9.38 -016 MB / s44 MB / s3157626 31.67mr, image
csc 2016-10-13 -112 MB / s40 MB / s3285805 32.96mr, image
xpack 2016-06-02 -182 MB / s323 MB / s3526828 35.37mr, image
libdeflate 0.6 -198 MB / s428 MB / s3750985 37.62mr, image
zlib 1.2.8 -160 MB / s227 MB / s3828366 38.40mr, image
zstd 1.1.3 -1191 MB / s637 MB / s3829231 38.41mr, image
brotli 0.5.2 -0198 MB / s185 MB / s3975643 39.87mr, image
gipfeli 2016-07-13220 MB / s395 MB / s4702561 47.16mr, image
broo 1.05.94 MB / s305 MB / s474121947.55mr, image
quicklz 1.5.0 -1410 MB / s363 MB / s4778194 47.92mr, image
lzsse2 2016-05-14 -024 MB / s1523 MB / s5120289 51.35mr, image
yalz77 2015-09-19 -158 MB / s396 MB / s5269368 52.85mr, image
snappy 1.1.3302 MB / s912 MB / s5419831 54.36mr, image
lz4 1.7.5422 MB / s2024 MB / s5440937 54.57mr, image
yappy 2014-03-22 -0108 MB / s1609 MB / s6454120 64.73mr, image
lz5 2.0 -10294 MB / s2248 MB / s6978486 69.99mr, image

Test 4. Base chemical structures, database
Algorithm NamePacking speedDecompression speedCompressed File Size, Bytes% of the originalFile Name Type
memcpy4042 MB / s4047 MB ​​/ s33553445100.00nci db
csc 2016-10-13 -139 MB / s156 MB / s24637737.34nci db
lzma 9.38 -043 MB / s153 MB / s27779978.28nci db
lzlib 1.7 -049 MB / s103 MB / s28687618.55nci db
zstd 1.1.3 -1435 MB / s915 MB / s28845308.60nci db
broo 1.08.65 MB / s1000 MB / s29819708.89nci db
xpack 2016-06-02 -1180 MB / s807 MB / s383884711.44nci db
brotli 0.5.2 -0539 MB / s575 MB / s398419911.87nci db
libdeflate 0.6 -1180 MB / s1165 MB / s406691312.12nci db
zlib 1.2.8 -1122 MB / s404 MB / s462459713.78nci db
yalz77 2015-09-19 -1197 MB / s695 MB / s505059615.05nci db
gipfeli 2016-07-13529 MB / s681 MB / s506382909/15nci db
lz4 1.7.5765 MB / s2496 MB / s553304016.49nci db
lz5 2.0 -10657 MB / s2644 MB / s554581016.53nci db
snappy 1.1.3560 MB / s1452 MB / s614684418.32nci db
quicklz 1.5.0 -1512 MB / s799 MB / s616063618.36nci db
lzsse2 2016-05-14 -015 MB / s2984 MB / s633980718.89nci db
yappy 2014-03-22 -0179 MB / s1941 MB / s896756226.73nci db

Test 5. Dll files from Open Office.org 1.01, exe
Algorithm NamePacking speedDecompression speedCompressed File Size, Bytes% of the originalFile Name Type
memcpy4054 MB / s4102 MB / s6152192100.00ooffice, exe
csc 2016-10-13 -19.91 MB / s29 MB / s230152337.41ooffice, exe
lzma 9.38 -013 MB / s31 MB / s284157846.19ooffice, exe
lzlib 1.7 -014 MB / s24 MB / s287948946.80ooffice, exe
xpack 2016-06-02 -160 MB / s342 MB / s313796051.01ooffice, exe
libdeflate 0.6 -169 MB / s286 MB / s318743451.81ooffice, exe
zlib 1.2.8 -140 MB / s151 MB / s329053253.49ooffice, exe
brotli 0.5.2 -0154 MB / s143 MB / s353961557.53ooffice, exe
zstd 1.1.3 -1166 MB / s487 MB / s357989958.19ooffice, exe
broo 1.04.93 MB / s412 MB / s375720661.07ooffice, exe
gipfeli 2016-07-13163 MB / s354 MB / s392227663.75ooffice, exe
lzsse2 2016-05-14 -015 MB / s1205 MB / s399509164.94ooffice, exe
quicklz 1.5.0 -1234 MB / s264 MB / s401385965.24ooffice, exe
yalz77 2015-09-19 -135 MB / s398 MB / s412557067.06ooffice, exe
yappy 2014-03-22 -082 MB / s1718 MB / s423568768.85ooffice, exe
snappy 1.1.3222 MB / s889 MB / s427115069.42ooffice, exe
lz4 1.7.5337 MB / s1671 MB / s433891870.53ooffice, exe
lz5 2.0 -10251 MB / s1997 MB / s437007071.03ooffice, exe

Test 6. Example MySQL database format from Open Source Database Benchmark, database
Algorithm NamePacking speedDecompression speedCompressed File Size, Bytes% of the originalFile Name Type
memcpy4095 MB / s4073 MB / s10085684100.00osdb, db
csc 2016-10-13 -110 MB / s38 MB / s331780032.90osdb, db
lzlib 1.7 -019 MB / s33 MB / s334596533.18osdb, db
xpack 2016-06-02 -168 MB/s475 MB/s375287137.21osdb, db
zstd 1.1.3 -1194 MB/s585 MB/s377056637.39osdb, db
libdeflate 0.6 -190 MB/s470 MB/s389680338.64osdb, db
brotli 0.5.2 -0214 MB/s224 MB/s391050238.77osdb, db
lzma 9.38 -015 MB/s38 MB/s398882339.55osdb, db
zlib 1.2.8 -156 MB/s211 MB/s407639140.42osdb, db
broo 1.05.40 MB/s474 MB/s414746541.12osdb, db
lzsse2 2016-05-14 -012 MB/s1724 MB/s449255144.54osdb, db
gipfeli 2016-07-13232 MB/s530 MB/s451751744.79osdb, db
yalz77 2015-09-19 -151 MB/s596 MB/s457019345.31osdb, db
lz4 1.7.5359 MB/s1629 MB/s525666652.12osdb, db
lz5 2.0 -10278 MB/s1842 MB/s528673952.42osdb, db
snappy 1.1.3303 MB/s1110 MB/s532932152.84osdb, db
quicklz 1.5.0 -1277 MB/s330 MB/s549644354.50osdb, db
yappy 2014-03-22 -070 MB/s1794 MB/s751573574.52osdb, db

Тест 7. Текст книги Chłopi, польского писателя Радислава Реймонта, Polish, PDF
Имя алгоритмаСкорость упаковкиСкорость распаковкиРазмер сжатого файла, Байт% от оригиналаИмя файла, тип
memcpy4123 MB/s4120 MB/s6627202100.00reymont, pdf
csc 2016-10-13 -115 MB/s47 MB/s187232428.25reymont, pdf
lzma 9.38 -015 MB/s49 MB/s192195429.00reymont, pdf
lzlib 1.7 -022 MB/s37 MB/s208229731.42reymont, pdf
zstd 1.1.3 -1157 MB/s486 MB/s216738532.70reymont, pdf
libdeflate 0.6 -1100 MB/s512 MB/s220693233.30reymont, pdf
xpack 2016-06-02 -197 MB/s389 MB/s227971634.40reymont, pdf
broo 1.05.10 MB/s423 MB/s228901934.54reymont, pdf
brotli 0.5.2 -0212 MB/s226 MB/s236073235.62reymont, pdf
zlib 1.2.8 -159 MB/s213 MB/s237643035.86reymont, pdf
gipfeli 2016-07-13222 MB/s318 MB/s264491639.91reymont, pdf
quicklz 1.5.0 -1284 MB/s399 MB/s300382545.33reymont, pdf
yalz77 2015-09-19 -176 MB/s347 MB/s301708345.53reymont, pdf
lzsse2 2016-05-14 -016 MB/s1735 MB/s303939245.86reymont, pdf
yappy 2014-03-22 -0119 MB/s1252 MB/s316134447.70reymont, pdf
lz4 1.7.5303 MB/s1611 MB/s318138748.00reymont, pdf
lz5 2.0 -10265 MB/s1626 MB/s318490148.06reymont, pdf
snappy 1.1.3208 MB/s729 MB/s323378748.80reymont, pdf

Тест 8. Tar архив исходников Samba 2-2.3, src
Имя алгоритмаСкорость упаковкиСкорость распаковкиРазмер сжатого файла, Байт% от оригиналаИмя файла, тип
memcpy4048 MB/s4033 MB/s21606400100.00samba, src
csc 2016-10-13 -117 MB/s60 MB/s440724120.40samba, src
lzlib 1.7 -026 MB/s46 MB/s517881923.97samba, src
lzma 9.38 -021 MB/s59 MB/s533893524.71samba, src
zstd 1.1.3 -1257 MB/s715 MB/s555063725.69samba, src
xpack 2016-06-02 -1107 MB/s568 MB/s566929526.24samba, src
libdeflate 0.6 -1113 MB/s615 MB/s592297327.41samba, src
brotli 0.5.2 -0304 MB/s285 MB/s608432728.16samba, src
broo 1.06.90 MB/s650 MB/s618604228.63samba, src
zlib 1.2.8 -173 MB/s276 MB/s632945529.29samba, src
gipfeli 2016-07-13323 MB/s426 MB/s681062331.52samba, src
yalz77 2015-09-19 -181 MB/s512 MB/s709889932.86samba, src
quicklz 1.5.0 -1366 MB/s497 MB/s730945233.83samba, src
lzsse2 2016-05-14 -014 MB/s2144 MB/s739573734.23samba, src
lz4 1.7.5486 MB/s2035 MB/s771683935.72samba, src
lz5 2.0 -10398 MB/s2246 MB/s792717836.69samba, src
snappy 1.1.3353 MB/s1089 MB/s800877437.07samba, src
yappy 2014-03-22 -0123 MB/s1769 MB/s918327342.50samba, src

Тест 9. Звездный каталог Смитсоновской астрофизической обсерватории, bin
Имя алгоритмаСкорость упаковкиСкорость распаковкиРазмер сжатого файла, Байт% от оригиналаИмя файла, тип
memcpy4096 MB/s4114 MB/s7251944100.00sao, bin
lzma 9.38 -09.47 MB/s22 MB/s492352967.89sao, bin
lzlib 1.7 -010 MB/s16 MB/s500557369.02sao, bin
csc 2016-10-13 -15.69 MB/s17 MB/s508284670.09sao, bin
xpack 2016-06-02 -147 MB/s312 MB/s525960672.53sao, bin
libdeflate 0.6 -160 MB/s258 MB/s549426875.76sao, bin
zlib 1.2.8 -131 MB/s158 MB/s556777476.78sao, bin
brotli 0.5.2 -0130 MB/s120 MB/s601984183.01sao, bin
gipfeli 2016-07-13146 MB/s422 MB/s604336183.33sao, bin
broo 1.03.55 MB/s496 MB/s608611883.92sao, bin
yappy 2014-03-22 -068 MB/s1709 MB/s620175285.52sao, bin
zstd 1.1.3 -1145 MB/s483 MB/s625428286.24sao, bin
yalz77 2015-09-19 -126 MB/s576 MB/s629903086.86sao, bin
snappy 1.1.3212 MB/s969 MB/s643526688.74sao, bin
quicklz 1.5.0 -1229 MB/s222 MB/s649830189.61sao, bin
lzsse2 2016-05-14 -015 MB/s941 MB/s671054292.53sao, bin
lz4 1.7.5337 MB/s2161 MB/s679027393.63sao, bin
lz5 2.0 -10236 MB/s2501 MB/s679272093.67sao, bin

Тест 10. Американский словарь английского языка Уэбстера, html
Имя алгоритмаСкорость упаковкиСкорость распаковкиРазмер сжатого файла, Байт% от оригиналаИмя файла, тип
memcpy3970 MB/s4008 MB/s41458703100.00webster, html
csc 2016-10-13 -113 MB/s44 MB/s1036015524.99webster, html
lzma 9.38 -017 MB/s47 MB/s1270487830.64webster, html
lzlib 1.7 -022 MB/s38 MB/s1272759630.70webster, html
zstd 1.1.3 -1169 MB/s531 MB/s1373828433.14webster, html
libdeflate 0.6 -199 MB/s524 MB/s1383919233.38webster, html
broo 1.05.42 MB/s266 MB/s1385419533.42webster, html
xpack 2016-06-02 -194 MB/s441 MB/s1400690733.79webster, html
brotli 0.5.2 -0187 MB/s207 MB/s1455900735.12webster, html
zlib 1.2.8 -160 MB/s211 MB/s1499124236.16webster, html
gipfeli 2016-07-13209 MB/s281 MB/s1615231238.96webster, html
lzsse2 2016-05-14 -014 MB/s1897 MB/s1745951742.11webster, html
quicklz 1.5.0 -1276 MB/s369 MB/s1831581644.18webster, html
yalz77 2015-09-19 -162 MB/s315 MB/s1843524844.47webster, html
yappy 2014-03-22 -0107 MB/s1378 MB/s1989961048.00webster, html
lz4 1.7.5317 MB/s1593 MB/s2013998848.58webster, html
lz5 2.0 -10260 MB/s1790 MB/s2015354748.61webster, html
snappy 1.1.3214 MB/s765 MB/s2020646648.74webster, html

Тест 11. Коллекция xml файлов, xml
Имя алгоритмаСкорость упаковкиСкорость распаковкиРазмер сжатого файла, Байт% от оригиналаИмя файла, тип
memcpy4118 MB/s4113 MB/s5345280100.00xml
csc 2016-10-13 -127 MB/s99 MB/s60676311.35xml
lzma 9.38 -034 MB/s108 MB/s69123612.93xml
zstd 1.1.3 -1363 MB/s887 MB/s70315113.15xml
lzlib 1.7 -039 MB/s73 MB/s74153713.87xml
broo 1.07.91 MB/s1277 MB/s80052614.98xml
brotli 0.5.2 -0409 MB/s451 MB/s90575716.94xml
libdeflate 0.6 -1143 MB/s856 MB/s94040917.59xml
zlib 1.2.8 -1104 MB/s344 MB/s96524818.06xml
xpack 2016-06-02 -1137 MB/s634 MB/s100000818.71xml
yalz77 2015-09-19 -1157 MB/s666 MB/s106737819.97xml
gipfeli 2016-07-13406 MB/s527 MB/s110053620.59xml
quicklz 1.5.0 -1452 MB/s712 MB/s112470821.04xml
lzsse2 2016-05-14 -018 MB/s2870 MB/s120112522.47xml
lz4 1.7.5617 MB/s1991 MB/s122749522.96xml
lz5 2.0 -10524 MB/s2231 MB/s124009823.20xml
snappy 1.1.3414 MB/s1196 MB/s130837424.48xml
yappy 2014-03-22 -0155 MB/s1915 MB/s160545930.04xml

Тест 12. Рентген изображение, image
Имя алгоритмаСкорость упаковкиСкорость распаковкиРазмер сжатого файла, Байт% от оригиналаИмя файла, тип
memcpy4023 MB/s4106 MB/s8474240100.00x-ray, image
csc 2016-10-13 -116 MB/s21 MB/s404963047.79x-ray, image
lzlib 1.7 -09.85 MB/s18 MB/s507927459.94x-ray, image
lzma 9.38 -010 MB/s23 MB/s519889461.35x-ray, image
xpack 2016-06-02 -148 MB/s243 MB/s586336769.19x-ray, image
libdeflate 0.6 -163 MB/s267 MB/s599975070.80x-ray, image
zlib 1.2.8 -135 MB/s145 MB/s603393271.20x-ray, image
brotli 0.5.2 -0139 MB/s121 MB/s660052377.89x-ray, image
zstd 1.1.3 -1419 MB/s569 MB/s677228679.92x-ray, image
lzsse2 2016-05-14 -017 MB/s883 MB/s729287686.06x-ray, image
quicklz 1.5.0 -1264 MB/s219 MB/s744063287.80x-ray, image
gipfeli 2016-07-13165 MB/s486 MB / s764139190.17x-ray image
broo 1.03.47 MB ​​/ s487 MB / s770271590.90x-ray image
yalz77 2015-09-19 -123 MB / s491 MB / s793365393.62x-ray image
snappy 1.1.3446 MB / s1869 MB / s820918096.87x-ray image
yappy 2014-03-22 -059 MB / s3200 MB / s832858298.28x-ray image
lz4 1.7.5852 MB / s3457 MB / s839019599.01x-ray image
lz5 2.0 -10540 MB / s4126 MB / s845968599.83x-ray image

“Afterword”


It should be noted that it is still damp and there is still enough work to improve, but it is already having a positive effect. There are still hours to generate and test hypotheses, write tests and optimize code.

Thanks for attention.

Also popular now: