XXH3: a new hash speed record holder


    Benchmarks are made in the SMHasher program on Core 2 Duo 3.0 GHz

    . Habré has repeatedly talked about non-cryptographic hash functions , which are an order of magnitude faster than cryptographic ones. They are used where speed is important and it makes no sense to use slow MD5 or SHA1. For example, to build hash tables with storage of key-value pairs or to quickly check the checksum when transferring large files.

    One of the most popular is the xxHash hash familythat appeared about five years ago. Although initially these hashes were conceived to verify the checksum during LZ4 compression, they began to be used on a variety of tasks. It’s understandable: just look at the table above with a comparison of the performance of xxHash and some other hash functions. In this test, xxHash outperforms its closest competitor in performance by half. The new version of XXH3 raises the bar even higher. Hereinafter, charts are clickable. Program author Yann Collet writes




    that the idea to improve the algorithm appeared during the implementation of the Bloom filter, which required the rapid generation of 64 pseudo-random bits based on small input data of variable length. In principle, the standard XXH64 could handle this, but processing small input data was never a priority in its development. In other words, optimization is possible. As a result of this optimization, a new hash function XXH3 appeared, in which ideas from some other hash algorithms are implemented. Most interestingly, XXH3 is significantly faster than all existing xxHash variants, not only on small input data, but in almost all use cases.

    XXH3 eliminates the main drawback of XXH64 - hash limitation of 64 bits. The author says that for this reason he was often asked to release at least a 128-bit version. So, the XXH3 hash function is theoretically capable of generating hashes of up to 256 bits.

    In XXH3, an inner loop that is optimally handled by vectorization. Due to this, the function uses hardware support on the instruction sets SSE2, AVX2 and NEON. Performance depends on the compiler. Unexpectedly, the version compiled by clang is far superior to the rest. Ian Colle even thought that the performance of the hash function here would exceed the memory bandwidth. This version on the graph corresponds to a dashed line.



    As a result, it turned out that the hash function with support for AVX2 has a much higher throughput than RAM, so the cache size becomes an important factor. However, in many tasks it is necessary to process data that is already in the cache, so working at a speed faster than memory makes perfect sense.

    Support for SSE2 instructions allows the new hash function to significantly circumvent XXH32 on 32-bit applications that are still common in the real world (for example, WASM bytecode). On small input data (20-30 bytes), the XXH3 hash function is not much, but still surpasses CityHash from Google, as well as FarmHash, which used to be clear leaders.







    The following graph shows the results of the most realistic test with input data of variable length (random variable from 1 to N). Another test takes into account the delay : here the hashing begins on a signal. The idea is to simulate a real workload, where the hash function does not work continuously, but is called in other processes at a certain moment. The author writes that this test with a multiplication of 64 × 64 => 128 bits prefers the mumv2 algorithms from Vladimir Makarov and t1ha2 from Leo Yuryev. Finally, here is the final and most important graph that shows the hash rate on the input data of variable length, taking into account the delay. It reflects the actual use of the hash function, for example, in databases.















    XXH3 was released as part of the xxHash v0.7.0 package . She has a note “experimental”, and for unlocking you need to apply a macro XXH_STATIC_LINKING_ONLY. The author explains that the hash function so far can only be used on test ephemeral data, but not for the actual storage of hashes. According to the results of the experimental period and the collection of user reviews, the XXH3 function will receive a stable status in the next version of xxHash. Signature

    certificates for Microsoft Office documents , Adobe PDF, LibreOffice, etc.
    GlobalSign provides ample opportunity to implement trusted digital signatures . From desktop, server to cloud implementations. More details

    Also popular now: