In the specialized world of hash functions, rapidhash has emerged as a contender for the state of the art in small-key hashing. According to community experts, it represents a significant advance in the ongoing effort to balance speed, quality, and platform compatibility in non-cryptographic hash functions.
The Need for Speed vs. Quality
Hash functions serve as fundamental building blocks in computing, used in everything from hash tables and dictionaries to checksums and data verification. The ideal hash function distributes its inputs uniformly across its output space while processing data as quickly as possible. In practice, designers have long had to trade off throughput, latency, and quality.
The tension between throughput and latency is especially sharp: a function tuned to stream large buffers quickly tends to carry a setup cost that hurts on tiny inputs. The rapidhash algorithm is clearly optimized for low latency on small keys, as found in string dictionaries and similar structures.
What makes rapidhash particularly notable is its exceptional performance with small keys - strings typically under 100 bytes - while maintaining high-quality distribution characteristics. This makes it especially valuable for hash map implementations, where the overhead of calling the hash function itself becomes significant when processing many small strings.
Beyond XXH3: Quality Matters
While XXH3 has been a popular choice for years, community experts point out that it fails approximately 15% of the tests in SMHasher3, a comprehensive test suite for evaluating hash function quality. Rapidhash, by comparison, passes all tests in both SMHasher and SMHasher3 while delivering superior performance.
The quality of a hash function refers to how closely it approximates a random oracle - essentially, how uniformly it distributes any given set of inputs across its output space. While cryptographic hash functions like SHA-256 excel at this, they're typically much slower. Non-cryptographic functions like rapidhash aim to find the optimal balance between speed and quality.
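One way to make "uniform distribution" concrete is to hash a set of keys into a fixed number of buckets and compute a chi-squared statistic over the bucket counts. The sketch below is only an illustration of that idea, not part of SMHasher or rapidhash: `stand_in_hash` is a hypothetical placeholder built from Python's standard `hashlib.blake2b`, and `byte_sum_hash` is a deliberately weak function included for contrast.

```python
import hashlib


def stand_in_hash(key: bytes) -> int:
    """64-bit stand-in hash (BLAKE2b truncated to 8 bytes). NOT rapidhash."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "little")


def byte_sum_hash(key: bytes) -> int:
    """Deliberately weak hash: the sum of the key's bytes."""
    return sum(key)


def chi_squared(hash_fn, keys, buckets: int) -> float:
    """Chi-squared statistic of bucket counts against a uniform expectation."""
    counts = [0] * buckets
    for key in keys:
        counts[hash_fn(key) % buckets] += 1
    expected = len(keys) / buckets
    return sum((c - expected) ** 2 / expected for c in counts)


def make_keys(n: int) -> list:
    """Small string keys of the kind a dictionary might hold (made up here)."""
    return [f"key-{i}".encode() for i in range(n)]


if __name__ == "__main__":
    keys = make_keys(64_000)
    for name, fn in (("stand-in", stand_in_hash), ("byte sum", byte_sum_hash)):
        print(f"{name:>8}: chi-squared = {chi_squared(fn, keys, 64):.1f}")
```

For a well-behaved hash, the statistic should land near the number of buckets minus one. A much larger value means some buckets are badly over- or under-filled.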
Performance benchmarks show rapidhash with average latency on small keys (4-16 bytes) ranging from 1.38ns on Apple M3 Pro to 2.31ns on AMD Turin processors. For larger inputs, rapidhash reaches throughput of up to 71GB/s on Apple's M4 chips, about 45% above XXH3's 49GB/s on the same hardware.
Performance Comparison: Average Latency (Hashing 4-16 byte keys)
| Hash Function | M1 Pro | M3 Pro | Neoverse V2 | AMD Turin |
|---|---|---|---|---|
| rapidhash | 1.79ns | 1.38ns | 2.07ns | 2.31ns |
| xxh3 | 1.92ns | 1.50ns | 2.15ns | 2.35ns |
Peak Throughput (Hashing 16KB-2MB files)
| Hash Function | M1 Pro | M3 Pro | M3 Ultra | M4 | Neoverse V2 |
|---|---|---|---|---|---|
| rapidhash | 47GB/s | 57GB/s | 61GB/s | 71GB/s | 37GB/s |
| xxh3 | 37GB/s | 43GB/s | 47GB/s | 49GB/s | 34GB/s |
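To put the two tables on a common footing, the figures can be turned into ratios. The short script below does only that arithmetic on the numbers quoted above. The dictionary and function names are chosen here for illustration.

```python
# (rapidhash, xxh3) pairs copied from the latency table, in nanoseconds.
LATENCY_NS = {
    "M1 Pro": (1.79, 1.92),
    "M3 Pro": (1.38, 1.50),
    "Neoverse V2": (2.07, 2.15),
    "AMD Turin": (2.31, 2.35),
}

# (rapidhash, xxh3) pairs copied from the throughput table, in GB/s.
THROUGHPUT_GBPS = {
    "M1 Pro": (47, 37),
    "M3 Pro": (57, 43),
    "M3 Ultra": (61, 47),
    "M4": (71, 49),
    "Neoverse V2": (37, 34),
}


def latency_advantage(platform: str) -> float:
    """xxh3 latency divided by rapidhash latency (above 1 favors rapidhash)."""
    rapid, xxh3 = LATENCY_NS[platform]
    return xxh3 / rapid


def throughput_advantage(platform: str) -> float:
    """rapidhash throughput divided by xxh3 throughput (above 1 favors rapidhash)."""
    rapid, xxh3 = THROUGHPUT_GBPS[platform]
    return rapid / xxh3


if __name__ == "__main__":
    for platform in LATENCY_NS:
        print(f"latency    {platform:<12} {latency_advantage(platform):.3f}x")
    for platform in THROUGHPUT_GBPS:
        print(f"throughput {platform:<12} {throughput_advantage(platform):.3f}x")
```

On these figures the small-key latency lead is modest, roughly 2% to 9% depending on the platform, while the peak-throughput lead ranges from roughly 9% to 45%.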
Specialized Use Cases
The discussion reveals that hash function selection should be tailored to specific use cases. For dictionary lookups and hash tables with small keys, rapidhash appears to be the current leader. However, for specialized applications where key properties are well-known in advance, custom-designed hash functions might still offer better performance.
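As an illustration of what "key properties known in advance" can buy, suppose the keys are known to be 64-bit integers rather than arbitrary byte strings. A single multiply-and-shift (often called Fibonacci hashing) is then a plausible custom hash. This is a generic textbook technique, not something taken from rapidhash, and the function name below is chosen here for illustration.

```python
# 2**64 divided by the golden ratio, as an odd 64-bit integer.
GOLDEN_64 = 0x9E3779B97F4A7C15
MASK_64 = (1 << 64) - 1


def multiply_shift(key: int, bits: int) -> int:
    """Map a non-negative 64-bit integer key to a bucket index in [0, 2**bits)."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    # Multiply modulo 2**64, then keep the top `bits` bits of the product.
    return ((key * GOLDEN_64) & MASK_64) >> (64 - bits)


if __name__ == "__main__":
    # Sequential IDs, a common worst case for naive "key % buckets" schemes
    # when the table size shares factors with the key stride.
    buckets = [multiply_shift(k, 10) for k in range(256)]
    print(f"{len(set(buckets))} distinct buckets for 256 sequential keys")
```

The tradeoff is that such a function is only suitable for the key shape it was designed around. It offers none of the general-purpose guarantees that a suite like SMHasher3 tests for.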
The crossover point between latency-optimized algorithms like rapidhash and throughput-optimized alternatives occurs around 400-500 bytes on modern server hardware. For keys larger than this threshold, a throughput-optimized function might be more appropriate.
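A crossover like this falls out of a simple linear cost model: time per key is a fixed per-call cost plus key length divided by streaming rate. The sketch below solves for the length at which two such lines meet. The model is a simplification, and the parameter values are hypothetical, chosen only so the arithmetic is easy to follow. They are not measurements of rapidhash or any other function.

```python
def hash_cost_ns(fixed_ns: float, bytes_per_ns: float, key_len: int) -> float:
    """Modelled time to hash one key: fixed overhead plus streaming time.

    Note that 1 GB/s equals 1 byte per nanosecond.
    """
    return fixed_ns + key_len / bytes_per_ns


def crossover_bytes(low_latency, high_throughput) -> float:
    """Key length at which the two modelled cost lines intersect.

    Each argument is a (fixed_ns, bytes_per_ns) pair.
    """
    (fixed_a, rate_a), (fixed_b, rate_b) = low_latency, high_throughput
    if not (fixed_b > fixed_a and rate_b > rate_a):
        raise ValueError("expected a low-overhead design and a faster-streaming design")
    return (fixed_b - fixed_a) / (1 / rate_a - 1 / rate_b)


# Hypothetical parameters for illustration only (not benchmark results).
LOW_LATENCY = (2.0, 20.0)       # 2 ns overhead, 20 bytes/ns
HIGH_THROUGHPUT = (12.0, 40.0)  # 12 ns overhead, 40 bytes/ns

if __name__ == "__main__":
    print(f"crossover at about {crossover_bytes(LOW_LATENCY, HIGH_THROUGHPUT):.0f} bytes")
```

Below the crossover the low-overhead design wins because the fixed cost dominates. Above it the faster streaming rate pays back the extra setup.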
It's worth noting that the field of non-cryptographic hash functions has evolved rapidly in recent years. Functions considered state-of-the-art a decade ago are now considered broken by today's standards. This rapid advancement has raised the bar significantly for what constitutes an acceptable general-purpose hash function.
For developers working on performance-critical applications involving hash tables or dictionaries, rapidhash represents a compelling option that balances code size, speed, and quality. Its ability to process small keys with minimal latency while maintaining high-quality distribution characteristics makes it particularly valuable for modern software development.
Reference: rapidhash - Very fast, high quality, platform-independent