A recent article challenging the effectiveness of lockless queues has ignited a fierce debate in the programming community, with developers questioning both the performance claims and fundamental design assumptions presented by the author.
The original piece argued that most lockless queue implementations are fundamentally flawed and slow, proposing an alternative bag data structure that abandons ordering guarantees entirely. However, this assertion has met with significant pushback from experienced developers who point to real-world performance data that contradicts these claims.
Performance Numbers Tell a Different Story
Community members quickly challenged the assertion that lockless queues are slow with concrete benchmarking data. Developers working with Rust's lockless queue ecosystem report that the fastest implementations process over 30 million elements per second, translating to just 33 nanoseconds per element. Even slower implementations clock in around 100 nanoseconds per element, which many consider quite fast for concurrent operations.
The debate intensified when the original author defended their position by citing their own benchmark, which showed an average of 250 nanoseconds per element under heavy contention. Critics argued this represents one specific implementation under extreme conditions rather than a general indictment of lockless queue technology; a rough way to reproduce such per-element figures is sketched after the comparison below.
Lockless Queue Performance Comparison
- Fastest implementations: 30+ million elements/second (33ns per element)
- Slower implementations: 5-10 million elements/second (100-200ns per element)
- Author's benchmark with heavy contention: ~4 million elements/second (250ns per element)
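Per-element figures like these are straightforward to sanity-check. The sketch below is a minimal throughput probe using Rust's standard `std::sync::mpsc` channel; it is not the author's benchmark nor any specific implementation from the debate, and the absolute numbers will vary widely with hardware, payload type, and contention.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

// Rough single-producer/single-consumer throughput probe.
// Absolute results depend heavily on hardware and the queue used.
fn main() {
    const N: u64 = 10_000_000;
    let (tx, rx) = mpsc::channel::<u64>();

    let start = Instant::now();
    let producer = thread::spawn(move || {
        for i in 0..N {
            tx.send(i).unwrap(); // tx moves into the thread; channel closes when done
        }
    });

    // Drain until the channel closes, keeping a checksum so the
    // compiler cannot optimize the receive loop away.
    let mut checksum = 0u64;
    for v in rx {
        checksum = checksum.wrapping_add(v);
    }
    producer.join().unwrap();

    let elapsed = start.elapsed();
    println!(
        "{} elements in {:?} -> {:.1} ns/element (checksum {})",
        N,
        elapsed,
        elapsed.as_nanos() as f64 / N as f64,
        checksum
    );
}
```

A probe like this measures an uncontended two-thread pipeline; reproducing the author's heavy-contention figure would require many producers and consumers hammering the same queue.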
The Queue Definition Controversy
A significant portion of the discussion centered on what actually constitutes a queue in multi-threaded environments. The original article claimed that multi-consumer queues break global ordering and therefore aren't true queues. Community members strongly disagreed with this narrow definition, pointing out that real-world queues rarely maintain strict global processing order.
One developer used the analogy of a city hall with multiple service counters to illustrate the point. While people receive tickets in order, they may be served by different counters at different speeds, potentially completing service out of the original sequence. This mirrors how multi-consumer queues work in practice and serves legitimate use cases.
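The effect is easy to reproduce. The sketch below wraps a receiver in a `Mutex` purely as a stand-in for a multi-consumer queue (Rust's `std::sync::mpsc` receiver is single-consumer, so this is a lock-based illustration, not a lockless one): tickets are dequeued in order, but differing service times mean completions interleave, exactly as at the city hall counters.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

// "City hall" effect: tickets are handed out in order, but counters
// (worker threads) finish them out of order. A lockless MPMC queue
// exhibits the same completion reordering.
fn main() {
    let (tx, rx) = mpsc::channel::<u32>();
    let rx = Arc::new(Mutex::new(rx));

    for ticket in 1..=6 {
        tx.send(ticket).unwrap();
    }
    drop(tx); // close the channel so workers exit when it drains

    let workers: Vec<_> = (0..2)
        .map(|counter| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                let ticket = match rx.lock().unwrap().recv() {
                    Ok(t) => t,
                    Err(_) => break, // queue drained
                };
                // Odd-numbered tickets take longer, so completions interleave.
                thread::sleep(Duration::from_millis(if ticket % 2 == 1 { 30 } else { 5 }));
                println!("counter {counter} finished ticket {ticket}");
            })
        })
        .collect();

    for w in workers {
        w.join().unwrap();
    }
}
```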
Implementation Challenges and Trade-offs
The proposed bag alternative drew skepticism from developers familiar with concurrent data structures. Critics noted that the bag implementation still requires atomic operations on bitmasks, which could create their own contention bottlenecks. The fundamental challenge remains: any shared data structure accessed by multiple threads must coordinate access somehow.
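To make that criticism concrete, here is a hypothetical sketch of the slot-reservation core a bitmask-based bag might use. It is not the article's implementation, only an illustration of why the atomic bitmask remains a shared hot spot: every claim or release is an atomic operation on the same 64-bit word.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical slot-reservation core for a fixed-capacity "bag":
// one AtomicU64 tracks which of 64 slots are free. Every claim is a
// CAS loop on that single shared word, so contention concentrates there.
struct SlotMask {
    free: AtomicU64,
}

impl SlotMask {
    fn new() -> Self {
        Self { free: AtomicU64::new(u64::MAX) } // all 64 slots free
    }

    /// Try to claim any free slot; returns its index, or None if full.
    fn try_claim(&self) -> Option<u32> {
        let mut mask = self.free.load(Ordering::Relaxed);
        loop {
            if mask == 0 {
                return None; // no free slot
            }
            let idx = mask.trailing_zeros(); // lowest set bit = a free slot
            let new = mask & !(1 << idx);
            match self.free.compare_exchange_weak(
                mask, new, Ordering::AcqRel, Ordering::Relaxed,
            ) {
                Ok(_) => return Some(idx),
                Err(current) => mask = current, // lost the race; retry
            }
        }
    }

    /// Mark a previously claimed slot free again.
    fn release(&self, idx: u32) {
        self.free.fetch_or(1 << idx, Ordering::Release);
    }
}

fn main() {
    let mask = SlotMask::new();
    let a = mask.try_claim().unwrap();
    let b = mask.try_claim().unwrap();
    assert_ne!(a, b); // two claims never hand out the same slot
    mask.release(a);
    println!("claimed slots {a} and {b}, released {a}");
}
```

The bag sidesteps ordering, but under many threads the `compare_exchange` retries on `free` behave much like the head/tail contention it was meant to avoid.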
Several commenters highlighted that different queue variants serve different purposes. MPSC (multiple-producer, single-consumer) queues are widely used in task schedulers, while SPSC (single-producer, single-consumer) queues work well for custom processing pipelines; a minimal MPSC example follows the list below. The choice depends on the specific use case rather than on universal performance characteristics.
Queue Types and Use Cases
- SPSC (Single-Producer Single-Consumer): Custom processing pipelines
- MPSC (Multiple-Producer Single-Consumer): Task schedulers (tokio, rayon)
- SPMC (Single-Producer Multiple-Consumer): Limited use cases
- MPMC (Multiple-Producer Multiple-Consumer): General task scheduling
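As a concrete instance of the MPSC row above, the sketch below uses Rust's standard `std::sync::mpsc` channel (tokio and crossbeam provide analogous channels with similar shapes). Note how items from different producers interleave at the single consumer: there is no global order across producers, which is precisely the property the debate turns on.

```rust
use std::sync::mpsc;
use std::thread;

// MPSC in practice: several producers, one consumer.
fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    let producers: Vec<_> = (0..4)
        .map(|id| {
            let tx = tx.clone(); // one cheap sender handle per producer
            thread::spawn(move || {
                for n in 0..3 {
                    tx.send(format!("producer {id}, task {n}")).unwrap();
                }
            })
        })
        .collect();
    drop(tx); // drop the original handle so rx sees the channel close

    // The single consumer drains tasks; arrival order across
    // producers is interleaved, not globally fixed.
    for task in rx {
        println!("handling {task}");
    }

    for p in producers {
        p.join().unwrap();
    }
}
```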
Practical Implications for Developers
The debate reveals important considerations for developers choosing concurrent data structures. While the original article raised valid points about ordering guarantees, the community response emphasized that many applications don't require strict global ordering. Features like fairness, starvation avoidance, and predictable behavior often matter more than perfect sequential consistency.
The discussion also highlighted the danger of making broad performance claims without comprehensive benchmarking across different scenarios and implementations. What performs well under one set of conditions may struggle under others, making context crucial for design decisions.
The controversy ultimately underscores the complexity of concurrent programming and the importance of understanding both the theoretical properties and practical performance characteristics of different approaches before making architectural decisions.
Reference: Your MPSC/SPMC/MPMC queue is not a queue
