ClickHouse has successfully optimized its database engine to run efficiently on Intel's latest ultra-high core count processors, including the 288-core Sierra Forest chips. This achievement addresses one of the biggest challenges in modern computing: making software scale effectively across hundreds of CPU cores without hitting performance bottlenecks.
The optimization work focused on the scaling problems that emerge when a database tries to use all of the processing power on these massive processors. Designs that work well on a few dozen cores often stall at this scale, because shared resources such as locks and memory allocators turn into contention points.
Memory Allocation Becomes the Main Bottleneck
The biggest challenge wasn't CPU power but memory management. When hundreds of cores try to allocate memory simultaneously, the memory allocator itself becomes a traffic jam. ClickHouse engineers discovered that standard memory allocation routines couldn't keep up with the demands of 200+ cores all requesting memory at the same time.
The solution involved implementing thread-local memory allocators that sidestep the lock contention behind these delays. Each processing thread gets its own memory pool, so threads no longer wait in line for memory. This change alone improved performance by 80% and cut the time threads spent waiting by 90%.
Thread-local allocators: Memory management systems where each processing thread has its own dedicated memory pool, avoiding conflicts between threads.
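To make the idea concrete, here is a minimal Python sketch of a thread-local allocator sitting in front of a shared, locked pool. It is a conceptual model, not ClickHouse's code: the class names `GlobalPool` and `ThreadLocalAllocator`, the block size, and the batch size are all hypothetical, and because of Python's global interpreter lock it demonstrates how rarely the shared lock is taken, not real parallel speedup. Each thread keeps a private free list and only touches the shared pool to refill or drain a whole batch.

```python
import threading


class GlobalPool:
    """Shared pool of fixed-size blocks; every access takes one lock."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._lock = threading.Lock()
        self._free = []
        self.lock_acquisitions = 0  # how often threads had to go through the lock

    def take(self, n):
        """Return n blocks, reusing freed ones first, with a single lock acquisition."""
        with self._lock:
            self.lock_acquisitions += 1
            reused = [self._free.pop() for _ in range(min(n, len(self._free)))]
        # Create any missing blocks outside the lock.
        return reused + [bytearray(self.block_size) for _ in range(n - len(reused))]

    def give_back(self, blocks):
        """Return a batch of blocks with a single lock acquisition."""
        with self._lock:
            self.lock_acquisitions += 1
            self._free.extend(blocks)


class ThreadLocalAllocator:
    """Per-thread block cache in front of a GlobalPool (hypothetical design)."""

    def __init__(self, pool, batch=64):
        self._pool = pool
        self._batch = batch
        self._local = threading.local()  # each thread sees its own attributes

    def _cache(self):
        cache = getattr(self._local, "free", None)
        if cache is None:
            cache = self._local.free = []
        return cache

    def alloc(self):
        cache = self._cache()
        if not cache:
            # Slow path: refill a whole batch under one lock acquisition.
            cache.extend(self._pool.take(self._batch))
        return cache.pop()  # fast path: no lock at all

    def free(self, block):
        cache = self._cache()
        cache.append(block)  # fast path: no lock at all
        if len(cache) > 2 * self._batch:
            # Keep the cache bounded by returning the surplus in one batch.
            surplus = cache[self._batch:]
            del cache[self._batch:]
            self._pool.give_back(surplus)


if __name__ == "__main__":
    pool = GlobalPool(block_size=256)
    allocator = ThreadLocalAllocator(pool, batch=64)

    def worker():
        for _ in range(10_000):
            allocator.free(allocator.alloc())

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(pool.lock_acquisitions)  # prints 4
```

With four threads each doing 10,000 allocate/free pairs, the shared lock is taken only four times in total (one batch refill per thread), versus 80,000 times if every call went straight to the shared pool. General-purpose allocators such as tcmalloc and jemalloc use a similar per-thread cache with batched transfers to shared structures.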
Intel's Two-Track Processor Strategy Creates Trade-offs
Intel has split its server processors into two distinct lines. The high-core-count Sierra Forest processors use efficiency cores (E-cores) that pack more cores into the same space but lack advanced features like AVX-512 instructions. Meanwhile, its Granite Rapids processors use performance cores (P-cores) with full AVX-512 support but fewer total cores.
This creates a real trade-off for database workloads. Tasks dominated by data loading and simple processing are limited by memory access rather than arithmetic, so the 288 E-cores can outperform fewer but more powerful P-cores: with so many cores, more memory requests are in flight at once. For computation-heavy tasks, however, the P-cores and their wider instruction sets still win.
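Because the two product lines expose different instruction sets, a single binary that must run on both usually chooses its SIMD code path at runtime. The Python sketch below shows only that decision, assuming a Linux host where `/proc/cpuinfo` lists x86 feature flags; the code-path labels `avx512`, `avx2`, and `scalar` are hypothetical, and compiled engines typically query the CPU directly (for example through CPUID) rather than parsing text.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the feature flags from Linux /proc/cpuinfo text (x86 'flags' lines)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_kernel(flags):
    """Return the widest SIMD code path the CPU advertises."""
    if "avx512f" in flags:
        return "avx512"  # e.g. Granite Rapids P-cores
    if "avx2" in flags:
        return "avx2"    # e.g. Sierra Forest E-cores: AVX2 but no AVX-512
    return "scalar"      # portable fallback


def detect_kernel(path="/proc/cpuinfo"):
    """Pick a code path for this machine; fall back to scalar if flags are unavailable."""
    cpuinfo = Path(path)
    if not cpuinfo.is_file():
        return "scalar"
    return pick_kernel(cpu_flags(cpuinfo.read_text()))
```

The check runs once at startup, so the per-row work never pays for it; the cost of the design is maintaining and testing one kernel per instruction-set tier.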
Real-World Performance Gains Surprise Even Experts
The community response has been overwhelmingly positive, with users reporting impressive results on multi-terabyte datasets. One user described loading several terabytes of financial market data and being able to aggregate through billions of records in just minutes, without even changing default system settings.
"It's amazing how you get both the benefit of small size and quick querying, with minimal tweaks. I don't think I changed any system level defaults, yet I can aggregate through the entire few billion snapshots in a few minutes."
Similar gains are possible outside databases. Researchers working on DNA sequencing and other data-intensive tasks report that these ultra-high core count processors can compete with specialized GPU hardware for certain workloads, once the software is properly optimized for the hardware architecture.
[Image: ClickHouse team celebrates their optimization work for Intel's ultra-high core count processors]
Looking Forward to Mainstream Adoption
While 288-core processors might sound like overkill for most users, the optimization techniques developed for these extreme systems often benefit smaller configurations too. The memory allocation improvements help even on desktop systems with 16 or 32 cores, and NUMA-aware processing pays off on any machine whose memory is split across multiple sockets or NUMA nodes.
The success of ClickHouse on these processors demonstrates that software can indeed scale to match hardware advances, but it requires fundamental rethinking of how applications manage resources. As these high-core count processors become more common in data centers, expect to see more software following similar optimization paths.
NUMA (Non-Uniform Memory Access): A computer architecture where different cores have faster access to some memory locations than others, requiring careful optimization for best performance.
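One common NUMA-aware pattern is to keep each partition of the data on one node, aggregate it there into a private partial result, and merge the small partial results at the end, so the bulk of the work never crosses nodes. The Python sketch below, whose function names are hypothetical, reads the node-to-CPU map that Linux exposes under `/sys/devices/system/node` and shows the partition-then-merge structure. It does not pin threads or bind memory to nodes, which a real engine would do with a library such as libnuma.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids; one node holding all CPUs if sysfs is absent."""
    nodes = {}
    base = Path(root)
    if base.is_dir():
        for node_dir in base.glob("node[0-9]*"):
            node_id = int(node_dir.name[len("node"):])
            nodes[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return nodes or {0: set(range(os.cpu_count() or 1))}


def aggregate_partition(rows):
    """Sum values per key for one node-local partition."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def aggregate_by_node(rows_by_node):
    """Aggregate each node's partition in its own worker, then merge the partials.

    rows_by_node maps node id -> iterable of (key, value) pairs assumed to
    already reside in that node's local memory.
    """
    workers = max(1, len(rows_by_node))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        partials = list(executor.map(aggregate_partition, rows_by_node.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total
```

For example, `aggregate_by_node({0: [("AAPL", 3), ("MSFT", 1)], 1: [("AAPL", 2)]})` returns a `Counter` with 5 for `AAPL` and 1 for `MSFT`. Only the partial results cross node boundaries, and they are usually far smaller than the raw rows.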
Reference: Optimizing ClickHouse for Intel’s ultra-high core count processors

