At the Hot Chips 2025 conference, Nvidia provided an in-depth look at its Blackwell Ultra architecture, revealing significant performance improvements and new technological capabilities that position the company's latest GPUs at the forefront of AI computing. The announcement comes as Nvidia continues to dominate the AI hardware market, despite recent earnings results showing mixed investor sentiment due to China trade restrictions.
Enhanced NVFP4 Performance with Optimized Tensor Cores
The Blackwell Ultra B300-series GPUs feature newly optimized Tensor cores designed specifically for Nvidia's proprietary NVFP4 data format. This enhancement delivers up to 50% higher NVFP4 throughput (in petaFLOPS) compared to the standard Blackwell B100/B200 series. The optimization comes with trade-offs, however: the additional NVFP4 capability is paid for with reduced INT8 and FP64 throughput. The NVFP4 format itself is a notable step forward in AI processing efficiency, using a compact E2M1 element layout with a dual-scaling approach that keeps accuracy close to BF16 while dramatically reducing memory requirements.
*Figure: Comparative performance of NVFP4 against BF16 for AI tasks, showcasing enhancements in accuracy and efficiency.*
Substantial Memory and Connectivity Upgrades
Blackwell Ultra GPUs now carry 288 GB of HBM3E memory, a substantial increase over the 192 GB found on standard Blackwell parts. The expanded capacity enables larger batch sizes and longer sequence lengths for AI workloads. Additionally, the B300 series is the first data center GPU to officially support PCIe 6.0, delivering 128 GB/s of bandwidth per direction on an x16 link through PAM4 signaling and FLIT-based encoding. Currently, only Nvidia's Grace CPUs support PCIe 6.0 on the host side, creating a tightly integrated ecosystem.
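The x16 figure follows directly from the per-lane signaling rate. A quick back-of-envelope check, assuming the PCIe 6.0 rate of 64 GT/s per lane and ignoring FLIT protocol overhead (FLIT mode removes the older 128b/130b encoding tax, so raw lane bandwidth is effectively 64 Gbit/s each way):

```python
# Back-of-envelope check of the PCIe 6.0 x16 bandwidth cited above.
GT_PER_LANE = 64                  # GT/s, PCIe 6.0 signaling rate per lane
LANES = 16                        # x16 slot

gbit_per_sec = GT_PER_LANE * LANES    # Gbit/s, one direction
gb_per_sec = gbit_per_sec / 8         # GB/s, one direction
print(gb_per_sec)                     # 128.0
```

The link is full duplex, so aggregate throughput with both directions saturated is double that; real-world numbers land somewhat lower once FLIT framing and protocol overhead are accounted for.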
Power Consumption Trade-offs for Performance Gains
The enhanced capabilities of Blackwell Ultra come at the cost of increased power consumption. The B300 series operates at a 1,400W TDP, representing a 200W increase over the 1,200W TDP of standard Blackwell processors. This power increase reflects the additional computational resources and memory capacity integrated into the Ultra architecture, highlighting the ongoing challenge of balancing performance with energy efficiency in high-performance computing applications.
Proprietary NVFP4 Format Drives Competitive Advantage
Nvidia's NVFP4 format extends beyond inference to support pretraining at trillion-token scales. Early experiments with 7-billion-parameter models trained on 200 billion tokens demonstrate results comparable to BF16 precision. The format's memory requirements are approximately 1.8 times lower than FP8 and 3.5 times lower than FP16, significantly reducing storage and data-movement overhead across NVLink and NVSwitch fabrics. Although NVFP4 is proprietary and limited to Nvidia hardware, the company is integrating support for it into open-source frameworks including CUTLASS, NCCL, and TensorRT Model Optimizer.
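The 1.8x and 3.5x ratios are consistent with a block-scaled 4-bit format. A rough model, assuming 4-bit elements stored in blocks of 16 that share one 8-bit (FP8) scale factor (the block size and scale width are illustrative assumptions here, not confirmed details of NVFP4):

```python
# Rough model of the memory-footprint ratios quoted above.
BLOCK = 16                      # elements per shared scale (assumed)
SCALE_BITS = 8                  # FP8 scale per block (assumed)

# Effective storage cost per element, amortizing the shared scale:
nvfp4_bits = 4 + SCALE_BITS / BLOCK    # 4.5 bits per element

print(round(8 / nvfp4_bits, 2))    # vs FP8:  1.78x smaller
print(round(16 / nvfp4_bits, 2))   # vs FP16: 3.56x smaller
```

At 4.5 effective bits per element, the savings line up with the "approximately 1.8x" and "3.5x" figures, which is why the scale-sharing overhead keeps NVFP4 from reaching a full 2x or 4x reduction.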
*Figure: Accuracy scores comparing NVFP4 against FP8 across various model evaluations.*
Market Position Amid China Trade Challenges
The Blackwell Ultra announcement coincides with Nvidia's recent earnings report, which showed strong performance but disappointed some investors due to zero H20 chip sales to China-based customers. Revenue reached USD 46.74 billion, exceeding Wall Street projections of USD 46.52 billion, with datacenter revenue growing 56% year-over-year to USD 41.1 billion. CEO Jensen Huang emphasized that "production of Blackwell Ultra is ramping at full speed, and demand is extraordinary," positioning the new architecture as central to the ongoing AI infrastructure race despite geopolitical constraints affecting certain markets.