DeepSeek has released their Fire-Flyer File System (3FS), a high-performance distributed file system specifically designed for AI training and inference workloads. The system has been in development since 2019, originally created for high-frequency trading applications, and has now been optimized for the unique data access patterns of large-scale AI training.
Exceptional Performance for Random Read Workloads
3FS achieves remarkable performance, with benchmark tests showing read throughput of approximately 6.6 TiB/s across a cluster of 180 storage nodes. This performance level significantly outpaces traditional distributed file systems like Ceph, which recently celebrated reaching 1 TiB/s. The system is specifically engineered for the random read patterns common in AI training workloads, where traditional caching mechanisms provide little benefit.
For those interested, the design was originally published here... DeepSeek has developed and used the file system internally for several years. Unlike traditional file systems, it is focused on model-training workloads dominated by random reads, where read caching and prefetching provide little benefit.
What makes 3FS unique is its deliberate omission of read caching and prefetching—features that are staples in conventional file systems but offer no advantage for AI training workloads where data is rarely reused in the short term. Instead, 3FS uses Linux-based AIO and io_uring interfaces with Direct I/O mode, bypassing the file cache entirely to prevent unnecessary memory consumption.
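To make the Direct I/O pattern concrete, here is a minimal sketch (not the 3FS client API) of reading a block with `O_DIRECT` on Linux, which bypasses the kernel page cache exactly as described above. The block size and file path are illustrative assumptions; `O_DIRECT` requires the buffer, offset, and length to be block-aligned, which is why an anonymous `mmap` (always page-aligned) is used for the buffer.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # O_DIRECT requires block-aligned buffers, offsets, and sizes


def direct_read(path):
    """Read one block with O_DIRECT, bypassing the kernel page cache."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mmap returns a page-aligned buffer
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError:
        return None  # some filesystems (e.g. tmpfs) reject O_DIRECT
    try:
        return os.readv(fd, [buf])  # scatter-read into the aligned buffer
    finally:
        os.close(fd)


# Write a block-sized sample file, then read it back uncached.
path = tempfile.mktemp()
with open(path, "wb") as f:
    f.write(b"x" * BLOCK)
print(direct_read(path))
os.remove(path)
```

Skipping the page cache means every read hits the SSD, which is a loss for workloads with reuse but a win for AI training, where each sample is typically touched once per epoch.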
*Figure: Performance metrics showing the exceptional random read capabilities of the Fire-Flyer File System*
Architecture and Technical Implementation
The system employs a disaggregated architecture that combines the throughput of thousands of SSDs with the network bandwidth of hundreds of storage nodes. It implements Chain Replication with Apportioned Queries (CRAQ) for strong consistency and uses stateless metadata services backed by a transactional key-value store.
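The key idea behind CRAQ can be illustrated with a toy in-memory simulation (an assumption-laden sketch, not 3FS's implementation): writes propagate from the head of a replica chain to the tail, versions are marked dirty until the tail commits, and reads can be "apportioned" to any replica, falling back to the tail only for in-flight keys.

```python
# Minimal CRAQ sketch: writes flow head -> tail; reads may hit ANY node.
class Node:
    def __init__(self):
        self.clean = {}  # key -> committed value
        self.dirty = {}  # key -> value awaiting tail commit


def write(chain, key, value):
    for node in chain:  # propagate the write down the chain as "dirty"
        node.dirty[key] = value
    tail = chain[-1]
    tail.clean[key] = tail.dirty.pop(key)  # tail commits the version
    for node in chain[:-1]:  # ack flows back; replicas mark it clean
        node.clean[key] = node.dirty.pop(key)


def read(chain, node, key):
    if key in node.dirty:  # version in flight: ask the tail what is committed
        return chain[-1].clean.get(key)
    return node.clean.get(key)  # clean: serve locally, no coordination needed


chain = [Node(), Node(), Node()]
write(chain, "block-1", b"data")
print(read(chain, chain[0], "block-1"))  # any replica serves committed reads
```

Because clean reads are served by every node in the chain rather than only the tail, read throughput scales with the number of replicas while writes retain strong consistency.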
While 3FS uses FUSE for metadata management, achieving high performance requires applications to link directly to the C++ client library for reads and writes. This design choice has sparked some discussion in the community about whether this limits its general-purpose utility, though Python bindings are available to improve accessibility.
The benchmark cluster that achieved the 6.6 TiB/s throughput consisted of 180 storage nodes, each equipped with 2x200Gbps InfiniBand NICs and sixteen 14TiB NVMe SSDs, with more than 500 client nodes participating in the read stress test. This configuration demonstrates the system's ability to scale effectively across large clusters.
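A quick back-of-the-envelope check, using only the figures quoted above, shows why this configuration is plausible: the aggregate throughput divided across 180 nodes lands close to (but under) each node's 2x200Gbps network capacity.

```python
# Sanity-check the benchmark arithmetic from the quoted cluster specs.
nodes = 180
total_tib_s = 6.6                                 # aggregate read throughput
per_node_gib_s = total_tib_s * 1024 / nodes       # share of aggregate per node
nic_gbps = 2 * 200                                # 2x200 Gbps InfiniBand per node
nic_gib_s = nic_gbps / 8 * (1000**3) / (1024**3)  # decimal Gbit -> binary GiB

print(round(per_node_gib_s, 1))  # 37.5 GiB/s served per storage node
print(round(nic_gib_s, 1))       # 46.6 GiB/s of NIC capacity per node
```

Each node delivering roughly 37.5 GiB/s against about 46.6 GiB/s of network headroom suggests the benchmark runs the NICs at around 80% utilization, i.e. the cluster is close to network-bound rather than SSD-bound.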
*Figure: Server throughput metrics illustrating the effective scaling and performance of the disaggregated architecture in 3FS*
Positioning Among Competitors
Community discussions highlight that 3FS enters a field dominated by established solutions like Lustre and newer options like Weka for high-performance distributed storage. Traditional object storage systems like MinIO, Ceph, and SeaweedFS are generally considered too slow for the extreme throughput demands of large-scale AI training.
Lustre remains the big daddy of distributed parallel filesystems but is notoriously difficult to set up and operate; 3FS aims to provide comparable or better performance with a more modern, manageable architecture. For context, Ceph's recently celebrated 1 TiB/s milestone was achieved on a 68-node cluster.
Beyond Training: KVCache for Inference
Beyond training data access, 3FS also offers KVCache functionality, which optimizes LLM inference by caching key and value vectors from previous tokens in decoder layers. This feature helps avoid redundant computations during inference, with benchmark results showing peak read throughput reaching up to 40 GiB/s.
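The saving that KV caching provides can be illustrated with a toy cost model (an illustration of the general technique, not the 3FS KVCache API): with a cache, decoding step *t* computes key/value projections only for the newest token; without one, the entire prefix is reprojected at every step.

```python
# Toy cost model: K/V projections computed during autoregressive decoding.
def decode_cost(steps, cached):
    ops = 0
    for t in range(1, steps + 1):
        # cached: only the new token's K/V; uncached: recompute the whole prefix
        ops += 1 if cached else t
    return ops


print(decode_cost(1000, cached=False))  # 500500 projections (quadratic)
print(decode_cost(1000, cached=True))   # 1000 projections (linear)
```

The quadratic-to-linear reduction is why KV caches are standard in inference; storing them in 3FS at up to 40 GiB/s would additionally let cached prefixes outlive a single GPU's memory and be shared across requests.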
This capability appears to be part of DeepSeek's strategy for cost-effective inference services, potentially explaining how they can offer competitive pricing on prompt cache hits.
The release of 3FS adds to DeepSeek's growing portfolio of infrastructure tools, following their recent publication of other components of their AI stack. As one commenter noted, the company's background in high-frequency trading, where performance is measured in nanoseconds rather than milliseconds, has likely influenced their approach to building high-performance AI infrastructure.
For organizations struggling with the high costs and performance limitations of existing solutions like AWS EFS, 3FS may represent a promising alternative, though its specialized nature means it's best suited for specific AI workloads rather than general-purpose storage needs.
Reference: Fire-Flyer File System