A San Francisco-based AI research lab has demonstrated the dramatic cost savings possible with on-premises storage infrastructure, building a 30 petabyte storage cluster for just $35,000 USD monthly compared to Amazon Web Services' estimated $1.2 million USD monthly cost. The project, designed to store massive video datasets for computer vision model training, has sparked significant discussion in the tech community about the hidden costs and trade-offs of DIY datacenter operations.
*Figure: a server rack holding the used enterprise hard drives for the storage cluster*
The Missing Labor Cost Factor
The most prominent concern raised by the community centers on operational expenses that weren't included in the cost comparison. While the startup calculated their total annual cost at $354,000 USD including depreciation, critics point out that San Francisco-based staff salaries for maintaining the infrastructure could easily double or triple the actual operational costs. This oversight highlights a common pitfall in cloud versus on-premises comparisons where labor costs are underestimated or ignored entirely.
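To see how much headroom the comparison leaves even after that criticism, the published figures can be tabulated with hypothetical labor costs layered on top. The headcount and salary numbers below are illustrative assumptions, not from the source:

```rust
// Illustrative comparison of the article's published figures with
// assumed labor costs added. Headcount and salary are hypothetical.

/// On-prem annual cost plus an assumed fully loaded payroll.
fn annual_with_labor(base_annual_usd: f64, engineers: f64, salary_usd: f64) -> f64 {
    base_annual_usd + engineers * salary_usd
}

fn main() {
    let aws_annual = 1_200_000.0 * 12.0; // AWS estimate: $1.2M/month
    let onprem_annual = 354_000.0;       // startup's figure, incl. depreciation

    // Assumption: one to two San Francisco engineers at ~$250k fully loaded.
    let low = annual_with_labor(onprem_annual, 1.0, 250_000.0);
    let high = annual_with_labor(onprem_annual, 2.0, 250_000.0);

    println!("AWS estimate : ${aws_annual:>10.0}/yr");
    println!("On-prem      : ${low:>10.0} to ${high:.0}/yr with labor");
}
```

Even if labor roughly triples the on-prem figure, as critics suggest, it stays an order of magnitude below the AWS estimate under these assumptions.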
Zero Redundancy Strategy Raises Eyebrows
The storage setup deliberately eliminates data redundancy to minimize costs, a decision that has divided community opinion. The approach works for their specific use case of storing training data that can be easily replaced, but many question its applicability for businesses requiring data integrity guarantees. The community noted that while this strategy makes sense for hoarding videos from YouTube, it wouldn't work for most organizations that need assurance their data is safe from hardware failures or disasters.
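A rough way to quantify that trade-off from the numbers in the article: with zero redundancy, every drive failure is locally unrecoverable, so the expected share of the corpus that must be re-fetched each year is roughly the annual failure rate itself. This is a toy model that assumes data is spread evenly across drives and ignores rebuild windows:

```rust
/// Petabytes expected to need re-downloading per year with no
/// redundancy: total capacity times the annual drive failure rate.
/// (Toy model: uniform data placement, every failure is a total loss.)
fn refetch_pb_per_year(total_pb: f64, annual_failure_rate: f64) -> f64 {
    total_pb * annual_failure_rate
}

fn main() {
    // Figures from the article: 30 PB total, 5% annual disk failure rate.
    let pb = refetch_pb_per_year(30.0, 0.05);
    println!("~{pb:.1} PB of training data to re-fetch per year");
}
```

For replaceable scraped video that amounts to an acceptable bandwidth bill; for irreplaceable business data it would be unacceptable data loss.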
Used Hardware Gamble Pays Off
The team's decision to use 2,400 used enterprise hard drives worth $500,000 USD has generated considerable debate about reliability versus cost savings. Community members shared mixed experiences with used drives, noting high performance variability and questioning long-term maintenance costs. However, others argued that used drives can be cost-effective since they've already survived the early failure period that typically affects new hardware.
As one commenter put it: "Used drives make sense if maintaining your home server is a hobby. It's fun to diagnose and solve problems in home servers, and failing drives give me a reason to work on the server."
The startup reports a conservative 5% annual disk failure rate, which translates to replacing about 120 drives yearly, a manageable number for their simple storage architecture built with just 200 lines of Rust code and an nginx web server.
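The arithmetic behind that replacement load is straightforward, using only the drive count and failure rate quoted in the article:

```rust
/// Expected drive failures per year at a given annual failure rate.
fn failures_per_year(drive_count: f64, annual_failure_rate: f64) -> f64 {
    drive_count * annual_failure_rate
}

fn main() {
    // Figures from the article: 2,400 drives at a 5% annual failure rate.
    let failures = failures_per_year(2_400.0, 0.05);
    let days_between_swaps = 365.0 / failures;
    println!("{failures:.0} drive swaps/yr, one every ~{days_between_swaps:.1} days");
}
```

In other words, someone is walking to the racks with a replacement drive roughly every three days.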
Maintenance Reality Check
Community discussion revealed that the true test of this approach lies in ongoing operational overhead. While the startup benefits from having their datacenter just blocks away from their office, enabling quick debugging and maintenance visits, most organizations would need dedicated operations staff. Estimates suggest at least 5 hours weekly for maintenance tasks, which could significantly impact the total cost of ownership calculations.
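Pricing that overhead out makes the point concrete. The 5-hours-weekly figure comes from the community estimate above; the hourly rate is a hypothetical number for a San Francisco operations engineer, not from the source:

```rust
/// Annualized cost of recurring maintenance hours at an assumed rate.
fn annual_maintenance_usd(hours_per_week: f64, hourly_rate_usd: f64) -> f64 {
    hours_per_week * 52.0 * hourly_rate_usd
}

fn main() {
    // 5 hours/week per the community estimate; $150/hr is an assumption.
    let cost = annual_maintenance_usd(5.0, 150.0);
    println!("Maintenance labor: ~${cost:.0}/yr");
}
```

Under these assumptions the recurring labor is modest next to the quoted savings, but it scales with complexity and would grow several-fold once redundancy, monitoring, and on-call coverage are required.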
The project has already reached capacity and the team is considering replicating the setup, suggesting their cost-benefit analysis has proven successful for their specific needs. However, the community consensus indicates that while impressive cost savings are possible with on-premises storage, the hidden operational complexities and labor costs make cloud solutions more practical for most organizations.
Reference: Building the heap: racking 30 petabytes of hard drives for pretraining

