Tenstorrent Faces Harsh Criticism Over Complex Software Stack and Developer Experience

BigGo Editorial Team

George Hotz, founder of Comma.ai and creator of the tinygrad machine learning framework, has published a scathing critique of Tenstorrent's software approach. His comments highlight growing concerns about the AI chip company's developer experience and overly complex abstraction layers.

Tenstorrent, led by legendary chip architect Jim Keller, has been positioning itself as a competitor to NVIDIA in the AI compute space. The company promises more programmable hardware compared to traditional GPUs, but Hotz argues they're failing to expose this key advantage to developers.

Developer Frustration with Multiple Abstraction Layers

The core criticism centers on Tenstorrent's software stack having too many abstraction layers. Hotz singles out the Low Level Kernel (LLK) approach as fundamentally flawed, comparing it to "building a castle on a shit swamp." He advocates for a simpler three-layer structure: frontend, compiler, and runtime/driver.

Community feedback supports these concerns. Experienced developers who should be ideal early adopters report struggling to make progress with Tenstorrent's tools. One machine learning PhD student with extensive systems programming experience described being unable to make heads or tails of the company's various abstractions despite reading the documentation and attending meetups.

Another developer attempted to run a recent Vision Language Model on Tenstorrent's Blackhole hardware over a weekend but made little progress, getting stuck on unsupported operations that span multiple parts of the software stack.

Recommended Software Stack Structure:

  • Current Tenstorrent approach: 7+ abstraction layers including LLK (Low Level Kernel)
  • Proposed simplified approach: 3 layers only
    1. Frontend (PyTorch, ONNX, tensor.py)
    2. Compiler (memory placement, op scheduling, kernel fusion)
    3. Runtime/Driver (hardware exposure, compilation, dispatch, queuing)
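To make the proposed split concrete, here is a minimal, purely illustrative Python sketch of how the three layers could hand work to one another. The class and method names are hypothetical and do not correspond to tinygrad or Tenstorrent APIs; a real compiler layer would also handle memory placement and kernel fusion.

```python
from dataclasses import dataclass, field

# Frontend: a graph of tensor operations (what PyTorch/ONNX/tensor.py would emit)
@dataclass
class Op:
    name: str
    inputs: list = field(default_factory=list)

# Compiler: lowers the graph to kernels (scheduling, fusion, buffer placement in reality)
class Compiler:
    def lower(self, graph):
        # toy lowering: one kernel per op; a real compiler would fuse elementwise chains
        return [f"kernel_{op.name}" for op in graph]

# Runtime/Driver: compiles and dispatches kernels onto the hardware queues
class Runtime:
    def run(self, kernels):
        for k in kernels:
            print(f"dispatch {k}")  # stand-in for enqueueing work on the device

graph = [Op("exp"), Op("relu")]
Runtime().run(Compiler().lower(graph))
```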

The ELU Problem as a Symbol of Deeper Issues

Hotz uses the Exponential Linear Unit (ELU) activation function as an example of misplaced complexity. He argues that basic functions like ELU shouldn't require special implementation at low levels of the stack. Instead, they should be composed from simpler operations like ReLU and exponential functions.
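As a rough illustration of that composition, here is a minimal NumPy sketch that builds ELU out of only relu and exp, mirroring the formula cited below. This is an assumption-laden example for clarity, not Tenstorrent or tinygrad code.

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU(x) = x for x > 0, alpha*(exp(x)-1) for x <= 0,
    # expressed as relu(x) - alpha * relu(1 - exp(x))
    relu = lambda t: np.maximum(t, 0.0)
    return relu(x) - alpha * relu(1.0 - np.exp(x))

print(elu(np.array([-2.0, 0.0, 3.0])))  # approximately [-0.8647, 0.0, 3.0]
```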

This reflects a broader organizational problem where brilliant engineers may be obsessively tuning for their own use cases without considering the broader developer experience. The result is a system that works for internal teams but creates barriers for external developers.

Key Technical Issues Identified:

  • ELU Implementation: Should be composed as self.relu() - alpha*(1-self.exp()).relu() rather than hardcoded at low levels
  • Abstraction Problems: LLK (Low Level Kernel) sitting under tt-metalium prevents proper hardware exposure
  • Developer Barriers: Complex multi-layer abstractions make it difficult for external developers to implement models like Mixtral or Pixtral

The NVIDIA Advantage and Path Forward

The criticism comes at a crucial time for Tenstorrent. As Hotz points out, the company can't compete on manufacturing deals or intellectual property licensing against established players like NVIDIA and AMD. Their only sustainable advantage lies in exposing the programmability of their hardware.

As Hotz puts it, there is no product leadership on the API design, just a lot of really brilliant engineers obsessively tuning for their own use cases, unwilling to ever trade off a hit in performance or expressivity for readability or writeability.

The community discussion reveals that successful AI hardware platforms require more than just technical excellence. They need strong product leadership focused on developer experience and the discipline to maintain clean abstractions even when it means sacrificing some performance or internal convenience.

Conclusion

While Tenstorrent's hardware capabilities may be impressive, the developer community's struggles suggest the company needs to fundamentally rethink its software approach. The criticism from experienced developers who should be natural advocates indicates that technical brilliance alone isn't enough to challenge NVIDIA's dominance in AI compute.

The path forward likely requires difficult decisions about simplifying the software stack, even if it means short-term performance trade-offs. Without addressing these developer experience issues, Tenstorrent risks becoming another promising AI chip company that fails to gain meaningful market traction.

Reference: tt-tiny