Apple Taps Google TPUs, Not Nvidia GPUs, to Train AI Models

BigGo Editorial Team

Apple Leverages Google's TPUs for AI Model Training

In a surprising move, Apple has revealed it relied on Google's Tensor Processing Units (TPUs) rather than Nvidia's GPUs to train the artificial intelligence models powering its new Apple Intelligence features.

Apple's collaboration with Google on AI model training showcases the notable use of Google's TPUs

Key Points:

  • Apple used Google TPUv4 and TPUv5 chips to train its AI models
  • The company did not use Nvidia hardware, despite Nvidia's dominance in AI acceleration
  • Apple's largest model, AFM-server, was trained on 8,192 TPUv4 chips
  • A smaller 3B parameter on-device model was trained using 2,048 TPUv5p chips
Benchmark comparison of various AI models, highlighting Apple's AFM-server and AFM-on-device performance

Training Process

Apple's research paper outlines a multi-stage training process for its Apple Foundation Models (AFMs):

  1. AFM-server (largest model):

    • Trained on 8,192 TPUv4 chips in a distributed configuration
    • Three-stage pre-training: 6.3T tokens of core training, 1T tokens of continued training, and 100B tokens for context lengthening
  2. AFM-on-device (3B parameter model):

    • Distilled from a 6.4B-parameter teacher model (not from AFM-server itself)
    • Trained on 2,048 TPUv5p chips
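Apple's paper does not include training code; as a rough illustration of the distillation step above, a minimal sketch of the standard knowledge-distillation idea is shown below, where a large teacher model's temperature-softened output distribution serves as the training target for a smaller student. The function names and toy logits here are hypothetical, not taken from Apple's implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's distribution against the teacher's
    softened distribution -- the core objective in knowledge distillation."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# A student that agrees with the teacher incurs a lower loss than one that disagrees.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
bad_student = [0.5, 4.0, 1.0]
print(distillation_loss(teacher, good_student) < distillation_loss(teacher, bad_student))  # True
```

A higher temperature softens the teacher's distribution, exposing the relative probabilities it assigns to non-top tokens; that extra signal is what lets a small on-device model learn from a much larger teacher.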

Data Sources

Apple's training data included:

  • Web content crawled by Applebot (respecting robots.txt)
  • Licensed high-quality datasets
  • Carefully selected code, math, and public datasets

Performance Claims

According to Apple's internal testing, both the AFM-server and AFM-on-device models perform strongly against comparable models on benchmarks covering:

  • Instruction Following
  • Tool Use
  • Writing

Industry Implications

This revelation is significant for several reasons:

  1. Departure from Nvidia: Apple's choice to use Google TPUs instead of industry-standard Nvidia GPUs is notable.
  2. Google's Hardware Strength: The decision could be seen as an acknowledgment of Google's TPU capabilities.
  3. Competitive Landscape: Apple's transparency in releasing detailed information about its AI development process suggests a desire to be seen as a serious contender in the AI space.

As the AI race intensifies among tech giants, Apple's hardware choices and development strategies will be closely watched by industry observers and competitors alike.

For more detailed information on Apple's AI model training techniques and benchmarks, interested readers can refer to the full research paper released by the company.

Update: Tuesday, July 30, 20:42

Apple's use of Google TPUs for AI model training is part of a larger AI development strategy. The company is investing $5 billion in AI development over the next two years, with a focus on transitioning to its own hardware infrastructure for future AI processing. This includes Project ACDC, an initiative to use Apple Silicon-derived hardware in its data centers for AI tasks. Additionally, Apple is adopting a more transparent approach to AI development, including the release of open-source language models. The company has also clarified that while it used The Pile dataset to train its OpenELM models, these models do not power any commercial Apple AI products, including Apple Intelligence.

Architectural layout of Apple's Intelligence system, outlining its components and future direction in AI development