Microsoft's WHAMM AI Model Runs Quake II in Browser with Real-Time Generative Graphics

BigGo Editorial Team

Microsoft has released an experimental AI model that hints at how classic games might be rendered in the future. The approach demonstrates both the potential and the current limitations of generative AI in interactive entertainment.

WHAMM: Microsoft's New AI Gaming Model

Microsoft recently unveiled WHAMM (World and Human Action MaskGIT Model), a generative AI system designed specifically for real-time gaming applications. This new model represents a significant advancement over its predecessor, WHAM-1.6B, which was released in February. The most impressive demonstration of WHAMM's capabilities comes in the form of a playable version of the 28-year-old classic game Quake II, which users can experience directly in their web browsers through Copilot Labs. While the technology is still experimental, it showcases how AI might eventually transform gaming experiences by generating visual content in real-time based on player interactions.

WHAMM AI interface for real-time game generation in Quake II

Technical Innovations Behind WHAMM

The key technical innovation in WHAMM lies in its departure from traditional autoregressive models, which generate tokens one at a time, with each token requiring its own forward pass. Instead, WHAMM employs a MaskGIT-style architecture that predicts the image tokens for a frame in parallel, typically refining them over a small number of passes. This architectural shift significantly reduces the number of forward passes required and decreases dependencies between elements, enabling faster visual output that approaches real-time responsiveness. The resolution has also been raised from the previous model's 300 x 180 pixels to a more detailed 640 x 360 pixels, providing clearer visuals while maintaining the same underlying encoder-decoder architecture.
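To make the difference in forward passes concrete, here is a minimal toy sketch of the two decoding styles. It is not Microsoft's implementation: the `toy_predict` function is a random stand-in for a real model, and the token count (576), step count (8), vocabulary size, and cosine unmasking schedule are illustrative assumptions rather than details taken from WHAMM.

```python
import math
import random

MASK = -1  # sentinel for a token that has not been generated yet


def toy_predict(tokens, rng, vocab_size=16):
    """Hypothetical stand-in for one forward pass of a model.

    Returns a (token, confidence) proposal for every position. A real model
    would condition on the known tokens, earlier frames, and player actions.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def autoregressive_decode(num_tokens, rng):
    """Generate tokens strictly in order: one forward pass per token."""
    tokens = [MASK] * num_tokens
    passes = 0
    for i in range(num_tokens):
        proposals = toy_predict(tokens, rng)
        passes += 1
        tokens[i] = proposals[i][0]
    return tokens, passes


def maskgit_decode(num_tokens, num_steps, rng):
    """MaskGIT-style decoding: propose every masked token in parallel, commit
    the most confident proposals, and repeat for a fixed number of steps."""
    tokens = [MASK] * num_tokens
    passes = 0
    for step in range(1, num_steps + 1):
        proposals = toy_predict(tokens, rng)
        passes += 1
        # Cosine schedule (an assumption here): how many tokens should still
        # be masked after this step. It reaches zero on the final step.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / num_steps))
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[: len(masked) - keep_masked]:
            tokens[i] = proposals[i][0]
    return tokens, passes


if __name__ == "__main__":
    num_tokens, num_steps = 576, 8  # illustrative values, not WHAMM's settings
    _, ar_passes = autoregressive_decode(num_tokens, random.Random(0))
    _, mg_passes = maskgit_decode(num_tokens, num_steps, random.Random(0))
    print(f"autoregressive forward passes: {ar_passes}")
    print(f"MaskGIT-style forward passes:  {mg_passes}")
```

In this sketch the autoregressive loop needs one pass per token, while the MaskGIT-style loop needs only as many passes as it has refinement steps, regardless of how many tokens make up the frame. That fixed, small pass count is what makes per-frame generation time more predictable.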

Technical architecture of the WHAM model showcasing its innovative design

Accelerated Training Process

Perhaps most remarkable is the dramatic reduction in the amount of training data WHAMM requires. While the previous WHAM-1.6B model was trained on seven years of gameplay data, developers taught WHAMM using just over a week of curated Quake II gameplay. This efficiency was achieved by collecting data from professional game testers who focused exclusively on a single level of the game. The result is a marked gain in data efficiency, potentially making similar systems more practical to develop in the future.

Current Limitations and User Experience

Despite these advancements, WHAMM remains firmly in the experimental stage. The demo runs at low frame rates, barely reaching the low to mid-teens in frames per second, and suffers from noticeable input lag. Microsoft emphasizes that the demo should be viewed as a technological showcase rather than a finished gaming product. Players can perform basic actions like shooting, jumping, crouching, and interacting with enemies, but the experience is hampered by numerous limitations. Enemy interactions appear fuzzy, health tracking and damage statistics are often incorrect, and the model has a limited context length: it forgets objects that leave the player's view for longer than nine-tenths of a second. Additionally, the demo is confined to a single level; attempting to progress beyond it freezes image generation because no training data was recorded past that point.
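The short memory described above can be pictured as a fixed-size sliding window of recent frames. The sketch below is only an illustration of that idea, not how WHAMM is implemented: the 10 frames-per-second rate is an assumption chosen so that 0.9 seconds corresponds to a whole number of frames, and `remembered_objects` is a hypothetical helper.

```python
from collections import deque

CONTEXT_SECONDS = 0.9  # context length reported in the article
ASSUMED_FPS = 10       # illustrative assumption, not a figure from the article
CONTEXT_FRAMES = round(CONTEXT_SECONDS * ASSUMED_FPS)  # 9 frames under that assumption


def remembered_objects(frames, context_frames=CONTEXT_FRAMES):
    """Return the objects visible in at least one of the most recent frames.

    `frames` is an iterable of sets, each holding the objects visible in that
    frame. Anything seen only in frames older than the window is forgotten.
    """
    window = deque(maxlen=context_frames)  # old frames fall off automatically
    for visible in frames:
        window.append(set(visible))
    return set().union(*window)


if __name__ == "__main__":
    # The enemy is seen once, then the player looks away.
    looked_away_8 = [{"enemy"}] + [set() for _ in range(8)]
    looked_away_9 = [{"enemy"}] + [set() for _ in range(9)]
    print(remembered_objects(looked_away_8))  # enemy frame is still in the window
    print(remembered_objects(looked_away_9))  # enemy frame has been pushed out
```

Under these assumptions, an enemy that stays off-screen for nine consecutive frames drops out of the window entirely, so the model has no record that it ever existed. That matches the behavior players report when they look away and back again.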

AI in Creative Industries: Enhancement vs. Replacement

WHAMM emerges amid broader discussions about AI's role in creative industries. Recent controversies, such as OpenAI's Ghibli-inspired AI creations, have highlighted public skepticism about whether AI can truly replicate human artistry. Microsoft positions WHAMM not as a replacement for human creativity but as a tool to augment it, a philosophy similar to that behind Nvidia's ACE technology, which is used to make NPCs more lifelike in games like inZOI. The ideal implementation would see AI enhancing rather than replacing creative works, adding dynamic elements while preserving the human touch that makes games compelling.

Future Implications for Interactive Media

Looking ahead, Microsoft envisions WHAMM and similar technologies enabling entirely new forms of interactive media. While fully AI-generated games remain on the horizon rather than an immediate reality, innovations like WHAMM suggest they could emerge within the next few years. Future iterations will likely address current shortcomings while empowering game developers to craft more immersive narratives enriched by AI-driven tools. The technology represents an intriguing glimpse into how generative AI might eventually transform not just how games look, but how they fundamentally function and respond to player actions.