Google Cloud Run Adds NVIDIA GPU Support for AI Inference

BigGo Editorial Team

Google Cloud has announced a significant upgrade to its serverless platform, Cloud Run, introducing support for NVIDIA L4 GPUs. This new feature, currently in preview, enables developers to run AI inference applications directly on Google's scalable cloud infrastructure.

The integration of NVIDIA L4 GPUs into Cloud Run opens up new possibilities for AI developers and businesses looking to deploy machine learning models efficiently. Here are some key highlights of this update:

  • Enhanced AI Capabilities: Developers can now perform real-time inference using lightweight open models like Google's Gemma (2B/7B) and Meta's Llama 3 (8B) for applications such as chatbots and document summarization; see the client-side sketch after this list.

  • Improved Performance: With 24 GB of VRAM, the L4 GPUs can handle models with up to 9 billion parameters, offering fast token rates for popular models like Llama 3.1 (8B), Mistral (7B), and Gemma 2 (9B).

  • Cost Optimization: Cloud Run's ability to scale to zero when not in use helps optimize costs for AI inference workloads.

  • Simplified Deployment: The serverless nature of Cloud Run eliminates the need for infrastructure management, making it easier for developers to focus on their AI applications.

  • Versatility: Beyond AI inference, the GPU support extends to other compute-intensive tasks like image recognition, video transcoding, and 3D rendering.
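
To make the first bullet concrete, here is a minimal sketch of a client calling a GPU-backed Cloud Run service, assuming an OpenAI-compatible inference server (for example, vLLM or Ollama serving Gemma) has already been deployed on it. The service URL, model tag, and prompt are placeholders, not values from the announcement.

```python
# Sketch: query a GPU-backed Cloud Run service that exposes an OpenAI-compatible
# chat endpoint (e.g. vLLM or Ollama serving Gemma). The URL, model tag, and
# prompt below are placeholders.
import subprocess

import requests

SERVICE_URL = "https://gemma-inference-xxxxxxxx-uc.a.run.app"  # placeholder URL
MODEL = "gemma2:9b"  # placeholder model tag

# Cloud Run services are typically private; use a Google identity token for auth.
token = subprocess.run(
    ["gcloud", "auth", "print-identity-token"],
    capture_output=True, text=True, check=True,
).stdout.strip()

response = requests.post(
    f"{SERVICE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "model": MODEL,
        "messages": [
            {"role": "user", "content": "Summarize the attached meeting notes in two sentences."}
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because Cloud Run can scale to zero, the first request after a period of inactivity will include cold-start time while an instance (and the model) spins up; subsequent requests are served by the warm instance.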

Initially, Cloud Run GPU support is available in the us-central1 (Iowa) region, with plans to expand to Europe and Asia by the end of the year. Developers can attach one NVIDIA L4 GPU per Cloud Run instance without the need for advance reservations.
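
For readers curious what attaching that single L4 might look like in practice, the following is a rough sketch using the google-cloud-run (run_v2) Python client. The GPU-specific fields (node_selector.accelerator and the "nvidia.com/gpu" limit), along with the project, image, and resource values, are assumptions based on the v2 Admin API surface rather than confirmed preview syntax; during the preview, the gcloud CLI may be the supported deployment path, so treat this as illustrative.

```python
# Sketch: deploy a Cloud Run service with one NVIDIA L4 GPU via the v2 Admin API.
# GPU field names and all concrete values here are assumptions, not confirmed syntax.
from google.cloud import run_v2

PROJECT = "my-project"          # placeholder project ID
REGION = "us-central1"          # the only region in the initial preview
SERVICE_ID = "gemma-inference"  # placeholder service name
IMAGE = "us-docker.pkg.dev/my-project/inference/gemma:latest"  # placeholder image

client = run_v2.ServicesClient()

service = run_v2.Service(
    template=run_v2.RevisionTemplate(
        containers=[
            run_v2.Container(
                image=IMAGE,
                resources=run_v2.ResourceRequirements(
                    # GPU instances need more CPU/memory than the defaults;
                    # these values are illustrative.
                    limits={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
                    cpu_idle=False,  # keep CPU allocated while a GPU is attached
                ),
            )
        ],
        # One NVIDIA L4 per instance, no advance reservation required.
        node_selector=run_v2.NodeSelector(accelerator="nvidia-l4"),
        # min_instance_count=0 preserves scale-to-zero cost behavior.
        scaling=run_v2.RevisionScaling(min_instance_count=0, max_instance_count=4),
    )
)

operation = client.create_service(
    request=run_v2.CreateServiceRequest(
        parent=f"projects/{PROJECT}/locations/{REGION}",
        service=service,
        service_id=SERVICE_ID,
    )
)
print("Deployed:", operation.result().uri)
```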

This update represents a significant step forward in making AI inference more accessible and cost-effective for businesses of all sizes. By combining the simplicity of serverless architecture with the power of NVIDIA GPUs, Google Cloud is positioning itself as a strong contender in the rapidly evolving AI infrastructure market.

Developers interested in trying out Cloud Run with NVIDIA GPUs can sign up for the preview program at g.co/cloudrun/gpu.