Improve Price Performance for LLM Serving with vLLM on TPU & GKE
Dive into a hands-on workshop designed exclusively for AI developers. Learn to leverage the power of Google Cloud TPUs, the custom accelerators behind Google Gemini, for highly efficient LLM inference using vLLM. In this workshop, you will build and deploy Gemma 3 27B on Trillium TPUs with vLLM and Google Kubernetes Engine (GKE). Explore advanced tooling like Dynamic Workload Scheduler (DWS) for TPU provisioning, Google Cloud Storage (GCS) for model checkpoints, and essential observability and monitoring solutions.
Location: Room 207
Duration: 1 hour
