Scaling AI Infrastructure: Strategies for Resilient Fleet Operations

10 Sep 2025
Hardware & Systems
What does it take to run one of the world’s largest AI supercomputers? As artificial intelligence workloads grow exponentially, operating a hyperscale AI cloud fleet demands new strategies for resilience, efficiency, and operational excellence. This session explores Microsoft’s approach to scaling its infrastructure for 100x growth, focusing on where system innovation meets advanced fleet management.