From Training to Inference: Optimizing Diverse AI Workloads on AWS Trainium3
Follow AI workloads on AWS Trainium3 through every stage of their lifecycle: distributed training, fine-tuning, and deployment. This talk shows how to optimize each stage using the Neuron SDK. We'll demonstrate how to use the Neuron Profiler to pinpoint bottlenecks, then guide you through writing and integrating Neuron Kernel Interface (NKI) kernels to achieve peak performance. You'll also see real-world examples of how organizations apply these techniques across LLMs, diffusion models, computer vision, and agentic inference to accelerate their AI workloads and reduce costs.
