LLM Serving Framework for Heterogeneous Accelerator Data Centers
As AI data centers evolve to serve diverse models and workload patterns, the era of “one GPU does everything” is ending. Achieving optimal tokens per dollar now depends on effectively combining heterogeneous accelerators across vendors. However, building a unified inference environment on such infrastructure introduces significant challenges in software compatibility and hardware-level optimization. Moreh addresses these challenges with the MoAI Inference Framework, distributed inference software for heterogeneous accelerator clusters. It removes the operational complexity of individually managing NVIDIA GPUs, AMD GPUs, and various AI chips in neocloud and enterprise environments, and it enables cross-vendor disaggregated serving to maximize infrastructure efficiency.
Speakers
