I’m most excited about the shift toward programmable, telemetry-rich fabrics and the growing adoption of Open Compute standards across the AI infrastructure stack. These standards give ecosystem partners a shared blueprint for building interoperable, high-performance components and accelerate the emergence of an open, multi-vendor hardware ecosystem.
This evolution is particularly important as AI workloads move beyond training into multi-node inference and KV-cache disaggregated serving. These workloads impose extremely tight latency, synchronization, and bandwidth requirements across memory tiers, storage, and nodes. Running model execution, memory access, and KV-cache lookups over the network places unprecedented pressure on the underlying fabric: microsecond-level sensitivity, congestion hotspots, and tail-latency amplification all become critical bottlenecks.
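To make that pressure concrete, here is a minimal sketch of the remote KV-cache fetch path in a disaggregated server. The `fetch_kv_block` helper and all latency numbers are illustrative assumptions, not a real serving stack; the point is that per-fetch fabric latency multiplies across blocks and layers into each token's step budget.

```python
from dataclasses import dataclass

STEP_BUDGET_US = 200   # assumed per-token step latency budget
FETCH_LATENCY_US = 8   # assumed per-fetch fabric latency
BLOCKS_PER_STEP = 16   # assumed remote KV blocks pulled per decode step

@dataclass
class FetchStats:
    fetches: int = 0
    total_us: float = 0.0

def fetch_kv_block(block_id: int, stats: FetchStats) -> bytes:
    """Stand-in for an RDMA read of one KV-cache block from a remote pool."""
    stats.fetches += 1
    stats.total_us += FETCH_LATENCY_US
    return b"\x00" * 4096  # placeholder payload

def decode_step(stats: FetchStats) -> None:
    # One token's worth of remote KV-cache traffic; any added fabric
    # latency here is paid again on every block, layer, and node.
    for block_id in range(BLOCKS_PER_STEP):
        fetch_kv_block(block_id, stats)

stats = FetchStats()
decode_step(stats)
print(f"{stats.fetches} fetches consumed {stats.total_us:.0f} us "
      f"of a {STEP_BUDGET_US} us step budget")
```

Even with these toy numbers, a single decode step spends a meaningful fraction of its budget on the fabric, which is why microsecond regressions show up directly in token latency.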
The industry’s move toward heterogeneous, high-bandwidth, low-latency fabric architectures, supported by CXL memory pooling, scale-up accelerator links such as UALink, SUE, and NVLink, and RDMA-based scale-out networking, is what will make these new inference architectures, sometimes called GenAI token factories, viable at scale.
For the first time, we can run genuine software-driven control on top of these network fabrics, using their telemetry to ensure consistency, performance, and reliability even under dynamic multi-node inference loads.
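As one illustration of what that control could look like, here is a hedged sketch of a telemetry-driven loop. `read_port_telemetry`, the thresholds, and the reroute action are hypothetical placeholders rather than any vendor's API; a real controller would consume in-band telemetry or streamed counters from the fabric.

```python
import random
import time

def read_port_telemetry(port: int) -> dict:
    # Simulated sample; a real fabric would stream these counters.
    return {
        "queue_depth": random.randint(0, 100),    # occupancy, percent
        "tail_latency_us": random.uniform(1, 50)  # simulated p99 latency
    }

QUEUE_THRESHOLD = 80       # assumed congestion trigger
LATENCY_THRESHOLD_US = 30  # assumed tail-latency trigger

def control_loop(ports: list[int], iterations: int = 3) -> None:
    """Poll port telemetry and react before congestion amplifies tail latency."""
    for _ in range(iterations):
        for port in ports:
            sample = read_port_telemetry(port)
            if (sample["queue_depth"] > QUEUE_THRESHOLD
                    or sample["tail_latency_us"] > LATENCY_THRESHOLD_US):
                # Placeholder action: a real controller might shift flows,
                # adjust ECMP weights, or pace senders.
                print(f"port {port}: congestion detected, rerouting flows")
        time.sleep(0.01)  # assumed polling interval

control_loop(ports=[1, 2, 3, 4])
```

The design choice this sketch highlights is the closed loop itself: telemetry-rich, programmable fabrics let software observe congestion as it forms and act within the same timescales at which inference traffic shifts.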