
llm-d: a Kubernetes-native high-performance distributed LLM inference framework
llm-d is a well-lit path for anyone serving large language models at scale, offering the fastest time-to-value and competitive performance per dollar for most models across a wide range of hardware accelerators.