llm-d: a Kubernetes-native high-performance distributed LLM inference framework

llm-d is a well-lit path for anyone to serve LLMs at scale, offering the fastest time-to-value and competitive performance per dollar for most models across a diverse and comprehensive set of hardware accelerators.