llm-d: a high-performance and scalable distributed LLM inference framework

llm-d is a well-lit path for anyone to serve large language models at scale, with the fastest time-to-value and competitive performance per dollar for most models across a diverse set of hardware accelerators.