
llm-d: a Kubernetes-native high-performance distributed LLM inference framework
llm-d is a well-lit path for anyone serving large language models at scale, offering the fastest time-to-value and competitive performance per dollar for most models across a wide range of hardware accelerators.