llm-d components

The llm-d ecosystem consists of multiple interconnected components that work together to provide distributed inference capabilities for large language models.

Latest Release: v0.2.0

Released: July 29, 2025

Auto-Generated Content

This page is automatically updated from the latest component repository information and release data. Last updated: 2025-08-01

Components

| Component | Description | Repository | Documentation |
|-----------|-------------|------------|---------------|
| Inference Scheduler | vLLM-optimized inference scheduler with smart load balancing | llm-d/llm-d-inference-scheduler | View Docs |
| Modelservice | Helm chart for declarative LLM deployment management | llm-d-incubation/llm-d-modelservice | View Docs |
| Routing Sidecar | Reverse proxy for prefill and decode worker routing | llm-d/llm-d-routing-sidecar | View Docs |
| Inference Sim | Lightweight vLLM simulator for testing and development (see the example below) | llm-d/llm-d-inference-sim | View Docs |
| Infra | Examples, Helm charts, and release assets for llm-d infrastructure | llm-d-incubation/llm-d-infra | View Docs |
| KV Cache Manager | Pluggable service for KV-Cache aware routing and cross-node coordination | llm-d/llm-d-kv-cache-manager | View Docs |
| Benchmark | Automated workflow for benchmarking LLM inference performance | llm-d/llm-d-benchmark | View Docs |
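
Because the Inference Sim emulates a vLLM server, any OpenAI-compatible client can be pointed at it during development. The sketch below assumes the simulator is running locally on port 8000 and mirrors vLLM's OpenAI-compatible /v1/completions endpoint; the URL, port, and model name are placeholders, not values the simulator requires.

```python
import json
import urllib.request

# Assumed local setup: llm-d-inference-sim listening on localhost:8000 and
# exposing an OpenAI-compatible completions endpoint. Adjust the URL and
# model name to match however you launched the simulator.
URL = "http://localhost:8000/v1/completions"

payload = {
    "model": "test-model",   # placeholder; use the model name the sim was started with
    "prompt": "Hello, llm-d!",
    "max_tokens": 16,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The response follows the OpenAI completions schema, so choices[0].text
# holds the (simulated) generation.
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())
    print(body["choices"][0]["text"])
```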

Getting Started

Each component has its own detailed documentation page accessible from the links above. For a comprehensive view of how these components work together, see the main Architecture Overview.
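
To give a flavor of how these pieces interact, the sketch below shows one way KV-cache-aware routing can work: score each candidate worker by how many leading blocks of the incoming prompt it already holds in its KV cache, then route to the highest scorer. The block size, data structures, and function names here are illustrative assumptions, not the actual interfaces of llm-d-inference-scheduler or llm-d-kv-cache-manager.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 16  # hypothetical KV-cache block granularity


def prefix_blocks(tokens: List[int]) -> List[Tuple[int, ...]]:
    """Split a token sequence into fixed-size blocks, dropping the partial tail."""
    return [
        tuple(tokens[i : i + BLOCK_SIZE])
        for i in range(0, len(tokens) - BLOCK_SIZE + 1, BLOCK_SIZE)
    ]


def score_worker(prompt: List[int], cached: set) -> int:
    """Count how many leading prompt blocks are already in this worker's cache."""
    score = 0
    for block in prefix_blocks(prompt):
        if block not in cached:
            break  # only a contiguous prefix of blocks is reusable
        score += 1
    return score


def pick_worker(prompt: List[int], caches: Dict[str, set]) -> str:
    """Route to the worker with the largest reusable KV-cache prefix."""
    return max(caches, key=lambda w: score_worker(prompt, caches[w]))


# Toy example: worker "a" has the first two blocks of the prompt cached.
prompt = list(range(48))
caches = {
    "a": {tuple(range(0, 16)), tuple(range(16, 32))},
    "b": {tuple(range(16, 32))},  # cached, but not a usable prefix
}
print(pick_worker(prompt, caches))  # -> "a"
```

Only a contiguous prefix counts because attention for each token depends on all earlier tokens, so a cached block is reusable only if every block before it is cached as well.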

Contributing

To contribute to any of these components, visit its repository and follow its contribution guidelines; each component maintains its own development workflow.

