llm-d Components
The llm-d ecosystem consists of interconnected components that together provide distributed inference for large language models.
Latest Release: v0.2.0
Released: July 29, 2025
This page is automatically updated from the latest component repository information and release data. Last updated: 2025-08-01
Components
| Component | Description | Repository | Documentation |
|---|---|---|---|
| Inference Scheduler | vLLM-optimized inference scheduler with smart load balancing | llm-d/llm-d-inference-scheduler | View Docs |
| Modelservice | Helm chart for declarative LLM deployment management | llm-d-incubation/llm-d-modelservice | View Docs |
| Routing Sidecar | Reverse proxy for prefill and decode worker routing | llm-d/llm-d-routing-sidecar | View Docs |
| Inference Sim | Lightweight vLLM simulator for testing and development | llm-d/llm-d-inference-sim | View Docs |
| Infra | Examples, Helm charts, and release assets for llm-d infrastructure | llm-d-incubation/llm-d-infra | View Docs |
| KV Cache Manager | Pluggable service for KV-Cache aware routing and cross-node coordination | llm-d/llm-d-kv-cache-manager | View Docs |
| Benchmark | Automated workflow for benchmarking LLM inference performance | llm-d/llm-d-benchmark | View Docs |
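For example, the Inference Sim can stand in for a real vLLM worker during development. The sketch below is a minimal illustration, assuming the simulator is running locally and exposes a vLLM-style OpenAI-compatible `/v1/chat/completions` endpoint on port 8000; the host, port, and model name are placeholders, so consult the llm-d-inference-sim documentation for the actual flags and defaults.

```python
import json
import urllib.request

# Assumed endpoint: a local llm-d-inference-sim instance serving a
# vLLM-style OpenAI-compatible API (host and port are placeholders).
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    # Placeholder model name; the simulator returns responses for
    # whatever model it was configured to simulate.
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello from a test client"}],
    "max_tokens": 32,
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Send the request and print the simulated completion text.
with urllib.request.urlopen(request) as response:
    body = json.load(response)

print(body["choices"][0]["message"]["content"])
```

Because the simulator mimics a real worker's API surface without loading model weights, the same client code can later be pointed at a full llm-d deployment unchanged.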
Getting Started
Each component has its own detailed documentation page accessible from the links above. For a comprehensive view of how these components work together, see the main Architecture Overview.
Quick Links
- Main llm-d Repository - Core platform and orchestration
- llm-d-incubation Organization - Experimental and supporting components
- Latest Release - llm-d v0.2.0
- All Releases - Complete release history
Contributing
To contribute to any of these components, visit its repository and follow its contribution guidelines; each component maintains its own development workflow.