llm-d components

The llm-d ecosystem consists of multiple interconnected components that work together to provide distributed inference capabilities for large language models.

Latest Release: v0.2.0

Released: July 29, 2025

Auto-Generated Content

This page is automatically updated from the latest component repository information and release data. Last updated: 2025-08-01

Components

| Component | Description | Repository | Documentation |
|-----------|-------------|------------|---------------|
| Inference Scheduler | vLLM-optimized inference scheduler with smart load balancing | llm-d/llm-d-inference-scheduler | View Docs |
| Modelservice | Helm chart for declarative LLM deployment management | llm-d-incubation/llm-d-modelservice | View Docs |
| Routing Sidecar | Reverse proxy for prefill and decode worker routing | llm-d/llm-d-routing-sidecar | View Docs |
| Inference Sim | Lightweight vLLM simulator for testing and development (see the example below) | llm-d/llm-d-inference-sim | View Docs |
| Infra | Examples, Helm charts, and release assets for llm-d infrastructure | llm-d-incubation/llm-d-infra | View Docs |
| KV Cache Manager | Pluggable service for KV-Cache aware routing and cross-node coordination | llm-d/llm-d-kv-cache-manager | View Docs |
| Benchmark | Automated workflow for benchmarking LLM inference performance | llm-d/llm-d-benchmark | View Docs |
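
Because the Inference Sim emulates a vLLM server, any OpenAI-compatible client can be pointed at it during development. The sketch below assumes the simulator is running locally on port 8000 and mirrors vLLM's OpenAI-compatible /v1/completions endpoint; the URL, port, and model name are placeholders, not values the simulator requires.

```python
import json
import urllib.request

# Assumed local setup: llm-d-inference-sim listening on localhost:8000 and
# exposing an OpenAI-compatible completions endpoint. Adjust the URL and
# model name to match however you launched the simulator.
URL = "http://localhost:8000/v1/completions"

payload = {
    "model": "test-model",   # placeholder; use the model name the sim was started with
    "prompt": "Hello, llm-d!",
    "max_tokens": 16,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The response follows the OpenAI completions schema, so choices[0].text
# holds the (simulated) generation.
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())
    print(body["choices"][0]["text"])
```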

Getting Started

Each component has its own detailed documentation page accessible from the links above. For a comprehensive view of how these components work together, see the main Architecture Overview.
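
To give a flavor of how these pieces interact, the sketch below shows one way KV-cache-aware routing can work: score each candidate worker by how many leading blocks of the incoming prompt it already holds in its KV cache, then route to the highest scorer. The block size, data structures, and function names here are illustrative assumptions, not the actual interfaces of llm-d-inference-scheduler or llm-d-kv-cache-manager.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 16  # hypothetical KV-cache block granularity


def prefix_blocks(tokens: List[int]) -> List[Tuple[int, ...]]:
    """Split a token sequence into fixed-size blocks, dropping the partial tail."""
    return [
        tuple(tokens[i : i + BLOCK_SIZE])
        for i in range(0, len(tokens) - BLOCK_SIZE + 1, BLOCK_SIZE)
    ]


def score_worker(prompt: List[int], cached: set) -> int:
    """Count how many leading prompt blocks are already in this worker's cache."""
    score = 0
    for block in prefix_blocks(prompt):
        if block not in cached:
            break  # only a contiguous prefix of blocks is reusable
        score += 1
    return score


def pick_worker(prompt: List[int], caches: Dict[str, set]) -> str:
    """Route to the worker with the largest reusable KV-cache prefix."""
    return max(caches, key=lambda w: score_worker(prompt, caches[w]))


# Toy example: worker "a" has the first two blocks of the prompt cached.
prompt = list(range(48))
caches = {
    "a": {tuple(range(0, 16)), tuple(range(16, 32))},
    "b": {tuple(range(16, 32))},  # cached, but not a usable prefix
}
print(pick_worker(prompt, caches))  # -> "a"
```

Only a contiguous prefix counts because attention for each token depends on all earlier tokens, so a cached block is reusable only if every block before it is cached as well.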

Contributing

To contribute to any of these components, visit its repository and follow its contribution guidelines; each component maintains its own development workflow.

