


All posts

2026

Networking for Distributed Inference in llm-d
Serving Hybrid Models at Scale in llm-d
Heterogeneous inference serving across three GPU vendors with llm-d
BLIS: Evolving llm-d at Simulation Speed
No Kubernetes? No Problem: llm-d Now Runs Anywhere
llm-d v0.7: From Feature Introduction to Production Hardening
Production-Grade LLM Inference at Scale with KServe, llm-d, and vLLM
Predicted-Latency Based Scheduling for LLMs
Native KV Cache Offloading to Any Filesystem with llm-d
llm-d 0.5: Sustaining Performance at Scale

2025

llm-d 0.4: Achieve SOTA Performance Across Accelerators
llm-d 0.3: Wider Well-Lit Paths for Scalable Inference
KV-Cache Wins You Can See: From Prefix Caching in vLLM to Distributed Scheduling with llm-d
Intelligent Inference Scheduling with llm-d
llm-d 0.2: Our first well-lit paths (mind the tree roots!)
llm-d Community Update - June 2025
llm-d Week 1 Project News Round-Up
Announcing the llm-d community!
llm-d Press Release

Tags

A

Announcements (5)

B

blog posts (10)

C

Community (1)

H

Hello (1)
HMA (1)

I

Inference (4)

K

KV Cache (3)

L

llm-d release news (9)

N

Networking (1)
News Releases (2)
NIXL (1)

R

Releases (5)

S

Scheduling (4)
SIG-Benchmarking (2)
Storage (2)

U

UCCL (1)
UCX (1)
Updates (3)

W

Welcome! (1)


llm-d is a CNCF Sandbox project


Copyright llm-d a Series of LF Projects, LLC. Apache 2.0 License.
We are a Cloud Native Computing Foundation sandbox project.
For website terms of use, trademark policy, and other project policies, please see https://lfprojects.org/policies/