
▶ Learn llm-d

Explore our video collection to learn about llm-d's capabilities, architecture, and best practices for deploying LLM inference at scale.

Kubernetes Native Distributed Inferencing

Introduction to llm-d at DevConf.US 2025 — learn the fundamentals of distributed LLM inference on Kubernetes from Rob Shaw (Red Hat).

Serving PyTorch LLMs at Scale

Disaggregated inference with Kubernetes and llm-d — presented by Maroon Ayoub (IBM) & Cong Liu (Google) at the PyTorch Conference.

Distributed Inference with Well-Lit Paths

Watch Rob Shaw (Red Hat) explore llm-d's "well-lit paths" and its approach to simplified, production-ready distributed inference.

Multi-Accelerator LLM Inference

Deep dive into multi-accelerator LLM inference on Kubernetes — presented by Erwan Gallen (Red Hat) at KubeCon.

Routing Stateful AI Workloads in Kubernetes

Maroon Ayoub (IBM) & Michey Mehta (Red Hat) explore cache-aware routing strategies for LLM workloads using llm-d and the K8s Gateway API Inference Extension.

Ready to get started?

Dive into our documentation or join our community to learn more.