Learn llm-d
Explore our video collection to learn about llm-d's capabilities, architecture, and best practices for deploying LLM inference at scale.
Kubernetes Native Distributed Inferencing
Introduction to llm-d at DevConf.US 2025 — learn the fundamentals of distributed LLM inference on Kubernetes from Rob Shaw (Red Hat).
Serving PyTorch LLMs at Scale
Disaggregated inference with Kubernetes and llm-d — presented by Maroon Ayoub (IBM) & Cong Liu (Google) at the PyTorch Conference.
Distributed Inference with Well-Lit Paths
Watch Rob Shaw (Red Hat) explore llm-d's "well-lit paths" and its approach to simplified, production-ready distributed inference.
Multi-Accelerator LLM Inference
A deep dive into multi-accelerator LLM inference on Kubernetes — presented by Erwan Gallen (Red Hat) at KubeCon.
Routing Stateful AI Workloads in Kubernetes
Maroon Ayoub (IBM) & Michey Mehta (Red Hat) explore cache-aware routing strategies for LLM workloads using llm-d and the K8s Gateway API Inference Extension.
Ready to get started?
Dive into our documentation or join our community to learn more.