High performance distributed inference on Kubernetes with llm-d
Our guides provide tested and benchmarked recipes and Helm charts to serve large language models (LLMs) at peak performance, following best practices common to production deployments. Familiarity with basic Kubernetes deployment and operation is assumed.
If you want to learn by doing, follow the step-by-step first deployment in QUICKSTART.md.
Who are these guides (and llm-d) for?
These guides are aimed at startups and enterprises deploying production LLM serving who want the best possible performance while minimizing operational complexity. State-of-the-art LLM inference involves multiple optimizations that offer meaningful tradeoffs depending on the use case. The guides help you identify those key optimizations, understand their tradeoffs, and verify the gains against your own workload.
We focus on the following use cases:
- Deploying a self-hosted LLM behind a single workload across tens or hundreds of nodes
- Running a production model-as-a-service platform that supports many users and workloads sharing one or more LLM deployments
Well-Lit Path Guides
A well-lit path is a documented, tested, and benchmarked solution of choice to reduce adoption risk and maintenance cost. These are the central best practices common to production deployments of large language model serving.
We currently offer three tested and benchmarked paths to help you deploy large models:
- Intelligent Inference Scheduling - Deploy vLLM behind the Inference Gateway (IGW) to decrease latency and increase throughput via precise prefix-cache aware routing and customizable scheduling policies.
- Prefill/Decode Disaggregation - Reduce time to first token (TTFT) and get more predictable time per output token (TPOT) by splitting inference into prefill servers that handle prompts and decode servers that generate responses; most beneficial for large models such as Llama-70B and for very long prompts.
- Wide Expert-Parallelism - Deploy very large Mixture-of-Experts (MoE) models like DeepSeek-R1 and significantly reduce end-to-end latency and increase throughput by scaling up with Data Parallelism and Expert Parallelism over fast accelerator networks.
These guides are intended as a starting point for your own configuration and deployment of model servers. The Helm charts used in these guides provide basic, reusable building blocks for vLLM deployments and inference scheduler configuration, but they do not cover the full range of possible configurations. Both the guides and the charts depend on features provided and supported by the vLLM and inference gateway open source projects.
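For the Intelligent Inference Scheduling path, the essential wiring is an HTTPRoute on the gateway whose backend is an InferencePool of vLLM pods, with an endpoint-picker extension applying the scheduling policies. The sketch below assumes the Gateway API Inference Extension CRDs; the API versions, field names, and all resource names, labels, and ports shown here are illustrative assumptions and may differ from what the guide's Helm charts actually render.

```yaml
# Sketch only: route traffic from an inference gateway to a pool of vLLM pods.
# API versions, field names, and the resource names/labels/ports are assumptions;
# the Intelligent Inference Scheduling guide and its charts are authoritative.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway          # the Inference Gateway (IGW) instance
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool            # pool of vLLM endpoints, not a plain Service
      name: vllm-llama3-8b
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-8b
spec:
  selector:
    app: vllm-llama3-8b              # matches the labels on the vLLM pods
  targetPortNumber: 8000             # port the vLLM server listens on
  extensionRef:
    name: vllm-llama3-8b-epp         # endpoint picker that applies prefix-cache
                                     # aware routing and scheduling policies
```

The endpoint picker referenced by the pool is where prefix-cache aware routing and customizable scheduling policies are applied; the guide's charts install and configure it for you.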
Supporting Guides
Our supporting guides address common operational challenges with model serving at scale:
- Simulating model servers deploys a vLLM model server simulator that allows testing inference scheduling and orchestration at scale, since individual instances do not need accelerators (see the sketch below).
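A simulator deployment is ordinary Kubernetes; because the pods request no GPUs, replicas can be scaled up cheaply. The sketch below is illustrative only: the image name, arguments, and port are placeholders, and the simulation guide documents the actual simulator image and its supported flags.

```yaml
# Sketch only: a CPU-only deployment of a vLLM model server simulator.
# The image, args, and port are placeholders; see the simulation guide for
# the real simulator image and its flags.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-sim
spec:
  replicas: 16                       # scale freely; no accelerators are requested
  selector:
    matchLabels:
      app: vllm-sim
  template:
    metadata:
      labels:
        app: vllm-sim
    spec:
      containers:
      - name: simulator
        image: example.com/llm-d/vllm-simulator:latest               # placeholder image
        args: ["--model", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"]  # placeholder flags
        ports:
        - containerPort: 8000
        resources:
          requests:
            cpu: 250m                # modest CPU/memory instead of GPUs
            memory: 256Mi
```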
Other Guides
The following guides have been contributed by the community; they do not yet fully integrate with the llm-d configuration structure and are not supported as well-lit paths:
- Coming Soon!
New guides added to this list enable at least one of the core well-lit paths but may include prerequisite steps specific to new hardware or infrastructure providers without full abstraction. A guide added here is expected to eventually become part of an existing well-lit path.
This content is automatically synced from guides/README.md in the llm-d/llm-d repository.
📝 To suggest changes, please edit the source file or create an issue.