llm-d User Guide
The user guide is organized into sections to help you get started with llm-d and then tailor the configuration to your resources and application needs. It currently focuses on the Quick Start via the llm-d-deployer Helm chart.
What is llm-d?
llm-d is an open source project providing distributed inference for GenAI runtimes on any Kubernetes cluster. Its highly performant, scalable architecture helps reduce costs through a spectrum of hardware efficiency improvements. The project prioritizes ease of deployment and use, as well as the SRE needs and day-2 operations associated with running large GPU clusters.
For more information, check out the Architecture Documentation.
Installation: Start here to minimize your frustration
This guide walks you through installing and deploying the llm-d quickstart demo on a Kubernetes cluster.
- Prerequisites: Make sure your compute resources and system configuration are ready.
- Quick Start: If your resources are ready, "kick the tires" with our Quick Start!
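As a rough sketch of what a Helm-based deployment like the Quick Start typically involves (the repository URL, chart name, and namespace below are illustrative assumptions, not the documented commands; follow the Quick Start for the exact steps):

```shell
# Illustrative sketch only -- the repo URL, chart name, and namespace
# here are assumptions; see the Quick Start for the real commands.

# Add the chart repository (hypothetical URL) and refresh the index
helm repo add llm-d https://example.com/llm-d-deployer
helm repo update

# Install the deployer chart into its own namespace
helm install llm-d llm-d/llm-d-deployer \
  --namespace llm-d --create-namespace

# Watch the inference components come up
kubectl get pods -n llm-d --watch
```

Once the pods report Ready, the cluster is serving and you can move on to tailoring the configuration to your resources.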