llm-d Quick Start
Overview
This quick start walks you through installing and deploying llm-d on a Kubernetes cluster, explains the key choices at each step, and shows how to validate and remove your deployment.
Prerequisites
Run with sufficient permissions to deploy
Before running any deployment, ensure you have sufficient permissions to deploy new custom resource definitions (CRDs) and to alter roles. Our guides are written for cluster administrators, especially the prerequisites. Once the prerequisites are configured, deploying model servers and new InferencePools typically requires only namespace editor permissions.
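As a quick sanity check, you can ask the cluster whether your current user holds these permissions (a minimal sketch; the exact resources each guide creates vary):

```bash
# Cluster-admin-level check: can we create CRDs?
kubectl auth can-i create customresourcedefinitions

# Namespace-level check: can we create workloads in the target namespace?
kubectl auth can-i create deployments -n ${NAMESPACE}
```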
Tool Dependencies
You will need to install some dependencies (such as helm, yq, and git) and have a HuggingFace token for most examples. These requirements and instructions are documented in the prereq/client-setup directory. To install the dependencies, use the provided install-deps.sh script.
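For example, from a checkout of the repository (the path below assumes the script lives in the prereq/client-setup directory mentioned above; adjust it to match your checkout):

```bash
# Clone the repository and run the dependency installer.
git clone https://github.com/llm-d/llm-d.git
cd llm-d/guides/prereq/client-setup
./install-deps.sh
```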
HuggingFace Token
A HuggingFace token is required to download models from the HuggingFace Hub. You must create a Kubernetes secret containing your HuggingFace token in the target namespace before deployment; see the instructions for details.
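For illustration, a minimal sketch of creating such a secret (the secret name and key shown here are placeholders; use whatever names your chosen guide's README expects):

```bash
export NAMESPACE=llm-d          # target namespace (placeholder)
export HF_TOKEN=<your-token>    # your HuggingFace token

# Create the namespace and a generic secret holding the token.
kubectl create namespace ${NAMESPACE}
kubectl create secret generic llm-d-hf-token \
  --from-literal=HF_TOKEN=${HF_TOKEN} \
  -n ${NAMESPACE}
```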
Gateway provider
Additionally, it is assumed that you have deployed and configured your Kubernetes Gateway control plane and its prerequisite CRDs. For more information, see the gateway-provider prereq.
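One way to confirm the Gateway API CRDs are present before deploying (the control plane itself varies by provider):

```bash
# The core Gateway API CRDs should already exist on the cluster.
kubectl get crd gateways.gateway.networking.k8s.io gatewayclasses.gateway.networking.k8s.io

# List the GatewayClasses your control plane provides.
kubectl get gatewayclass
```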
Target Platforms
llm-d can be deployed on a variety of Kubernetes platforms. Platform-specific requirements, workarounds, and any other relevant documentation live in the infra-providers directory.
Deployment
Select an appropriate guide from the list in the README.md. We recommend starting with the inference scheduling well-lit path if you are looking to deploy vLLM in a recommended production serving configuration.
Navigate to the desired guide directory and follow its README instructions. For example:
cd quickstarts/guides/inference-scheduling # Navigate to your desired example directory
# Follow the README.md instructions in the example directory
When you complete the deployment successfully, return here.
Validation
You should be able to list all Helm releases to view the charts installed by the guide:
helm list -n ${NAMESPACE}
You can view all resources in your namespace with:
kubectl get all -n ${NAMESPACE}
Note: This assumes no other guide deployments in your given ${NAMESPACE}.
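A quick way to confirm the deployment is healthy is to watch the pods until they all report Ready (a generic check; pod names and counts vary by guide):

```bash
kubectl get pods -n ${NAMESPACE} -w
```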
Making inference requests to your deployments
For instructions on getting started with making inference requests, see getting-started-inferencing.md.
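As a rough illustration of what such a request looks like (the gateway service name and model ID below are placeholders; follow getting-started-inferencing.md for the exact steps for your guide), you can port-forward the gateway and send an OpenAI-compatible request, which vLLM serves under /v1:

```bash
# Port-forward the gateway service locally (service name is a placeholder).
kubectl port-forward -n ${NAMESPACE} svc/<gateway-service> 8000:80 &

# Send an OpenAI-compatible completion request.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-id>", "prompt": "Hello, llm-d!", "max_tokens": 32}'
```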
Metrics collection
llm-d charts include support for metrics collection from vLLM pods. When enabled with the appropriate Helm chart values, llm-d applies PodMonitors that register Prometheus scrape targets. See MONITORING.md for details.
On Kubernetes, Prometheus and Grafana can be installed from the prometheus-community kube-prometheus-stack Helm chart. On OpenShift, the built-in user workload monitoring Prometheus stack can be used to collect metrics.
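For example, a minimal kube-prometheus-stack installation on Kubernetes (release and namespace names here are arbitrary), followed by a check that the llm-d PodMonitors exist:

```bash
# Install the community Prometheus/Grafana stack.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

# Verify the PodMonitors created by the llm-d charts.
kubectl get podmonitors -n ${NAMESPACE}
```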
Uninstall
To remove llm-d resources from the cluster, refer to the uninstallation instructions in the specific guide README that you installed.
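In general, uninstalling amounts to removing the Helm releases the guide created (release names come from helm list; this is a generic sketch, so prefer the guide's own uninstall steps):

```bash
# List the releases installed by the guide, then remove each one.
helm list -n ${NAMESPACE}
helm uninstall <release-name> -n ${NAMESPACE}
```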