llm-d Quick Start
Overview
This quick start walks you through installing and deploying llm-d on a Kubernetes cluster, explains the key choices at each step, and shows how to validate and remove your deployment.
Prerequisites
Run with sufficient permissions to deploy
Before running any deployment, ensure you have sufficient permissions to deploy new custom resource definitions (CRDs) and to alter roles. Our guides are written for cluster administrators, especially the prerequisites. Once the prerequisites are configured, deploying model servers and new InferencePools typically requires only namespace editor permissions.
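As a quick sanity check, you can ask the cluster whether your current user holds these permissions (a minimal sketch; the exact resources each guide creates vary):

```bash
# Cluster-admin-level check: can we create CRDs?
kubectl auth can-i create customresourcedefinitions

# Namespace-level check: can we create workloads in the target namespace?
kubectl auth can-i create deployments -n ${NAMESPACE}
```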
Tool Dependencies
You will need to install some dependencies (such as helm, yq, and git) and have a HuggingFace token for most examples. These requirements and instructions are documented in the prereq/client-setup directory. To install the dependencies, use the provided install-deps.sh script.
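For example, from a checkout of the repository (the path below assumes the script lives in the prereq/client-setup directory mentioned above; adjust it to match your checkout):

```bash
# Clone the repository and run the dependency installer.
git clone https://github.com/llm-d/llm-d.git
cd llm-d/guides/prereq/client-setup
./install-deps.sh
```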
HuggingFace Token
A HuggingFace token is required to download models from the HuggingFace Hub. You must create a Kubernetes secret containing your HuggingFace token in the target namespace before deployment; see the instructions for details.
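For illustration, a minimal sketch of creating such a secret (the secret name and key shown here are placeholders; use whatever names your chosen guide's README expects):

```bash
export NAMESPACE=llm-d          # target namespace (placeholder)
export HF_TOKEN=<your-token>    # your HuggingFace token

# Create the namespace and a generic secret holding the token.
kubectl create namespace ${NAMESPACE}
kubectl create secret generic llm-d-hf-token \
  --from-literal=HF_TOKEN=${HF_TOKEN} \
  -n ${NAMESPACE}
```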
Gateway provider
Additionally, it is assumed that you have deployed and configured your Kubernetes Gateway control plane and its prerequisite CRDs. For more information, see the gateway-provider prereq.
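One way to confirm the Gateway API CRDs are present before deploying (the control plane itself varies by provider):

```bash
# The core Gateway API CRDs should already exist on the cluster.
kubectl get crd gateways.gateway.networking.k8s.io gatewayclasses.gateway.networking.k8s.io

# List the GatewayClasses your control plane provides.
kubectl get gatewayclass
```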
Target Platforms
llm-d can be deployed on a variety of Kubernetes platforms. Platform-specific requirements, workarounds, and any other relevant documentation live in the infra-providers directory.
Deployment
Select an appropriate guide from the list in the README.md. We recommend starting with the inference scheduling well-lit path if you are looking to deploy vLLM in a recommended production serving configuration.
Navigate to the desired guide directory and follow its README instructions. For example:
cd quickstarts/guides/inference-scheduling # Navigate to your desired example directory
# Follow the README.md instructions in the example directory
When you complete the deployment successfully, return here.
Validation
You should be able to list all Helm releases to view the charts installed by the guide:
helm list -n ${NAMESPACE}
You can view all resources in your namespace with:
kubectl get all -n ${NAMESPACE}
Note: This assumes no other guide deployments in your given ${NAMESPACE}.
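A quick way to confirm the deployment is healthy is to watch the pods until they all report Ready (a generic check; pod names and counts vary by guide):

```bash
kubectl get pods -n ${NAMESPACE} -w
```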
Making inference requests to your deployments
For instructions on getting started with making inference requests, see getting-started-inferencing.md.
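As a rough illustration of what such a request looks like (the gateway service name and model ID below are placeholders; follow getting-started-inferencing.md for the exact steps for your guide), you can port-forward the gateway and send an OpenAI-compatible request, which vLLM serves under /v1:

```bash
# Port-forward the gateway service locally (service name is a placeholder).
kubectl port-forward -n ${NAMESPACE} svc/<gateway-service> 8000:80 &

# Send an OpenAI-compatible completion request.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-id>", "prompt": "Hello, llm-d!", "max_tokens": 32}'
```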
Metrics collection
llm-d charts include support for metrics collection from vLLM pods. When enabled with the appropriate Helm chart values, llm-d applies PodMonitors that register Prometheus scrape targets. See MONITORING.md for details.
On Kubernetes, Prometheus and Grafana can be installed from the prometheus-community kube-prometheus-stack Helm chart. On OpenShift, the built-in user workload monitoring Prometheus stack can be used to collect metrics.
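For example, a minimal kube-prometheus-stack installation on Kubernetes (release and namespace names here are arbitrary), followed by a check that the llm-d PodMonitors exist:

```bash
# Install the community Prometheus/Grafana stack.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

# Verify the PodMonitors created by the llm-d charts.
kubectl get podmonitors -n ${NAMESPACE}
```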
Uninstall
To remove llm-d resources from the cluster, refer to the uninstallation instructions in the specific guide README that you installed.
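In general, uninstalling amounts to removing the Helm releases the guide created (release names come from helm list; this is a generic sketch, so prefer the guide's own uninstall steps):

```bash
# List the releases installed by the guide, then remove each one.
helm list -n ${NAMESPACE}
helm uninstall <release-name> -n ${NAMESPACE}
```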