Version: v0.8 (latest)

API Reference

Core Kubernetes APIs

The following Kubernetes APIs are defined in the inference.networking.k8s.io (v1) and llm-d.ai (v1alpha2) groups.

Resource	API Group	Version	Description
InferencePool	`inference.networking.k8s.io`	`v1`	Defines a pool of inference endpoints (model servers) to configure the Endpoint Picker (EPP) and Gateways for inference-optimized routing.
InferenceObjective	`llm-d.ai`	`v1alpha2`	Defines performance goals (priority, latency) for specific model workloads within a pool.
InferenceModelRewrite	`llm-d.ai`	`v1alpha2`	Specifies rules for rewriting model names in request bodies, enabling traffic splitting and canary rollouts.

Component Configuration

These schemas define the internal configuration for project components and are typically provided via ConfigMaps or local files, rather than as standalone Kubernetes objects.

Schema	API Group	Version	Description
EndpointPickerConfig	`llm-d.ai`	`v1alpha1`	Defines the internal configuration for the Endpoint Picker (EPP), including plugins and request scheduling profiles.

Recognized HTTP Headers

EPP HTTP Headers Reference: The EPP inspects specific HTTP headers to manage flow control and observability for inference requests.

Supported Request APIs

EPP HTTP APIs Reference: HTTP APIs such as OpenAI's Chat, Anthropic's Message and vLLM's Generate APIs.
EPP gRPC APIs Reference: gRPC APIs such as vLLM's gRPC Generate API.

Core Kubernetes APIs​

Component Configuration​

Recognized HTTP Headers​

Supported Request APIs​

See Also​

Core Kubernetes APIs

Component Configuration

Recognized HTTP Headers

Supported Request APIs

See Also