
API Reference

Core Kubernetes APIs

The following Kubernetes APIs are defined in the inference.networking.k8s.io (v1) and llm-d.ai (v1alpha2) groups.

Resource | API Group | Version | Description
--- | --- | --- | ---
InferencePool | inference.networking.k8s.io | v1 | Defines a pool of inference endpoints (model servers) used to configure the Endpoint Picker (EPP) and Gateways for inference-optimized routing.
InferenceObjective | llm-d.ai | v1alpha2 | Defines performance goals (priority, latency) for specific model workloads within a pool.
InferenceModelRewrite | llm-d.ai | v1alpha2 | Specifies rules for rewriting model names in request bodies, enabling traffic splitting and canary rollouts.
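InferenceModelRewrite is described as rewriting the model name in the request body so that traffic can be split, for example during a canary rollout. The Python sketch below illustrates that idea under stated assumptions: requests carry an OpenAI-style JSON body with a top-level `model` field, and a rule maps one requested model name to weighted target names. `RewriteTarget`, `rewrite_model`, and the model names are hypothetical and do not reflect the InferenceModelRewrite schema or how the EPP actually applies these rules.

```python
import json
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RewriteTarget:
    """One weighted destination model (hypothetical shape, not the CRD schema)."""
    model: str
    weight: int


def rewrite_model(
    body: bytes,
    rules: dict[str, list[RewriteTarget]],
    rng: random.Random,
) -> bytes:
    """Rewrite the "model" field of a JSON request body using weighted targets.

    A request whose model has no matching rule is returned unchanged.
    Each target is chosen with probability proportional to its weight.
    """
    payload = json.loads(body)
    targets = rules.get(payload.get("model"))
    if not targets:
        return body
    names = [t.model for t in targets]
    weights = [t.weight for t in targets]
    # random.Random.choices draws proportionally to the supplied weights.
    payload["model"] = rng.choices(names, weights=weights, k=1)[0]
    return json.dumps(payload).encode()


if __name__ == "__main__":
    # Hypothetical canary: roughly 90% stable, 10% canary.
    rules = {
        "llama-3": [
            RewriteTarget("llama-3-v1", 90),
            RewriteTarget("llama-3-v2-canary", 10),
        ],
    }
    rng = random.Random(0)
    counts: dict[str, int] = {}
    for _ in range(1000):
        out = json.loads(rewrite_model(b'{"model": "llama-3", "prompt": "hi"}', rules, rng))
        counts[out["model"]] = counts.get(out["model"], 0) + 1
    print(counts)
```

Shifting traffic to the canary then comes down to changing the weights; the rest of the request body passes through untouched.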

Component Configuration

These schemas define the internal configuration for project components and are typically provided via ConfigMaps or local files, rather than as standalone Kubernetes objects.

Schema | API Group | Version | Description
--- | --- | --- | ---
EndpointPickerConfig | llm-d.ai | v1alpha1 | Defines the internal configuration for the Endpoint Picker (EPP), including plugins and request scheduling profiles.
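EndpointPickerConfig ties plugins together into request scheduling profiles. To show how such a profile might combine plugins, here is a conceptual Python sketch. It assumes a filter, then weighted-score, then max-score-pick pipeline. The `Endpoint` metrics, the plugin functions, the 0.95 threshold, and the weights are all hypothetical. They are not the EndpointPickerConfig schema or the EPP's built-in plugins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Endpoint:
    """A model-server endpoint with illustrative load metrics (hypothetical fields)."""
    name: str
    queue_depth: int
    kv_cache_util: float  # fraction of KV cache in use, in [0, 1]


Filter = Callable[[list[Endpoint]], list[Endpoint]]
Scorer = Callable[[Endpoint], float]  # higher is better


@dataclass
class SchedulingProfile:
    """Hypothetical profile: run filters, sum weighted scores, pick the max."""
    filters: list[Filter]
    scorers: list[tuple[Scorer, float]]  # (scorer, weight) pairs

    def pick(self, endpoints: list[Endpoint]) -> Endpoint:
        candidates = endpoints
        for keep in self.filters:
            candidates = keep(candidates)
        if not candidates:
            raise RuntimeError("no endpoint survived filtering")

        def total(ep: Endpoint) -> float:
            return sum(weight * scorer(ep) for scorer, weight in self.scorers)

        return max(candidates, key=total)


def drop_saturated(eps: list[Endpoint]) -> list[Endpoint]:
    """Filter out endpoints whose KV cache is nearly full (arbitrary threshold)."""
    return [e for e in eps if e.kv_cache_util < 0.95]


def queue_scorer(ep: Endpoint) -> float:
    """Shorter queues score higher."""
    return 1.0 / (1 + ep.queue_depth)


def kv_cache_scorer(ep: Endpoint) -> float:
    """More free KV cache scores higher."""
    return 1.0 - ep.kv_cache_util


if __name__ == "__main__":
    profile = SchedulingProfile(
        filters=[drop_saturated],
        scorers=[(queue_scorer, 1.0), (kv_cache_scorer, 2.0)],
    )
    pool = [
        Endpoint("pod-a", queue_depth=4, kv_cache_util=0.30),
        Endpoint("pod-b", queue_depth=0, kv_cache_util=0.97),
        Endpoint("pod-c", queue_depth=1, kv_cache_util=0.60),
    ]
    print(profile.pick(pool).name)  # prints "pod-a"
```

The filter drops pod-b. The heavier KV-cache weight then favors pod-a (score 1.6) over pod-c (1.3). With only `queue_scorer` in the profile, pod-c would win instead. So changing a profile's plugin weights changes routing without changing any plugin.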

Recognized HTTP Headers

  • EPP HTTP Headers Reference: The EPP inspects specific HTTP headers to manage flow control and observability for inference requests.

Supported Request APIs

See Also

  • Glossary: Definitions of key terms and concepts used across this project.