EndpointPickerConfig (EPP Configuration)
EndpointPickerConfig defines the internal configuration for the Endpoint Picker (EPP). Unlike Kubernetes resources such as InferencePool, it is a configuration schema used to initialize the EPP binary, and it is typically provided via a ConfigMap or a local file.
Version: config.apix.gateway-api-inference-extension.sigs.k8s.io/v1alpha1
EndpointPickerConfig
| Field | Description |
|---|---|
| `featureGates` | `[]string`. A set of flags that enable experimental features (e.g., `flowControl`). |
| `plugins` | `[]PluginSpec`. Required. List of plugins to instantiate (e.g., scorers, adapters, reporters). |
| `schedulingProfiles` | `[]SchedulingProfile`. Required. Named profiles that group plugins into routing slots. |
| `saturationDetector` | `SaturationDetectorConfig`. Configuration for the saturation detector plugin. Defaults to `utilization-detector`. |
| `dataLayer` | `DataLayerConfig`. Configures the DataLayer for metadata extraction and processing. |
| `flowControl` | `FlowControlConfig`. Configures global and per-priority admission control. Only respected if the `flowControl` feature gate is enabled. |
| `parser` | `ParserConfig`. Specifies the parsing logic for protocol messages (e.g., `openai-parser`). |
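
As a rough illustration of how the top-level fields fit together, the following is a minimal sketch of a config file that sets only the two required fields. It assumes the file carries a Kubernetes-style `apiVersion`/`kind` header built from the version string above; the instance name `my-scorer` and the profile name `default` are hypothetical.

```yaml
# Minimal sketch, not a verified production config.
# Assumption: the file uses an apiVersion/kind header with the version listed above.
apiVersion: config.apix.gateway-api-inference-extension.sigs.k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:                      # required
  - name: my-scorer           # hypothetical instance name
    type: least-request       # plugin type named as an example in PluginSpec
schedulingProfiles:           # required
  - name: default             # hypothetical profile name
    plugins:
      - pluginRef: my-scorer  # must match a name in the top-level plugins list
```

All other top-level fields are left out here, so whatever defaults the EPP applies (for example `utilization-detector` for the saturation detector) would be used.
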
PluginSpec
Defines a plugin instance and its parameters.
| Field | Description |
|---|---|
| `name` | `string`. Unique name for this plugin instance. If omitted, the value of `type` is used as the name. |
| `type` | `string`. Required. The plugin type to instantiate (e.g., `least-request`, `openai-parser`). |
| `parameters` | `json.RawMessage`. Arbitrary parameters passed to the plugin's factory function. |
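
The naming rule and the free-form `parameters` field may be easier to see in a fragment. This sketch shows one instance that relies on the name defaulting to its type and one with an explicit name and parameters. The parameter key `exampleThreshold` is hypothetical; the keys a real plugin accepts are defined by that plugin's factory function.

```yaml
# Fragment of the top-level config (plugins list only).
plugins:
  # No name given, so this instance is referenced by its type: "openai-parser".
  - type: openai-parser
  # Explicit name plus plugin-specific parameters.
  - name: tuned-least-request   # hypothetical instance name
    type: least-request
    parameters:
      exampleThreshold: 5       # hypothetical key, shown only to illustrate the shape
```

Giving explicit names becomes necessary when the same plugin type is instantiated more than once, since each `name` must be unique.
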
SchedulingProfile
Groups plugins to define specific routing behavior.
| Field | Description |
|---|---|
| `name` | `string`. Required. Name of the profile. |
| `plugins` | `[]SchedulingPlugin`. Required. List of plugins associated with this profile. |
SchedulingPlugin
| Field | Description |
|---|---|
| `pluginRef` | `string`. Required. Reference to a named plugin in the top-level `plugins` list. |
| `weight` | `float64`. Weight applied to the plugin if it is a Scorer. |
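
To show how a profile refers back to the top-level `plugins` list, here is a sketch with two instances referenced from one profile. The type `example-scorer` and all instance names are hypothetical, and the fragment assumes both plugins act as Scorers so that `weight` has an effect.

```yaml
# Fragment of the top-level config.
plugins:
  - name: scorer-a
    type: least-request      # assumed here to behave as a Scorer
  - name: scorer-b
    type: example-scorer     # hypothetical plugin type
schedulingProfiles:
  - name: default            # hypothetical profile name
    plugins:
      - pluginRef: scorer-a  # resolves to plugins[0].name
        weight: 2.0
      - pluginRef: scorer-b  # resolves to plugins[1].name
        weight: 1.0
```

In this sketch `scorer-a` is given twice the weight of `scorer-b`. How weights are combined into a final score is not specified in this reference.
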
FlowControlConfig
Configures admission control and queuing.
| Field | Description |
|---|---|
| `maxBytes` | `resource.Quantity`. Global maximum aggregate byte size of all active requests. |
| `maxRequests` | `resource.Quantity`. Global maximum number of concurrent requests. |
| `defaultRequestTTL` | `duration`. Fallback timeout for queued requests. |
| `defaultPriorityBand` | `PriorityBandConfig`. Template for priority levels that are not explicitly configured. |
| `priorityBands` | `[]PriorityBandConfig`. Explicit policies for specific priority levels. |
| `usageLimitPolicyPluginRef` | `string`. Reference to a UsageLimitPolicy plugin for adaptive capacity management. |
PriorityBandConfig
| Field | Description |
|---|---|
| `priority` | `int`. Integer priority level. Higher values are more critical. |
| `maxBytes` | `resource.Quantity`. Maximum bytes allowed for this priority band. |
| `maxRequests` | `resource.Quantity`. Maximum concurrent requests allowed for this priority band. |
| `fairnessPolicyRef` | `string`. Policy governing flow selection (default: `global-strict-fairness-policy`). |
| `orderingPolicyRef` | `string`. Policy governing request selection within a flow (default: `fcfs-ordering-policy`). |
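
The relationship between the feature gate, the global limits, and the per-priority bands can be sketched as follows. All numeric values are illustrative, and the fragment assumes quantities and durations are written in the usual Kubernetes string forms (`1Gi`, `"1000"`, `30s`).

```yaml
# Fragment of the top-level config. Values are illustrative only.
featureGates:
  - flowControl                # without this gate the flowControl block is ignored
flowControl:
  maxBytes: 1Gi                # global cap on aggregate request bytes
  maxRequests: "1000"          # global cap on concurrent requests
  defaultRequestTTL: 30s       # assumed duration syntax
  defaultPriorityBand:         # template for priorities not listed below
    maxRequests: "100"
  priorityBands:
    - priority: 10             # higher values are more critical
      maxBytes: 512Mi
      maxRequests: "500"
      fairnessPolicyRef: global-strict-fairness-policy   # the documented default, spelled out
      orderingPolicyRef: fcfs-ordering-policy            # the documented default, spelled out
```

Requests at a priority other than 10 would fall under `defaultPriorityBand` in this sketch.
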
DataLayerConfig
| Field | Description |
|---|---|
| `sources` | `[]DataLayerSource`. Required. List of metadata sources. |
DataLayerSource
| Field | Description |
|---|---|
| `pluginRef` | `string`. Required. Reference to a plugin that provides the data source. |
| `extractors` | `[]DataLayerExtractor`. Required. Plugins that extract specific attributes from the source. |
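
A data source and its extractors might be wired up along these lines. This is a sketch under two loud assumptions: `DataLayerExtractor` is not described in this reference, so the `pluginRef` field on each extractor is assumed by analogy with `DataLayerSource`, and both plugin types shown are hypothetical.

```yaml
# Fragment of the top-level config.
plugins:
  - name: metrics-source
    type: example-metrics-source       # hypothetical plugin type
  - name: queue-depth-extractor
    type: example-extractor            # hypothetical plugin type
dataLayer:
  sources:
    - pluginRef: metrics-source        # resolves to a name in the plugins list
      extractors:
        - pluginRef: queue-depth-extractor   # assumed field; DataLayerExtractor is not documented here
```
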
SaturationDetectorConfig
| Field | Description |
|---|---|
| `pluginRef` | `string`. Reference to a plugin instance used for saturation detection. |
ParserConfig
| Field | Description |
|---|---|
| `pluginRef` | `string`. Required. Reference to a parser plugin (default: `openai-parser`). |
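
Both `saturationDetector` and `parser` point at plugin instances by name. The sketch below sets them explicitly; it assumes that `utilization-detector` and `openai-parser`, mentioned above as defaults, are also valid values for a plugin's `type`. The instance names are hypothetical.

```yaml
# Fragment of the top-level config.
plugins:
  - name: saturation
    type: utilization-detector   # assumed to be a plugin type, based on the documented default
  - name: request-parser
    type: openai-parser
saturationDetector:
  pluginRef: saturation          # resolves to plugins[0].name
parser:
  pluginRef: request-parser      # resolves to plugins[1].name
```
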