EndpointPickerConfig (EPP Configuration)

EndpointPickerConfig defines the internal configuration for the Endpoint Picker (EPP). Unlike Kubernetes resources (like InferencePool), this is a configuration schema used to initialize the EPP binary, typically provided via a ConfigMap or a local file.

Version: config.apix.gateway-api-inference-extension.sigs.k8s.io/v1alpha1


EndpointPickerConfig

| Field | Type | Required | Description |
|---|---|---|---|
| featureGates | []string | No | A set of flags to enable experimental features (e.g., flowControl). |
| plugins | []PluginSpec | Yes | List of plugins to be instantiated (e.g., scorers, adapters, reporters). |
| schedulingProfiles | []SchedulingProfile | Yes | Named profiles that group plugins into routing slots. |
| saturationDetector | SaturationDetectorConfig | No | Configuration for the saturation detector plugin. Defaults to utilization-detector. |
| dataLayer | DataLayerConfig | No | Configures the DataLayer for metadata extraction and processing. |
| flowControl | FlowControlConfig | No | Configures global and per-priority admission control. Only respected if the flowControl feature gate is enabled. |
| parser | ParserConfig | No | Specifies the parsing logic for protocol messages (e.g., openai-parser). |
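
As a rough illustration of how the two required top-level fields fit together, a minimal configuration might look like the sketch below. It assumes the file is written as YAML with a Kubernetes-style apiVersion/kind header built from the version string above. The instance name and profile name are made up, and least-request is used as the plugin type only because this page cites it as an example.

```yaml
# Minimal sketch, not a verified configuration.
# Assumption: the header follows the usual Kubernetes apiVersion/kind convention.
apiVersion: config.apix.gateway-api-inference-extension.sigs.k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:                      # required
- name: my-scorer             # hypothetical instance name
  type: least-request         # example type named on this page
schedulingProfiles:           # required
- name: default               # hypothetical profile name
  plugins:
  - pluginRef: my-scorer      # must match a name in the top-level plugins list
```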

PluginSpec

Defines a plugin instance and its parameters.

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | No | Unique name for this plugin instance. If omitted, type is used. |
| type | string | Yes | The plugin type to instantiate (e.g., least-request, openai-parser). |
| parameters | json.RawMessage | No | Arbitrary parameters passed to the plugin's factory function. |
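
The naming rule can be sketched as follows: an entry without a name is addressed by its type, while two instances of the same type need distinct names. The instance names and the key under parameters are hypothetical; the accepted parameters depend entirely on the plugin's factory function.

```yaml
plugins:
# name omitted, so this instance is addressed as "openai-parser"
- type: openai-parser
# two instances of one type, told apart by explicit names
- name: scorer-a              # hypothetical instance name
  type: least-request
  parameters:                 # hypothetical key; real keys are plugin-specific
    exampleThreshold: 5
- name: scorer-b              # hypothetical instance name
  type: least-request
```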

SchedulingProfile

Groups plugins to define specific routing behavior.

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name of the profile. |
| plugins | []SchedulingPlugin | Yes | List of plugins associated with this profile. |

SchedulingPlugin

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Reference to a named plugin in the top-level plugins list. |
| weight | float64 | No | Weight used if the plugin is a Scorer. |
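
A profile that references two plugin instances with different weights might be written as in this sketch. It assumes both referenced instances are Scorers and are declared under the top-level plugins list; the profile and instance names are hypothetical, and how weights are combined is not specified in this section.

```yaml
schedulingProfiles:
- name: default               # hypothetical profile name
  plugins:
  - pluginRef: scorer-a       # hypothetical instance from the top-level plugins list
    weight: 2.0               # only meaningful if the referenced plugin is a Scorer
  - pluginRef: scorer-b       # hypothetical instance
    weight: 1.0
```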

FlowControlConfig

Configures admission control and queuing.

| Field | Type | Required | Description |
|---|---|---|---|
| maxBytes | resource.Quantity | No | Global maximum aggregate byte size of all active requests. |
| maxRequests | resource.Quantity | No | Global maximum number of concurrent requests. |
| defaultRequestTTL | duration | No | Fallback timeout for queued requests. |
| defaultPriorityBand | PriorityBandConfig | No | Template for priority levels not explicitly configured. |
| priorityBands | []PriorityBandConfig | No | Explicit policies for specific priority levels. |
| usageLimitPolicyPluginRef | string | No | Reference to a UsageLimitPolicy plugin for adaptive capacity management. |
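
Because flowControl is only respected when its feature gate is enabled, the two settings usually appear together. The sketch below shows global limits only. The values are arbitrary, and the duration is assumed to be written as a Go-style duration string.

```yaml
featureGates:
- flowControl                 # without this gate the flowControl block is not respected
flowControl:
  maxBytes: 4Gi               # resource.Quantity with a binary suffix (illustrative value)
  maxRequests: 1000           # resource.Quantity given as a plain number (illustrative value)
  defaultRequestTTL: 30s      # assumption: Go-style duration string
```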

PriorityBandConfig

| Field | Type | Required | Description |
|---|---|---|---|
| priority | int | No | Integer priority level. Higher is more critical. |
| maxBytes | resource.Quantity | No | Max bytes allowed for this priority band. |
| maxRequests | resource.Quantity | No | Max concurrent requests allowed for this band. |
| fairnessPolicyRef | string | No | Policy governing flow selection (default: global-strict-fairness-policy). |
| orderingPolicyRef | string | No | Policy governing request selection within a flow (default: fcfs-ordering-policy). |
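
Per-priority limits could be expressed as in the sketch below, with one explicit band for critical traffic, one for best-effort traffic, and a template for every other priority. The priority numbers and limits are arbitrary; the two policy references repeat the documented defaults purely to show where they go.

```yaml
flowControl:
  defaultPriorityBand:        # template for priorities not listed under priorityBands
    maxRequests: 100          # illustrative value
  priorityBands:
  - priority: 10              # higher value means more critical
    maxBytes: 2Gi             # illustrative value
    maxRequests: 500          # illustrative value
    fairnessPolicyRef: global-strict-fairness-policy   # documented default, shown explicitly
    orderingPolicyRef: fcfs-ordering-policy            # documented default, shown explicitly
  - priority: 0
    maxRequests: 200          # illustrative value
```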

DataLayerConfig

| Field | Type | Required | Description |
|---|---|---|---|
| sources | []DataLayerSource | Yes | List of metadata sources. |

DataLayerSource

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Reference to a plugin providing the data source. |
| extractors | []DataLayerExtractor | Yes | Plugins that extract specific attributes from the source. |
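
The nesting of sources and extractors might look like the sketch below. DataLayerExtractor is not documented in this section, so the sketch assumes it carries a pluginRef like the other reference types. Both instance names are hypothetical and would need matching entries in the top-level plugins list.

```yaml
dataLayer:
  sources:                            # required
  - pluginRef: my-metrics-source      # hypothetical data source plugin instance
    extractors:                       # required
    - pluginRef: my-metrics-extractor # assumption: extractors are referenced via pluginRef
```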

SaturationDetectorConfig

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | No | Reference to a plugin instance for saturation detection. |
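
Wiring a saturation detector explicitly could be sketched as follows. The instance name is hypothetical, and utilization-detector is assumed to be usable as a plugin type because this page names it as the default.

```yaml
plugins:
- name: my-detector           # hypothetical instance name
  type: utilization-detector  # assumption: the default named on this page is also a plugin type
saturationDetector:
  pluginRef: my-detector      # points at the instance declared above
```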

ParserConfig

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Reference to a parser plugin (default: openai-parser). |
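
A parser reference could be set explicitly as in this sketch. It relies on the PluginSpec rule that an instance declared without a name is addressed by its type.

```yaml
plugins:
- type: openai-parser         # name omitted, so the instance is referenced by its type
parser:
  pluginRef: openai-parser    # same value as the documented default, shown explicitly
```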