InferencePool

InferencePool is the Schema for the InferencePools API. It defines a pool of inference endpoints that can be used by a Gateway.

Group: inference.networking.k8s.io
Version: v1


InferencePool

Field (type): description
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata (metav1.ObjectMeta)
spec (InferencePoolSpec): Required. Spec defines the desired state of the InferencePool.
status (InferencePoolStatus): Status defines the observed state of the InferencePool.

InferencePoolSpec

InferencePoolSpec defines the desired state of the InferencePool.

Field (type): description
selector (LabelSelector): Required. Selector determines which Pods are members of this inference pool. It matches Pods by their labels, and only within the InferencePool's own namespace; cross-namespace selection is not supported. The structure is intentionally simple so that it stays compatible with Kubernetes Service selectors.
targetPorts ([]Port): Required. TargetPorts defines the list of ports exposed by this InferencePool. The Endpoint Picker (EPP) treats every port as a distinct endpoint, addressable as a podIP:portNumber combination. Max items: 8. Port numbers must be unique.
appProtocol (AppProtocol): AppProtocol describes the application protocol for all the target ports. If unspecified, the protocol defaults to http (HTTP/1.1).
endpointPickerRef (EndpointPickerRef): Required. EndpointPickerRef is a reference to the Endpoint Picker extension and its associated configuration.
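
The targetPorts rules above can be sketched in Go. This is a minimal illustration, not the controller's actual validation or the Endpoint Picker's implementation: the function names, Pod IPs, and port values are hypothetical, and the non-empty check is an assumption drawn from the field being required.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// validateTargetPorts checks the documented constraints: at most 8 ports,
// each in the range 1 to 65535, with no duplicates. Rejecting an empty list
// is an assumption of this sketch (the field is marked Required).
func validateTargetPorts(ports []int32) error {
	if len(ports) == 0 || len(ports) > 8 {
		return fmt.Errorf("targetPorts must have 1 to 8 items, got %d", len(ports))
	}
	seen := make(map[int32]bool, len(ports))
	for _, p := range ports {
		if p < 1 || p > 65535 {
			return fmt.Errorf("port %d is outside 1-65535", p)
		}
		if seen[p] {
			return fmt.Errorf("port %d is listed more than once", p)
		}
		seen[p] = true
	}
	return nil
}

// expandEndpoints pairs every Pod IP with every target port, producing one
// podIP:portNumber address per combination.
func expandEndpoints(podIPs []string, ports []int32) []string {
	endpoints := make([]string, 0, len(podIPs)*len(ports))
	for _, ip := range podIPs {
		for _, p := range ports {
			endpoints = append(endpoints, net.JoinHostPort(ip, strconv.Itoa(int(p))))
		}
	}
	return endpoints
}

func main() {
	ports := []int32{8000, 8001} // hypothetical model server ports
	if err := validateTargetPorts(ports); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	for _, ep := range expandEndpoints([]string{"10.0.0.5", "10.0.0.6"}, ports) {
		fmt.Println(ep)
	}
}
```

With two Pods and two target ports, the sketch yields four candidate endpoints, which is why the item limit and the uniqueness rule matter for how many endpoints a pool can fan out to.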

InferencePoolStatus

InferencePoolStatus defines the observed state of the InferencePool.

Field (type): description
parents ([]ParentStatus): Parents is a list of parent resources, typically Gateways, that are associated with the InferencePool, and the status of the InferencePool with respect to each parent. Max items: 32.

Port

Port defines the network port that will be exposed by this InferencePool.

Field (type): description
number (int32): Required. Number defines the port number to access the selected model server Pods. Must be in range 1 to 65535.

AppProtocol

AppProtocol describes the application protocol for a port.

Supported values:

  • http: HTTP/1.1. This is the default.
  • kubernetes.io/h2c: HTTP/2 over cleartext. Typically used for gRPC workloads where TLS is terminated at the Gateway.
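
The defaulting rule for appProtocol can be sketched as follows. The type, constant, and function names here are hypothetical stand-ins chosen for illustration; only the two string values and the default come from the list above.

```go
package main

import "fmt"

// AppProtocol is a local stand-in for the API type, used only in this sketch.
type AppProtocol string

const (
	AppProtocolHTTP AppProtocol = "http"              // HTTP/1.1, the default
	AppProtocolH2C  AppProtocol = "kubernetes.io/h2c" // HTTP/2 over cleartext
)

// resolveAppProtocol returns the effective protocol: an empty value falls
// back to http, and any value outside the supported set is rejected.
func resolveAppProtocol(p AppProtocol) (AppProtocol, error) {
	switch p {
	case "":
		return AppProtocolHTTP, nil
	case AppProtocolHTTP, AppProtocolH2C:
		return p, nil
	default:
		return "", fmt.Errorf("unsupported appProtocol %q", p)
	}
}

func main() {
	for _, in := range []AppProtocol{"", "kubernetes.io/h2c", "tcp"} {
		got, err := resolveAppProtocol(in)
		fmt.Printf("input=%q effective=%q err=%v\n", in, got, err)
	}
}
```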

EndpointPickerRef

EndpointPickerRef specifies a reference to an Endpoint Picker extension and its associated configuration.

Field (type): description
group (string): Group of the referent API object. Defaults to "" (Core API group).
kind (string): Kind of the referent. Defaults to Service. Implementations MUST NOT support ExternalName Services.
name (string): Required. Name of the referent API object.
port (Port): Port of the Endpoint Picker extension service. Required when kind is Service.
failureMode (string): Configures how the parent handles cases when the Endpoint Picker extension is non-responsive. Defaults to FailClose. Supported values: FailOpen, FailClose.
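
The defaults and the conditional port requirement described above can be sketched like this. The struct is a simplified local stand-in (the port is flattened to a *int32 rather than a Port object), and the function names, the referent name example-epp, and the port 9002 are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// EndpointPickerRef mirrors the documented fields for illustration only;
// it is not the generated API type.
type EndpointPickerRef struct {
	Group       string
	Kind        string
	Name        string
	Port        *int32 // nil means the port was not set
	FailureMode string
}

// applyDefaults fills in the documented defaults. Group defaults to ""
// (the core API group), which is already Go's zero value for a string.
func applyDefaults(ref EndpointPickerRef) EndpointPickerRef {
	if ref.Kind == "" {
		ref.Kind = "Service"
	}
	if ref.FailureMode == "" {
		ref.FailureMode = "FailClose"
	}
	return ref
}

// validateRef checks the rules stated in the field descriptions.
func validateRef(ref EndpointPickerRef) error {
	if ref.Name == "" {
		return errors.New("name is required")
	}
	if ref.Kind == "Service" && ref.Port == nil {
		return errors.New("port is required when kind is Service")
	}
	if ref.FailureMode != "FailOpen" && ref.FailureMode != "FailClose" {
		return fmt.Errorf("unsupported failureMode %q", ref.FailureMode)
	}
	return nil
}

func main() {
	port := int32(9002) // hypothetical extension port
	ref := applyDefaults(EndpointPickerRef{Name: "example-epp", Port: &port})
	fmt.Printf("kind=%s failureMode=%s err=%v\n", ref.Kind, ref.FailureMode, validateRef(ref))
}
```

Defaults are applied before validation in this sketch so that a reference carrying only a name and a port passes as a Service with FailClose behavior.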

ParentStatus

ParentStatus defines the observed state of the InferencePool from the perspective of a parent, such as a Gateway.

Field (type): description
conditions ([]metav1.Condition): Conditions provide information about the observed state. Supported types: Accepted, ResolvedRefs.
parentRef (ParentReference): Required. Identifies the parent resource this status is associated with.
controllerName (string): Name of the controller that wrote this status (e.g., example.net/gateway-controller).

ParentReference

ParentReference identifies an API object, such as a Gateway.

Field (type): description
group (string): Group of the referent. Defaults to gateway.networking.k8s.io.
kind (string): Kind of the referent. Defaults to Gateway.
name (string): Required. Name of the referent.
namespace (string): Namespace of the referenced object. Defaults to the local namespace.
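
Because three of the four fields have defaults, two references that look different can identify the same parent. The sketch below resolves the defaults before comparing. The struct is a local stand-in, the function names are hypothetical, and taking the local namespace to be the InferencePool's own namespace is an assumption of this example.

```go
package main

import "fmt"

// ParentReference mirrors the documented fields for illustration only.
type ParentReference struct {
	Group     string
	Kind      string
	Name      string
	Namespace string
}

// resolveParent fills in the documented defaults. localNamespace is assumed
// to be the namespace of the InferencePool that carries the reference.
func resolveParent(ref ParentReference, localNamespace string) ParentReference {
	if ref.Group == "" {
		ref.Group = "gateway.networking.k8s.io"
	}
	if ref.Kind == "" {
		ref.Kind = "Gateway"
	}
	if ref.Namespace == "" {
		ref.Namespace = localNamespace
	}
	return ref
}

// sameParent reports whether two references identify the same object once
// defaults have been applied to both.
func sameParent(a, b ParentReference, localNamespace string) bool {
	return resolveParent(a, localNamespace) == resolveParent(b, localNamespace)
}

func main() {
	short := ParentReference{Name: "inference-gateway"} // hypothetical Gateway name
	full := ParentReference{
		Group:     "gateway.networking.k8s.io",
		Kind:      "Gateway",
		Name:      "inference-gateway",
		Namespace: "default",
	}
	fmt.Println(sameParent(short, full, "default"))
}
```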

LabelSelector

LabelSelector defines a query for resources based on their labels.

Field (type): description
matchLabels (map[string]string): Required. A set of (key, value) pairs. An object must match every label in this map (AND operation). Max properties: 64.
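
The AND semantics of matchLabels can be sketched in a few lines. The function name and the label keys and values are hypothetical, and the sketch does not enforce the 64-property limit.

```go
package main

import "fmt"

// matchesSelector reports whether podLabels contains every key in
// matchLabels with exactly the same value. Extra labels on the Pod are
// ignored, and an empty matchLabels map matches every Pod in this sketch.
func matchesSelector(matchLabels, podLabels map[string]string) bool {
	for key, want := range matchLabels {
		if got, ok := podLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "model-server", "tier": "inference"}

	// Carries both required labels plus an unrelated one.
	podA := map[string]string{"app": "model-server", "tier": "inference", "zone": "a"}
	// Missing the "tier" label, so the AND condition fails.
	podB := map[string]string{"app": "model-server"}

	fmt.Println(matchesSelector(selector, podA))
	fmt.Println(matchesSelector(selector, podB))
}
```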

Condition Types and Reasons

Accepted

Indicates whether the InferencePool has been accepted or rejected by a Parent.

  • True Reasons:
    • Accepted: Supported by parent.
  • False Reasons:
    • NotSupportedByParent: Parent does not support InferencePool as a backend.
    • HTTPRouteNotAccepted: Referenced by an HTTPRoute that has been rejected.
  • Unknown Reasons:
    • Pending

ResolvedRefs

Indicates whether the controller was able to resolve all object references.

  • True Reasons:
    • ResolvedRefs
  • False Reasons:
    • InvalidExtensionRef: Extension is invalid (unsupported kind/group or not found).

Exported

Indicates whether the controller was able to export the InferencePool to specified clusters.

  • True Reasons:
    • Exported
  • False Reasons:
    • NotRequested: No export was requested.
    • NotSupported: Export requested but not supported by implementation.
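
A client reading a parent's status might look up these conditions by type, as in the sketch below. The Condition struct is a reduced stand-in for metav1.Condition, the function names are hypothetical, and treating a pool as usable only when both Accepted and ResolvedRefs are True is an assumption of this example rather than a rule stated by the API.

```go
package main

import "fmt"

// Condition is a reduced stand-in for metav1.Condition, keeping only the
// fields this sketch needs.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
	Reason string
}

// statusOf returns the status of the named condition type, or "Unknown"
// when the parent has not reported that condition.
func statusOf(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return "Unknown"
}

// usableByParent is this sketch's own readiness rule: both Accepted and
// ResolvedRefs must be True.
func usableByParent(conds []Condition) bool {
	return statusOf(conds, "Accepted") == "True" &&
		statusOf(conds, "ResolvedRefs") == "True"
}

func main() {
	conds := []Condition{
		{Type: "Accepted", Status: "True", Reason: "Accepted"},
		{Type: "ResolvedRefs", Status: "False", Reason: "InvalidExtensionRef"},
	}
	fmt.Println(statusOf(conds, "Exported")) // not reported, so "Unknown"
	fmt.Println(usableByParent(conds))       // false: ResolvedRefs is False
}
```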