InferenceObjective

InferenceObjective represents the desired state of a specific model use case. It allows "Inference Workload Owners" to define performance and latency goals for a model within an InferencePool.

Group: inference.networking.x-k8s.io
Version: v1alpha2


InferenceObjective

  • apiVersion: inference.networking.x-k8s.io/v1alpha2
  • kind: InferenceObjective
  • metadata (metav1.ObjectMeta)
  • spec (InferenceObjectiveSpec): Spec represents the desired state of the model use case.
  • status (InferenceObjectiveStatus): Status defines the observed state of the InferenceObjective.

InferenceObjectiveSpec

InferenceObjectiveSpec defines the priority and the pool reference for the model workload.

  • priority (int, optional): Defines how important it is to serve the request compared to others in the same pool. Higher values have higher priority; an unset value is treated as 0. When resources are scarce, requests of higher priority are served first.
  • poolRef (PoolObjectReference, required): Reference to the inference pool. The pool must exist in the same namespace.
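A minimal sketch of a manifest using these fields; the names sample-objective and sample-pool are hypothetical placeholders, not values from this reference:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: sample-objective   # hypothetical name
  namespace: default
spec:
  priority: 10             # optional; unset is treated as 0
  poolRef:
    kind: InferencePool
    name: sample-pool      # must exist in the same namespace
```

Because poolRef cannot cross namespaces, the referenced InferencePool must be created in the same namespace as the InferenceObjective.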

InferenceObjectiveStatus

InferenceObjectiveStatus defines the observed state of InferenceObjective.

  • conditions ([]metav1.Condition): Conditions track the state of the InferenceObjective. Known condition type: Accepted.

PoolObjectReference

PoolObjectReference identifies an API object within the same namespace.

  • group (string): Group of the referent. Defaults to inference.networking.k8s.io.
  • kind (string): Kind of the referent. Defaults to InferencePool.
  • name (string, required): Name of the referent.

Condition Types and Reasons

Accepted

Indicates whether the objective configuration has been accepted.

  • True Reasons:
    • Accepted: Model conforms to the state of the pool.
  • Unknown Reasons:
    • Pending: Initial state, controller has not yet reconciled the resource.
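The two states above can be illustrated with status fragments; this is a hedged sketch of the shape a controller would write, following the standard metav1.Condition fields, not output captured from a real cluster:

```yaml
# After successful reconciliation: the objective conforms to the pool.
status:
  conditions:
  - type: Accepted
    status: "True"
    reason: Accepted

# Before the controller has reconciled the resource.
status:
  conditions:
  - type: Accepted
    status: "Unknown"
    reason: Pending
```

Consumers should key off type and status rather than reason, since reasons are informational and may be extended over time.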