InferenceModelRewrite

InferenceModelRewrite defines rules for rewriting inference requests, such as traffic splitting, A/B tests, or canary rollouts across weighted model targets.

Group: inference.networking.x-k8s.io
Version: v1alpha2


InferenceModelRewrite

| Field | Type | Description |
| --- | --- | --- |
| apiVersion | | inference.networking.x-k8s.io/v1alpha2 |
| kind | | InferenceModelRewrite |
| metadata | metav1.ObjectMeta | |
| spec | InferenceModelRewriteSpec | Spec defines the desired state of the rewrite rules. |
| status | InferenceModelRewriteStatus | Status defines the observed state of the resource. |

InferenceModelRewriteSpec

| Field | Type | Description |
| --- | --- | --- |
| poolRef | PoolObjectReference | Required. Reference to the target InferencePool. |
| rules | []InferenceModelRewriteRule | Required. Ordered set of rules. The first rule to match a request is used. |

InferenceModelRewriteRule

InferenceModelRewriteRule defines the match criteria and corresponding actions (targets).

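| Field | Type | Description |
| --- | --- | --- |
| matches | []Match | Optional. Criteria for matching a request. When multiple criteria are specified, a request matches if any one of them is satisfied (logical OR). If empty, the rule matches all requests. |
| targets | []TargetModel | Optional. How to distribute traffic across weighted model targets. Min items: 1 (if the field is set, it must contain at least one target). |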
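
The matching semantics described above can be sketched in a few lines of Python. This is an illustrative model only, not the controller's implementation: rules are plain dictionaries shaped like the API fields, and the function names `rule_matches` and `first_matching_rule` as well as all model names are hypothetical.

```python
from typing import Optional


def rule_matches(rule: dict, model: str) -> bool:
    """Return True if the rule applies to a request whose body names `model`."""
    matches = rule.get("matches") or []
    if not matches:
        # An empty (or absent) matches list matches every request.
        return True
    # Logical OR: a single satisfied criterion is enough.
    # "Exact" is the only supported match type and also the default.
    return any(
        m["model"].get("type", "Exact") == "Exact" and m["model"]["value"] == model
        for m in matches
    )


def first_matching_rule(rules: list, model: str) -> Optional[dict]:
    """Return the first rule, in list order, that matches, or None."""
    for rule in rules:
        if rule_matches(rule, model):
            return rule
    return None


# Hypothetical rules: one with two OR-ed criteria, then a catch-all.
rules = [
    {
        "matches": [
            {"model": {"value": "chat-small"}},
            {"model": {"type": "Exact", "value": "chat-mini"}},
        ],
        "targets": [{"modelRewrite": "chat-small-v2"}],
    },
    {"targets": [{"modelRewrite": "fallback-model"}]},  # no matches: catch-all
]

chosen = first_matching_rule(rules, "chat-mini")
```

Here a request for `chat-mini` satisfies the second criterion of the first rule, while a request for any unlisted model falls through to the catch-all rule.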

Match

Match defines the criteria for matching LLM requests.

| Field | Type | Description |
| --- | --- | --- |
| model | ModelMatch | Required. Criteria for matching the model field in the JSON request body. |

ModelMatch

| Field | Type | Description |
| --- | --- | --- |
| type | string | Kind of string matching to use. Supported value: Exact. Defaults to Exact. |
| value | string | Required. The model name string to match against. |

TargetModel

TargetModel defines a weighted model destination.

| Field | Type | Description |
| --- | --- | --- |
| weight | int32 | Optional. Proportion of requests forwarded to the model, computed as weight / (sum of all weights). Min: 1, Max: 1000000. If weight is set for one target, it must be set for all targets. |
| modelRewrite | string | Required. The static model name to rewrite the request to. |
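
As a worked example of the weight formula, the sketch below computes each target's share as weight / (sum of all weights) and draws a target at random in that proportion. It assumes every target sets `weight`; the helper names `traffic_shares` and `pick_target` and the model names are hypothetical, and the real endpoint picker may select targets differently.

```python
import random
from fractions import Fraction


def traffic_shares(targets: list) -> dict:
    """Map each modelRewrite to weight / (sum of all weights)."""
    total = sum(t["weight"] for t in targets)
    return {t["modelRewrite"]: Fraction(t["weight"], total) for t in targets}


def pick_target(targets: list, rng: random.Random) -> str:
    """Draw one modelRewrite at random, in proportion to the weights."""
    weights = [t["weight"] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]["modelRewrite"]


# Hypothetical canary rollout: weights 90 and 10 sum to 100.
targets = [
    {"modelRewrite": "model-stable", "weight": 90},
    {"modelRewrite": "model-canary", "weight": 10},
]

shares = traffic_shares(targets)  # 90/100 for stable, 10/100 for canary
rewritten_model = pick_target(targets, random.Random())
```

Because only the ratio matters, weights of 9 and 1 would produce the same split as 90 and 10.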

InferenceModelRewriteStatus

| Field | Type | Description |
| --- | --- | --- |
| conditions | []metav1.Condition | Conditions track the state. Known type: Accepted. |

PoolObjectReference

PoolObjectReference identifies an API object within the same namespace.

| Field | Type | Description |
| --- | --- | --- |
| group | string | Group of the referent. Defaults to inference.networking.k8s.io. |
| kind | string | Kind of the referent. Defaults to InferencePool. |
| name | string | Required. Name of the referent. |
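
Putting the types above together, a complete resource might look like the following sketch. It builds the manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The resource name, pool name, and model names are hypothetical placeholders; `group` and `kind` on `poolRef` are omitted so that the documented defaults apply.

```python
import json

manifest = {
    "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
    "kind": "InferenceModelRewrite",
    "metadata": {"name": "example-rewrite"},  # hypothetical resource name
    "spec": {
        # Only name is required; group and kind fall back to their defaults.
        "poolRef": {"name": "example-pool"},  # hypothetical pool name
        "rules": [
            {
                # Match requests whose JSON body names this model exactly.
                "matches": [
                    {"model": {"type": "Exact", "value": "example-model"}}
                ],
                # 90/10 split between two hypothetical rewritten model names.
                "targets": [
                    {"modelRewrite": "example-model-v1", "weight": 90},
                    {"modelRewrite": "example-model-v2", "weight": 10},
                ],
            }
        ],
    },
}

manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The `status` block is left out because it is written by the controller rather than by the user.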

Precedence and Conflict Resolution

  1. Model Match Precision: Rules with an Exact model match take precedence over generic matches (empty matches).
  2. Resource Age: If multiple resources target the same pool with identical matches, the oldest resource (by creation timestamp) takes precedence.
  3. Rule Order: Within a single resource, the FIRST matching rule (in list order) is used.
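
One way to read the three criteria together is sketched below. It is an interpretation under stated assumptions, not the controller's algorithm: all resources are assumed to reference the same pool, each resource contributes only its first matching rule (rule order), and the surviving candidates are ranked with Exact matches ahead of generic ones, then by resource age. The dictionary layout and the names `rule_matches`, `select_rule`, `name`, and `created` are hypothetical.

```python
from datetime import datetime, timezone


def rule_matches(rule: dict, model: str) -> bool:
    """Empty matches is generic (matches all); otherwise any Exact value matches."""
    matches = rule.get("matches") or []
    return not matches or any(m["model"]["value"] == model for m in matches)


def select_rule(resources: list, model: str):
    """Return (resource name, rule) for the winning rule, or None."""
    candidates = []
    for res in resources:
        # Rule order: within one resource, only the first matching rule counts.
        for rule in res["rules"]:
            if rule_matches(rule, model):
                candidates.append((res, rule))
                break
    if not candidates:
        return None
    # Exact before generic (False sorts before True), then oldest resource.
    res, rule = min(
        candidates,
        key=lambda c: (not c[1].get("matches"), c[0]["created"]),
    )
    return res["name"], rule


def exact(model: str, rewrite: str) -> dict:
    return {"matches": [{"model": {"value": model}}],
            "targets": [{"modelRewrite": rewrite}]}


utc = timezone.utc
resources = [
    {"name": "newer-exact", "created": datetime(2025, 2, 1, tzinfo=utc),
     "rules": [exact("m", "m-b")]},
    {"name": "older-generic", "created": datetime(2025, 1, 1, tzinfo=utc),
     "rules": [{"targets": [{"modelRewrite": "m-fallback"}]}]},
    {"name": "oldest-exact", "created": datetime(2024, 12, 1, tzinfo=utc),
     "rules": [exact("m", "m-a")]},
]

winner = select_rule(resources, "m")
```

In this data set a request for `m` resolves to the oldest resource with an Exact match, while a request for any other model falls back to the generic rule.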

Condition Types and Reasons

Accepted

Indicates if the rewrite is valid, non-conflicting, and applied to the pool.

  • True Reasons:
    • Accepted: Rewrite is valid and successfully applied.
  • Unknown Reasons:
    • Pending: Initial state; the controller has not yet reconciled the resource.