InferenceModelRewrite

InferenceModelRewrite defines rules for rewriting inference requests, such as traffic splitting, A/B tests, or canary rollouts across weighted model targets.

Group: inference.networking.x-k8s.io
Version: v1alpha2


InferenceModelRewrite

| Field | Type | Description |
| --- | --- | --- |
| apiVersion | string | inference.networking.x-k8s.io/v1alpha2 |
| kind | string | InferenceModelRewrite |
| metadata | metav1.ObjectMeta | Standard object metadata. |
| spec | InferenceModelRewriteSpec | Spec defines the desired state of the rewrite rules. |
| status | InferenceModelRewriteStatus | Status defines the observed state of the resource. |

InferenceModelRewriteSpec

| Field | Type | Description |
| --- | --- | --- |
| poolRef | PoolObjectReference | Required. Reference to the target InferencePool. |
| rules | []InferenceModelRewriteRule | Required. Ordered list of rules. The first rule that matches a request is used. |

InferenceModelRewriteRule

InferenceModelRewriteRule defines the match criteria and the corresponding actions (targets).

| Field | Type | Description |
| --- | --- | --- |
| matches | []Match | Optional. Criteria for matching a request. If multiple criteria are specified, a request matches when any one of them is satisfied (logical OR). If empty, the rule matches all requests. |
| targets | []TargetModel | Optional. How to distribute traffic across weighted model targets. Min items: 1. |
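
The matching behavior described above (logical OR across criteria, an empty matches list as a catch-all, and first-match-wins in list order) can be sketched in Python. This is an illustrative model of the documented semantics, not the controller's implementation; the function names and the model names in the sample data are hypothetical.

```python
from typing import Optional


def rule_matches(rule: dict, model_name: str) -> bool:
    """Return True if the rule applies to a request for `model_name`."""
    matches = rule.get("matches") or []
    if not matches:
        # An empty `matches` list matches every request.
        return True
    # Multiple criteria combine with logical OR; Exact is the only supported type.
    return any(m["model"]["value"] == model_name for m in matches)


def select_rule(rules: list, model_name: str) -> Optional[dict]:
    """Return the first rule, in list order, that matches the request."""
    for rule in rules:
        if rule_matches(rule, model_name):
            return rule
    return None


# Hypothetical rules: one with two OR-ed criteria, then a catch-all.
rules = [
    {
        "matches": [
            {"model": {"type": "Exact", "value": "llama-3"}},
            {"model": {"type": "Exact", "value": "llama-3-chat"}},
        ],
        "targets": [{"modelRewrite": "llama-3-8b"}],
    },
    {"targets": [{"modelRewrite": "default-model"}]},
]

chosen = select_rule(rules, "llama-3-chat")
print(chosen["targets"][0]["modelRewrite"])
```

Placing the catch-all rule last matters here: because the first matching rule wins, a catch-all listed first would shadow every rule after it.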

Match

Match defines the criteria for matching LLM requests.

| Field | Type | Description |
| --- | --- | --- |
| model | ModelMatch | Required. Criteria for matching the model field in the JSON request body. |

ModelMatch

| Field | Type | Description |
| --- | --- | --- |
| type | string | Kind of string matching to use. Supported value: Exact. Defaults to Exact. |
| value | string | Required. The model name string to match against. |

TargetModel

TargetModel defines a weighted model destination.

| Field | Type | Description |
| --- | --- | --- |
| weight | int32 | Optional. Proportion of requests forwarded to the model, computed as weight / (sum of all weights in the rule). Min: 1, Max: 1000000. If set for one target, it must be set for all targets. |
| modelRewrite | string | Required. The static model name to rewrite the request to. |
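
As a worked example of the weight formula, the following Python sketch computes each target's share as weight / (sum of all weights) and checks the documented constraints. The helper name and model names are hypothetical. The equal split applied when no target sets a weight is an assumption of this sketch; the reference above does not state what happens in that case.

```python
from fractions import Fraction

MIN_WEIGHT, MAX_WEIGHT = 1, 1_000_000


def traffic_shares(targets: list) -> list:
    """Return (modelRewrite, share) pairs, where share = weight / sum(weights)."""
    if not targets:
        raise ValueError("targets must contain at least one item")
    weights = [t.get("weight") for t in targets]
    if any(w is None for w in weights):
        if not all(w is None for w in weights):
            raise ValueError("weight must be set for all targets or for none")
        # ASSUMPTION (not from the reference): no weights means an equal split.
        weights = [1] * len(targets)
    for w in weights:
        if not MIN_WEIGHT <= w <= MAX_WEIGHT:
            raise ValueError(f"weight {w} is outside [{MIN_WEIGHT}, {MAX_WEIGHT}]")
    total = sum(weights)
    return [(t["modelRewrite"], Fraction(w, total)) for t, w in zip(targets, weights)]


# Hypothetical canary rollout: 90% to the stable model, 10% to the canary.
shares = traffic_shares([
    {"modelRewrite": "model-v1", "weight": 90},
    {"modelRewrite": "model-v2", "weight": 10},
])
for name, share in shares:
    print(f"{name}: {float(share):.0%}")
```

Only the ratio between weights matters: weights of 9 and 1 produce the same split as 90 and 10.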

InferenceModelRewriteStatus

| Field | Type | Description |
| --- | --- | --- |
| conditions | []metav1.Condition | Conditions track the state of the resource. Known type: Accepted. |

PoolObjectReference

PoolObjectReference identifies an API object within the same namespace.

| Field | Type | Description |
| --- | --- | --- |
| group | string | Group of the referent. Defaults to inference.networking.k8s.io. |
| kind | string | Kind of the referent. Defaults to InferencePool. |
| name | string | Required. Name of the referent. |
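
Putting the types in this reference together, a complete resource might look like the sketch below. It builds the manifest as a Python dictionary and serializes it to JSON, which kubectl accepts alongside YAML. The resource name, pool name, and model names are hypothetical placeholders; the field names and the apiVersion come from the tables in this reference. The group and kind of poolRef are omitted so that their documented defaults apply.

```python
import json

manifest = {
    "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
    "kind": "InferenceModelRewrite",
    "metadata": {"name": "food-review-canary"},  # hypothetical name
    "spec": {
        # group and kind default to inference.networking.k8s.io / InferencePool.
        "poolRef": {"name": "my-inference-pool"},  # hypothetical pool
        "rules": [
            {
                # Match requests whose JSON body has model == "food-review".
                "matches": [{"model": {"type": "Exact", "value": "food-review"}}],
                # Split matched traffic 90/10 between two rewritten model names.
                "targets": [
                    {"modelRewrite": "food-review-v1", "weight": 90},
                    {"modelRewrite": "food-review-v2", "weight": 10},
                ],
            }
        ],
    },
}

manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and applied with kubectl apply -f, assuming the InferenceModelRewrite CRD is installed in the cluster.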

Precedence and Conflict Resolution

  1. Model Match Precision: Rules with an Exact model match take precedence over generic rules (rules with empty matches).
  2. Resource Age: If multiple resources target the same pool with identical matches, the oldest resource (by creation timestamp) takes precedence.
  3. Rule Order: Within a single resource, the first matching rule (in list order) is used.
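
One possible reading of the three precedence steps above is a lexicographic sort key: Exact before generic, then older resource before newer, then earlier rule before later. The Python sketch below applies that reading to resources that all target the same pool. It is an interpretation under those assumptions rather than the controller's actual algorithm, and the function, resource, and model names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def _created(resource: dict) -> datetime:
    # Kubernetes serializes creationTimestamp as RFC 3339 in UTC.
    ts = resource["metadata"]["creationTimestamp"]
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def resolve(resources: list, model_name: str) -> Optional[tuple]:
    """Return (resource name, rule) for the winning rule, or None if nothing matches."""
    candidates = []
    for res in resources:
        for index, rule in enumerate(res["spec"]["rules"]):
            matches = rule.get("matches") or []
            exact = any(m["model"]["value"] == model_name for m in matches)
            if exact or not matches:
                # Key: Exact (0) before generic (1), then age, then rule order.
                key = (0 if exact else 1, _created(res), index)
                candidates.append((key, res, rule))
    if not candidates:
        return None
    _, res, rule = min(candidates, key=lambda c: c[0])
    return res["metadata"]["name"], rule


def _resource(name: str, created: str, rules: list) -> dict:
    return {"metadata": {"name": name, "creationTimestamp": created},
            "spec": {"rules": rules}}


exact_rule = [{"model": {"type": "Exact", "value": "llama"}}]
resources = [
    _resource("old-generic", "2024-01-01T00:00:00Z",
              [{"targets": [{"modelRewrite": "fallback"}]}]),
    _resource("mid-exact", "2024-06-01T00:00:00Z",
              [{"matches": exact_rule, "targets": [{"modelRewrite": "llama-canary"}]}]),
    _resource("new-exact", "2024-07-01T00:00:00Z",
              [{"matches": exact_rule, "targets": [{"modelRewrite": "llama-other"}]}]),
]

winner_name, winner_rule = resolve(resources, "llama")
print(winner_name, winner_rule["targets"][0]["modelRewrite"])
```

In this sample, the Exact rule in mid-exact beats the older generic rule in old-generic (step 1), and beats the identical match in new-exact because its resource is older (step 2).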

Condition Types and Reasons

Accepted

Indicates whether the rewrite is valid, non-conflicting, and applied to the pool.

  • True Reasons:
    • Accepted: The rewrite is valid and was successfully applied.
  • Unknown Reasons:
    • Pending: Initial state; the controller has not yet reconciled the resource.
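
A client can read the Accepted condition from status.conditions to decide whether a rewrite has taken effect. The Python sketch below assumes the resource has already been fetched as a dictionary (for example, from JSON output); the helper name is hypothetical, and treating a missing condition as Unknown is a choice made for this sketch.

```python
def accepted_status(resource: dict) -> str:
    """Return the status of the Accepted condition: "True", "False", or "Unknown"."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == "Accepted":
            return cond.get("status", "Unknown")
    # No Accepted condition yet: treated here like the initial Pending state.
    return "Unknown"


# Hypothetical resources: one reconciled, one not yet seen by the controller.
applied = {"status": {"conditions": [
    {"type": "Accepted", "status": "True", "reason": "Accepted"},
]}}
pending = {"status": {"conditions": [
    {"type": "Accepted", "status": "Unknown", "reason": "Pending"},
]}}

print(accepted_status(applied), accepted_status(pending))
```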