Distributed Tracing
This guide shows how to enable OpenTelemetry distributed tracing across llm-d components.
This guide assumes a running llm-d deployment with an InferencePool and model servers. For metrics and dashboards, see Metrics.
Commands in this guide use ${NAMESPACE} for the namespace where your llm-d workload runs:
export NAMESPACE=<your-llm-d-namespace>
What Gets Traced
| Component | Config Method | Traced Operations |
|---|---|---|
| vLLM (prefill + decode) | Kustomize: container args + env vars | Inference engine spans |
| Routing proxy (P/D sidecar) | Kustomize: container env vars | KV transfer coordination |
| EPP | Helm: GAIE inferenceExtension.tracing: | Request routing, endpoint scoring, KV-cache indexing |
All components export traces via OTLP gRPC to an OpenTelemetry Collector, which filters noise (e.g., /metrics scraping spans), batches traces, and forwards them to a backend like Jaeger.
Step 1: Deploy OTel Collector and Jaeger
Deploy the OTel Collector and Jaeger into the same namespace as your llm-d workload:
./guides/recipes/observability/install-otel-collector-jaeger.sh -n ${NAMESPACE}
If the OpenTelemetry Operator is installed, the script uses an OpenTelemetryCollector CR. Otherwise it deploys a standalone collector Deployment.
Verify the components are running:
kubectl get pods -n ${NAMESPACE} -l app=otel-collector
kubectl get pods -n ${NAMESPACE} -l app=jaeger
Expected output:
NAME READY STATUS RESTARTS AGE
otel-collector-xxxxxxxxx-xxxxx 1/1 Running 0 30s
NAME READY STATUS RESTARTS AGE
jaeger-xxxxxxxxx-xxxxx 1/1 Running 0 30s
Manual Deployment
If you prefer to apply manifests directly:
# Standalone collector (no operator)
kubectl apply -n ${NAMESPACE} -f guides/recipes/observability/tracing/jaeger-all-in-one.yaml \
-f guides/recipes/observability/tracing/otel-collector.yaml
# Or with the OTel Operator installed
kubectl apply -n ${NAMESPACE} -f guides/recipes/observability/tracing/jaeger-all-in-one.yaml \
-f guides/recipes/observability/tracing/otel-collector-operator.yaml
Verify with the same kubectl get pods commands above.
Step 2: Enable Tracing on the Model Server and Routing Proxy
Model servers are deployed with kustomize. The example below uses vLLM; SGLang and other OpenTelemetry-capable model servers use the same OTEL_* environment variables, with OTEL_SERVICE_NAME set to match your engine and role. Add the engine's tracing flags to its serve command and the OTEL_* env vars to the container. Set the same OTEL_* env vars on the routing proxy container, with its own OTEL_SERVICE_NAME (e.g., routing-proxy):
# Add to the model server's serve command (vLLM shown):
# --otlp-traces-endpoint http://otel-collector:4317
# --collect-detailed-traces all
# Add to the container env (applies to any OpenTelemetry-capable engine):
env:
- name: OTEL_SERVICE_NAME
value: "vllm-decode" # name per engine/role, e.g. vllm-decode, vllm-prefill, sglang-decode
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector:4317"
- name: OTEL_TRACES_SAMPLER
value: "parentbased_traceidratio"
- name: OTEL_TRACES_SAMPLER_ARG
value: "1.0"
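Prefill and decode containers take the same env block and differ only in OTEL_SERVICE_NAME, so it can help to generate the entries rather than copy them by hand. The following is a minimal sketch, not part of llm-d: the helper name otel_env and its defaults are hypothetical, and the output simply mirrors the env list above.

```python
def otel_env(service_name, endpoint="http://otel-collector:4317", sample_ratio=1.0):
    """Return the OTEL_* container env entries from this guide as a list of dicts.

    Hypothetical helper for illustration; the names and values mirror the
    env block shown above.
    """
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be between 0.0 and 1.0")
    return [
        {"name": "OTEL_SERVICE_NAME", "value": service_name},
        {"name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": endpoint},
        {"name": "OTEL_TRACES_SAMPLER", "value": "parentbased_traceidratio"},
        # Kubernetes env values must be strings, so the ratio is stringified.
        {"name": "OTEL_TRACES_SAMPLER_ARG", "value": str(sample_ratio)},
    ]


# One env block per role; only the service name differs.
env_by_role = {role: otel_env(f"vllm-{role}") for role in ("prefill", "decode")}
for role, env in env_by_role.items():
    print(role, env[0]["value"])
```

The resulting lists can be serialized to YAML or JSON and pasted into a kustomize patch; how you wire them into your overlay depends on your own layout.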
Step 3: Enable Tracing on EPP
Add the tracing configuration to your GAIE values:
# In your gaie-*/values.yaml
inferenceExtension:
tracing:
enabled: true
otelExporterEndpoint: "http://otel-collector:4317"
sampling:
sampler: "parentbased_traceidratio"
samplerArg: "1.0"
Step 4: View Traces
Access the Jaeger UI:
kubectl port-forward -n ${NAMESPACE} svc/jaeger-collector 16686:16686
# Open http://localhost:16686
Verify traces are flowing:
- Send an inference request through llm-d
- Open the Jaeger UI
- Select a service (e.g., vllm-decode, llm-d-router/epp)
- Click Find Traces
You should see traces with multiple spans covering the request lifecycle. You can also verify via the Jaeger API:
curl -s http://localhost:16686/api/services | jq '.data'
Expected output:
[
"vllm-decode",
"llm-d-router/epp"
]
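The same check can be scripted, for example in a smoke test that fails when an expected service has not reported any traces. This is a sketch under assumptions: the function names are hypothetical, the port-forward from this step is assumed to be active for fetch_services, and the only part of the response relied on is the data field that the curl command above reads.

```python
import json
import urllib.request


def fetch_services(base_url="http://localhost:16686"):
    """Return the service names Jaeger reports at /api/services."""
    with urllib.request.urlopen(f"{base_url}/api/services", timeout=5) as resp:
        payload = json.load(resp)
    # Treat a missing or null list as "no services yet".
    return payload.get("data") or []


def missing_services(reported, expected):
    """Return the expected service names absent from the reported list, sorted."""
    return sorted(set(expected) - set(reported))


# Offline demonstration with a payload shaped like the output above.
sample = json.loads('{"data": ["vllm-decode", "llm-d-router/epp"]}')
expected = ["vllm-decode", "vllm-prefill", "llm-d-router/epp"]
print(missing_services(sample["data"], expected))  # prints ['vllm-prefill']
```

Against a live deployment, call missing_services(fetch_services(), expected) after sending a request; an empty list means every expected service has exported at least one trace.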
If you only see generic GET spans, check that:
- The vLLM container args include --collect-detailed-traces all
- The EPP image includes tracing instrumentation (llm-d-router-endpoint-picker-dev, not upstream epp)
Production Recommendations
- Sampling: Set samplerArg (EPP) and OTEL_TRACES_SAMPLER_ARG (model servers and routing proxy) to "0.1" (10%) or lower to reduce overhead
- Collector: Use a collector to batch, filter, and route traces to a persistent backend
- Backend: Use Jaeger with Elasticsearch/Cassandra storage, or Grafana Tempo for long-term retention
- Service names: Set OTEL_SERVICE_NAME per container (e.g., vllm-decode-prod, epp-us-east) to distinguish clusters and environments
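To see what lowering the ratio does, it helps to model the parentbased_traceidratio sampler. The sketch below is a simplified illustration and not the code any particular SDK runs: a span with a parent follows the parent's sampling decision, and a root span is kept when the low 64 bits of its trace ID fall below a threshold derived from the ratio. The function names are hypothetical, and the exact threshold computation varies between SDKs.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # the decision uses the low 64 bits of the trace ID


def ratio_sampled(trace_id, ratio):
    """Keep a trace when its low 64 bits fall below ratio * 2**64."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def parentbased_traceidratio(trace_id, ratio, parent_sampled=None):
    """Model of the sampler: parent_sampled is None for a root span,
    otherwise the sampled flag carried by the parent span's context."""
    if parent_sampled is not None:
        return parent_sampled  # follow the upstream decision
    return ratio_sampled(trace_id, ratio)


# Rough check: about 10% of random root traces are kept at ratio 0.1.
rng = random.Random(0)
ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(parentbased_traceidratio(t, 0.1) for t in ids)
print(f"kept {kept / len(ids):.1%} of root traces")
```

Because downstream components follow the decision made where the trace starts, a request is either traced end to end or not at all, so a 10% ratio yields complete traces for roughly one request in ten rather than partial traces for every request.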
Environment Variable Reference
When tracing is enabled, these environment variables are set on vLLM and routing-proxy containers:
| Variable | Description |
|---|---|
| OTEL_SERVICE_NAME | Service identifier (e.g., vllm-decode, routing-proxy) |
| OTEL_EXPORTER_OTLP_ENDPOINT | Collector endpoint (http://otel-collector:4317) |
| OTEL_TRACES_SAMPLER | Sampler type (e.g., parentbased_traceidratio) |
| OTEL_TRACES_SAMPLER_ARG | Sampling ratio (1.0 = 100%, 0.1 = 10%) |
Cleanup
./guides/recipes/observability/install-otel-collector-jaeger.sh -u -n ${NAMESPACE}