Customizing your Gateway
This document is meant to walk through some choices when setting up your gateway.
Using an Ingress
If using a gateway service of type ClusterIP you have the option to create an ingress to expose your gateway
gateway:
service:
type: ClusterIP
ingress:
enabled: true
ingressClassName: traefik
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
Gateway Considerations for Benchmarking
In our experience there are a few gotchas we encountered when attempting to benchmark for competitive numbers, this document is meant to capture those gateway related challenges we encountered and workarounds we derived.
Increasing Envoy Pod Resources
To remove the chance that the envoy pod can be a bottle neck, we thought it might be a good idea to increase its resource. You can see how to set this from our benchmarking values file.
This option gets exposed from the llm-d-infra chart by either a gatewayParameters manifest if you are using Kgateway, or a configmap if you are using Istio.
Increasing Max Connections and Timeout (Istio only)
Currently we only have a workaround for increasing max connections and timeout for Istio, we would like to expand this to other providers in the future. This is provided via the Istio destination rule, you can see a values configuration for this in our benchmarking values file.
This gets exposed by the DestinationRule template in the llm-d-infra charts, but we hope this manifest will make its way upstream to the GAIE charts.
Changing log levels
In an effort to reduce the workload of the envoy gateway container, we have dropped the logs to level error, only showing us critical issues. Again this is visible in the benchmarking values file.
This documentation corresponds to llm-d v0.3.1, the latest public release. For the most current development changes, see this file on main.
📝 To suggest changes or report issues, please create an issue.
Source: docs/customizing-your-gateway.md