GCP Pub/Sub Implementation
This implementation uses GCP Pub/Sub as the backend for the request and result queues. It's ideal for cloud-native deployments on Google Cloud.
Prerequisites
- GCP Project: Ensure you have a GCP project with the Pub/Sub API enabled.
- Workload Identity: Your Kubernetes service account must have permission to publish to Pub/Sub topics and to consume from Pub/Sub subscriptions.
Topic Setup, Configuration and Deployment
Topic Setup
We recommend setting up one topic per model and priority combination, i.e., per inference objective.
For a simple setup with one model and one use case, create a single topic:
export REQUEST_TOPIC_NAME=async-proc-requests # choose topic name for requests
gcloud pubsub topics create $REQUEST_TOPIC_NAME
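For deployments with several inference objectives, the per-objective recommendation can be scripted. The following is only a sketch: the model names, priority names, and the `async-proc-requests-<model>-<priority>` naming scheme are hypothetical placeholders, not conventions required by this implementation. It prints the commands as a dry run rather than executing them.

```shell
# Hypothetical model and priority names; replace with your own.
MODELS="model-a model-b"
PRIORITIES="high low"

# Build one request topic name per model+priority combination.
TOPICS=""
for model in $MODELS; do
  for priority in $PRIORITIES; do
    TOPICS="$TOPICS async-proc-requests-${model}-${priority}"
  done
done

# Dry run: print the create commands instead of executing them.
for topic in $TOPICS; do
  echo "gcloud pubsub topics create $topic"
done
```

Once the printed commands look right, they can be piped to `sh` or run by hand.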
For each request topic, create a subscription with the following settings:
- Exactly-once delivery.
- Retries with exponential backoff.
- Dead Letter Queue (DLQ).
Note: If a DLQ is not configured for the request topic, retried messages are counted multiple times in the number_of_requests metric.
Example:
export SUBSCRIPTION_NAME=async-proc-requests-sub # choose subscription name for each request topic
export DLQ_NAME=async-proc-requests-dlq # choose DLQ name
export RESULT_TOPIC_NAME=async-proc-results # choose topic name for results
gcloud pubsub topics create $DLQ_NAME
gcloud pubsub topics create $RESULT_TOPIC_NAME
# create subscription for DLQ topic so messages will not get lost
gcloud pubsub subscriptions create sub-$DLQ_NAME \
--topic=$DLQ_NAME
# create subscription for request topic
gcloud pubsub subscriptions create $SUBSCRIPTION_NAME \
--topic=$REQUEST_TOPIC_NAME \
--dead-letter-topic=$DLQ_NAME \
--max-delivery-attempts=35 \
--enable-exactly-once-delivery
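The create command above sets the dead-letter topic and exactly-once delivery but passes no retry-policy flags. In addition, Pub/Sub can only forward messages to the DLQ if its service agent may publish to the DLQ topic and subscribe to the request subscription. The sketch below covers both points. It assumes a `PROJECT_ID` variable (not defined elsewhere in this guide) holding your project ID, and the delay values are illustrative, not recommendations.

```shell
# Assumption: PROJECT_ID is set to your GCP project ID.
PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')
PUBSUB_SA="service-${PROJECT_NUMBER}@gcp-sa-pubsub.iam.gserviceaccount.com"

# Let the Pub/Sub service agent publish dead-lettered messages to the DLQ topic.
gcloud pubsub topics add-iam-policy-binding "$DLQ_NAME" \
  --member="serviceAccount:${PUBSUB_SA}" \
  --role="roles/pubsub.publisher"

# Let it acknowledge forwarded messages on the request subscription.
gcloud pubsub subscriptions add-iam-policy-binding "$SUBSCRIPTION_NAME" \
  --member="serviceAccount:${PUBSUB_SA}" \
  --role="roles/pubsub.subscriber"

# Enable exponential backoff between redeliveries (illustrative bounds).
gcloud pubsub subscriptions update "$SUBSCRIPTION_NAME" \
  --min-retry-delay=10s \
  --max-retry-delay=600s
```

The same retry flags can also be passed directly to `gcloud pubsub subscriptions create` if you prefer a single command.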
Configuration and Deployment
We provide a values.yaml for this implementation in guides/asynchronous-processing/gcp-pubsub/values.yaml.
Edit the values.yaml file with your specific GCP project and resources:
ap:
gcpPubSub:
requestSubscriberId: "projects/<your-project>/subscriptions/async-proc-requests-sub"
resultTopicId: "projects/<your-project>/topics/async-proc-results"
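Both values are fully qualified Pub/Sub resource names. As a small sketch, they can be derived from the names chosen during topic setup; `PROJECT_ID` is an assumed variable that this guide does not define, and the fallback defaults below simply mirror the example names.

```shell
# Assumption: PROJECT_ID holds your GCP project ID; defaults mirror the example names.
PROJECT_ID="${PROJECT_ID:-your-project}"
SUBSCRIPTION_NAME="${SUBSCRIPTION_NAME:-async-proc-requests-sub}"
RESULT_TOPIC_NAME="${RESULT_TOPIC_NAME:-async-proc-results}"

# Compose the fully qualified resource names expected in values.yaml.
REQUEST_SUBSCRIBER_ID="projects/${PROJECT_ID}/subscriptions/${SUBSCRIPTION_NAME}"
RESULT_TOPIC_ID="projects/${PROJECT_ID}/topics/${RESULT_TOPIC_NAME}"

# Print the two lines to paste under ap.gcpPubSub.
printf 'requestSubscriberId: "%s"\nresultTopicId: "%s"\n' \
  "$REQUEST_SUBSCRIBER_ID" "$RESULT_TOPIC_ID"
```

To cross-check against the live resources, `gcloud pubsub subscriptions describe $SUBSCRIPTION_NAME --format='value(name)'` should print the same subscription path.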
For deployment instructions, please refer to the main README.
Testing
- Publish a message:
gcloud pubsub topics publish $REQUEST_TOPIC_NAME --message='{"id" : "testmsg", "payload":{ "model":"your-model", "prompt":"Hi, good morning "}, "deadline" :"1999999999" }'
- Pull from results subscription: First, create a subscription for the results topic if you haven't already:
gcloud pubsub subscriptions create async-proc-results-sub --topic=$RESULT_TOPIC_NAME
Then pull the result:
gcloud pubsub subscriptions pull async-proc-results-sub --auto-ack --limit=1
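The test message in the publish step uses a fixed deadline value. If you want a deadline relative to the current time, the message can be built in the shell first. This sketch assumes that "deadline" is a Unix timestamp in seconds passed as a string, which the example value suggests but this guide does not state; the five-minute offset and the generated id are arbitrary.

```shell
# Assumption: "deadline" is a Unix timestamp in seconds, passed as a string.
DEADLINE=$(( $(date +%s) + 300 ))   # five minutes from now

# Build the request JSON with the same fields as the example message.
MESSAGE=$(printf '{"id":"%s","payload":{"model":"%s","prompt":"%s"},"deadline":"%s"}' \
  "testmsg-$DEADLINE" "your-model" "Hi, good morning" "$DEADLINE")

echo "$MESSAGE"
```

The result can then be published with `gcloud pubsub topics publish $REQUEST_TOPIC_NAME --message="$MESSAGE"`.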