Provisioning AWS EFS

Overview

This guide explains how to provision an AWS EFS-backed shared storage for llm-d using the EFS CSI driver.

Prerequisites

Ensure your cluster infrastructure is sufficient to deploy high scale inference.
An AWS account.
An EKS cluster.
An existing EFS filesystem.
Create a namespace for installation.

export NAMESPACE=llm-d-storage # or any other namespace (shorter names recommended)
kubectl create namespace ${NAMESPACE}

Cluster setup for provisioning AWS EFS (EKS cluster)

Install EFS CSI Driver

Follow the official AWS guide to install the EFS CSI driver:

https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html

Ensure the driver is running in your cluster:

kubectl get pods -n kube-system | grep efs

Create an EFS File System

Create an EFS filesystem in the same VPC as your EKS cluster and note the fileSystemId.

You will need this ID when configuring the StorageClass.

Provisioning

EKS

1. Create a StorageClass for EFS:

cd guides/tiered-prefix-cache/storage/manifests/backends/aws
kubectl apply -f ./storage_class.yaml -n ${NAMESPACE}

Update the fileSystemId in storage_class.yaml before applying.

2. Create a PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llm-d-kv-cache-storage
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
  storageClassName: efs-sc

Check the PVC

kubectl get pvc -n ${NAMESPACE}

Output should show the PVC as Bound:

NAME                        STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
llm-d-kv-cache-storage      Bound    ...      ...        RWX            efs-sc         1m

Performance Recommendations

For best performance with AWS EFS:

Use Max I/O performance mode for higher throughput
Prefer Provisioned Throughput for consistent performance under heavy workloads
Mount EFS in the same VPC and availability zone as your EKS nodes
Use multiple vLLM replicas to parallelize I/O
Monitor throughput and latency using AWS CloudWatch metrics

Note: EFS has higher latency than local SSD or Lustre, and is best suited for shared cache reuse rather than ultra-low latency workloads.

Cleanup

kubectl delete -f ./storage_class.yaml -n ${NAMESPACE}

Overview​

Prerequisites​

Cluster setup for provisioning AWS EFS (EKS cluster)​

Install EFS CSI Driver​

Create an EFS File System​

Provisioning​

EKS​

Check the PVC​

Performance Recommendations​

Cleanup​