# llm-d on minikube

## Prerequisites

### Platform Setup
This can be run on an EC2 instance as small as a g6e.12xlarge (4x NVIDIA L40S 48GB GPUs, of which only 2 are used by default) to serve the meta-llama/Llama-3.2-3B-Instruct model that will be spun up.
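Before continuing, you can confirm that the host exposes the expected GPUs. This is a minimal sketch using standard `nvidia-smi` query flags; the output noted in the comment assumes the g6e.12xlarge instance mentioned above.

```bash
# List the GPUs visible on the host along with their total memory.
# On a g6e.12xlarge you should see four L40S entries, each with ~46 GB.
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
```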
Verify that you have properly installed the NVIDIA Container Toolkit and configured it for the container runtime of your choice:
```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi

# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
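With the toolkit verified, the minikube cluster itself can be created. The following is a sketch of a `minikube start` invocation with GPU passthrough; the `--gpus all` flag requires a recent minikube (v1.32+) with the docker driver, and the CPU/memory sizes are assumptions you should adjust to your instance.

```bash
# Start a single-node minikube cluster with access to all host GPUs.
# Driver, runtime, CPU, and memory values below are illustrative assumptions.
minikube start \
  --driver docker \
  --container-runtime docker \
  --gpus all \
  --cpus 16 \
  --memory 64g
```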
## llm-d-infra Installation
TBD