vLLM DeepSeek Multinode Deployment on CoreWeave (KubeRay + Terraform)
Intro
In the last CoreWeave post, we ran vLLM on a single GPU box with mid-sized models (70-125B parameters). DeepSeek-V3.2, on the other hand, doesn't fit on one box (685B parameters, ~643GB even compressed). So how do you shard a whale this big across nodes and still serve it fast, at low latency, without it falling …