vLLM DeepSeek 🐋 Multinode Deployment on CoreWeave (KubeRay + Terraform)

Intro In the last CoreWeave post we ran vLLM on a single GPU box with mid-sized models (70-125B). DeepSeek-V3.2, on the other hand, doesn’t fit on one box (685B parameters, ~643GB even compressed). But how do you shard a whale this big across nodes and still serve it fast, at low latency, without it falling …

vLLM Production Stack on CoreWeave CKS with Terraform 🧑🏼‍🚀

Intro The vLLM Production Stack is designed to run on any Kubernetes-based infrastructure. After covering the AWS, Azure, Google Cloud and Nebius MK8s implementations, today we’re deploying the vLLM production-stack on CoreWeave Kubernetes Service (CKS) with the same Terraform framework. CoreWeave is one of the hottest neoclouds, built on the idea that GenAI workloads don’t need virtualization; they need direct access to …

Inside CoreWeave Cloud: CLI & Platform Primer

Intro No invite? No quota? No problem. If you’ve tried to create an account on CoreWeave, you already know the drill: without an invite, there’s No open self-registration, No free tier, and No “Sign up with GitHub”. That’s why I decided to write my first CoreWeave blog post. This post shows how to get started with …