vLLM DeepSeek๐Ÿ‹ Multinode Deployment on CoreWeave (KubeRay + Terraform)

Intro In the last CoreWeave post we ran vLLM on a single GPU box with mid-sized models (70-125B). DeepSeek-V3.2, on the other hand, doesn’t fit on one box (685B parameters, ~643GB even compressed). But how do you shard a whale this big across nodes and still serve it fast, at low latency, without it falling …

vLLM on EKS: Cut LLM Storage Costs by 95% with S3 Mountpoint

Intro When scaling AI models like DeepSeek or Qwen on Amazon EKS, engineering teams obsess over GPU utilization while quietly bleeding money on storage bloat. Because standard EBS volumes force a 1:1 replica-to-disk ratio, scaling a single 70GB model to 20 pods doesn’t cost 70GB; it forces you to provision 1.4 terabytes of redundant EBS …
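The storage arithmetic in the excerpt above can be checked with a quick back-of-the-envelope sketch. It assumes a 70GB model and 20 replicas, as in the excerpt. It compares one full copy of the weights per pod (one EBS volume per replica) with a single shared copy that all pods read, such as an S3 bucket mounted through Mountpoint. The function names are hypothetical. The sketch counts capacity only and ignores per-GB pricing differences between EBS and S3, so it is not a cost model.

```python
# Back-of-the-envelope storage footprint: per-replica volumes vs. one shared copy.
# Hypothetical helper names; capacity only, per-GB prices are deliberately ignored.


def per_replica_storage_gb(model_size_gb: float, replicas: int) -> float:
    """Each pod gets its own volume holding a full copy of the model weights."""
    return model_size_gb * replicas


def shared_storage_gb(model_size_gb: float) -> float:
    """All pods read the same single copy of the weights (e.g. a mounted bucket)."""
    return model_size_gb


def footprint_reduction(model_size_gb: float, replicas: int) -> float:
    """Fraction of provisioned capacity saved by sharing; equals 1 - 1/replicas."""
    return 1 - shared_storage_gb(model_size_gb) / per_replica_storage_gb(model_size_gb, replicas)


if __name__ == "__main__":
    model_gb, replicas = 70, 20
    ebs = per_replica_storage_gb(model_gb, replicas)
    shared = shared_storage_gb(model_gb)
    print(f"per-replica volumes: {ebs:.0f} GB ({ebs / 1000:.1f} TB)")
    print(f"shared copy:         {shared:.0f} GB")
    print(f"redundant data:      {ebs - shared:.0f} GB")
    print(f"capacity saved:      {footprint_reduction(model_gb, replicas):.0%}")
```

With these inputs it reports 1400 GB (1.4 TB) of per-replica volumes against 70 GB shared. The saved fraction grows with the replica count, because every extra pod adds another full copy only in the per-replica setup.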

vLLM Production Stack on CoreWeave CKS with Terraform🧑🏼‍🚀

Intro The vLLM Production Stack is designed to run on any Kubernetes-based infrastructure. After covering the AWS, Azure, Google Cloud and Nebius MK8s implementations, today we’re deploying the vLLM production-stack on CoreWeave Kubernetes (CKS) with the same Terraform framework. CoreWeave is one of the hottest NeoClouds, built on the idea that GenAI workloads don’t need virtualization; they need direct access to …

Diffusion Models explained: From Noise to Pixels

Intro Today, most of us have used Nano Banana, Midjourney, Kling AI, Luma, or Sora to generate silly videos or catchy images on socials. But what do they have in common? They all rely on Diffusion Models as their core engine, even the brand-new Seedance. While many of these are proprietary, the open-source world …