vLLM Production Stack on GCP GKE with Terraform🧑🏼‍🚀

Welcome back to the Terraform vLLM Production Stack series! After covering AWS EKS and Azure AKS, today we’re deploying the vLLM production-stack on Google Cloud GKE with the same Terraform approach. This guide shows you how to deploy a production-ready LLM serving environment on Google Cloud, with GCP-specific optimizations including Dataplane V2 (Cilium eBPF), VPC-native …

The Link That Never Was: Intel Optane PMem + LLM KV Cache Offload

The world of LLMs is dominated by one expensive bottleneck: GPU memory. It directly limits how many models can fit and how fast they can generate text, especially in multi-turn conversations or when processing long contexts. One solution is KV cache offloading (e.g., with LMCache). One technology was perfectly suited to supercharge it: Intel …

vLLM Production Stack on Azure AKS with Terraform🧑🏼‍🚀

The vLLM Production Stack is designed to work across any cloud provider with Kubernetes. After covering AWS EKS, today we’re deploying the vLLM production-stack on Azure AKS with the same Terraform approach. This guide shows you how to deploy the same production-ready LLM serving environment on Azure, with Azure-specific optimizations. We’ll cover network architecture, certificate …

Meet Nebius: The Cloud Built for the AI Era

Every once in a while, a new cloud platform shows up that doesn’t just offer “more compute”. It rethinks what the cloud should look like in an AI-first world. That’s what caught my attention with Nebius, a European-born cloud designed from the ground up for high-performance, AI-centric workloads. One that just closed a …

vLLM Production Stack on Amazon EKS with Terraform🧑🏼‍🚀

Deploying vLLM manually is fine for a lab, but running it in production means dealing with Kubernetes, autoscaling, GPU orchestration, and observability. That’s where the vLLM Production Stack comes in – a Terraform-based blueprint that delivers production-ready LLM serving with enterprise-grade foundations. In this post, we’ll deploy it on Amazon EKS, covering everything from …