vLLM Production Stack on Nebius K8s with Terraform 🧑🏼‍🚀

Intro The vLLM Production Stack is designed to work across any cloud provider with Kubernetes. After covering AWS EKS, Azure AKS, and Google Cloud GKE implementations, today we’re deploying vLLM production-stack on Nebius Managed Kubernetes (MK8s) with the same Terraform approach. Nebius AI Cloud is purpose-built for AI/ML workloads, offering cutting-edge GPU options from NVIDIA …

Turn Your Localhost into a FREE Public URL with Ngrok & Zrok - Part 2

Intro In Part 1, we explored what zrok is, its key features, and how it compares conceptually to ngrok as a self-hostable alternative. Now, it's time to put the spotlight on Ngrok. In this post, we'll walk through Ngrok installation, setup, and real-world usage, starting with a head-to-head feature comparison to see how these two …

vLLM Production Stack on GCP GKE with Terraform 🧑🏼‍🚀

Intro Welcome back to the Terraform vLLM Production Stack series! After covering AWS EKS and Azure AKS, today we’re deploying vLLM production-stack on Google Cloud GKE with the same Terraform approach. This guide shows you how to deploy a production-ready LLM serving environment on Google Cloud, with GCP-specific optimizations including Dataplane V2 (Cilium eBPF), VPC-native …

The Link That Never Was: Intel Optane PMem + LLM KV Cache Offload

Intro The world of LLMs is dominated by one expensive bottleneck: GPU memory. This directly impacts how many models can fit and how fast they can generate text, especially in multi-turn conversations or when processing long contexts. The solution is KV Cache Offloading (e.g., with LMCache). One technology was perfectly suited to supercharge it, Intel …
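To make the offloading idea concrete, here is a minimal, self-contained sketch of the concept, not the LMCache or vLLM API: when the fast GPU tier fills up, least-recently-used KV entries are spilled to a larger, slower tier (plain CPU memory in this toy) instead of being discarded, so a returning conversation can restore its cache rather than recompute the full prefill. All class and variable names here are illustrative assumptions.

```python
# Toy two-tier KV cache illustrating offloading: a small "GPU" tier
# backed by a large "CPU" tier. Not the real LMCache implementation.
from collections import OrderedDict

class TieredKVCache:
    """LRU cache that evicts to a slower tier instead of dropping entries."""

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # fast tier, limited size, LRU-ordered
        self.cpu = {}             # slow tier, effectively unbounded

    def put(self, seq_id: str, kv_blocks: list) -> None:
        self.gpu[seq_id] = kv_blocks
        self.gpu.move_to_end(seq_id)
        # Offload the least-recently-used entry once the fast tier is full.
        while len(self.gpu) > self.gpu_capacity:
            victim, blocks = self.gpu.popitem(last=False)
            self.cpu[victim] = blocks

    def get(self, seq_id: str):
        if seq_id in self.gpu:
            self.gpu.move_to_end(seq_id)
            return self.gpu[seq_id], "gpu_hit"
        if seq_id in self.cpu:
            # Promote back to the fast tier on reuse (e.g. a follow-up turn).
            self.put(seq_id, self.cpu.pop(seq_id))
            return self.gpu[seq_id], "cpu_hit"
        return None, "miss"  # would trigger a full prefill recompute

cache = TieredKVCache(gpu_capacity=2)
cache.put("chat-1", ["k1", "v1"])
cache.put("chat-2", ["k2", "v2"])
cache.put("chat-3", ["k3", "v3"])  # evicts chat-1 to the CPU tier
print(cache.get("chat-1")[1])      # cpu_hit: restored without recompute
```

The payoff in the real stack is the same as in the toy: a "cpu_hit" costs a memory copy over the interconnect, while a "miss" costs a full recompute of the prompt's attention keys and values on the GPU.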

vLLM Production Stack on Azure AKS with Terraform 🧑🏼‍🚀

Intro The vLLM Production Stack is designed to work across any cloud provider with Kubernetes. After covering AWS EKS, today we’re deploying vLLM production-stack on Azure AKS with the same Terraform approach. This guide shows you how to deploy the same production-ready LLM serving environment on Azure, with Azure-specific optimizations. We’ll cover network architecture, certificate …