The Link That Never Was: Intel Optane PMem + LLM KV Cache Offload

Intro The world of LLMs is dominated by one expensive bottleneck: GPU memory. It directly limits how many models can fit on a card and how fast they can generate text, especially in multi-turn conversations or when processing long contexts. The solution is KV cache offloading (e.g., with LMCache). One technology was perfectly suited to supercharge it: Intel …
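For context, KV-cache offloading in vLLM is typically wired up through a KV-transfer connector. A minimal sketch, assuming a recent vLLM/LMCache pairing and the `LMCacheConnectorV1` connector name (both assumptions — check your installed versions):

```python
# Minimal sketch: routing vLLM's KV cache through LMCache.
# Assumes `pip install vllm lmcache`; the connector name and config
# surface vary by version, so treat this as illustrative.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any supported model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # LMCache's vLLM connector
        kv_role="kv_both",                  # both store and reuse KV blocks
    ),
)

out = llm.generate(["Summarize our conversation so far."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

With the connector in place, repeated prefixes (system prompts, earlier chat turns) can be served from the cache tier instead of being recomputed on the GPU.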

vLLM Production Stack on Azure AKS with Terraform🧑🏼‍🚀

Intro The vLLM Production Stack is designed to work across any cloud provider with Kubernetes. After covering AWS EKS, today we’re deploying the vLLM production-stack on Azure AKS with the same Terraform approach. This guide shows you how to deploy the same production-ready LLM serving environment on Azure, with Azure-specific optimizations. We’ll cover network architecture, certificate …
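Whichever cloud you land on, the deployed stack exposes an OpenAI-compatible endpoint behind its gateway. A quick smoke test might look like the sketch below — the base URL, API key, and model name are placeholders for whatever your Terraform outputs report:

```python
# Smoke test against a deployed vLLM production-stack gateway.
# Assumes `pip install openai`; base_url and model are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://vllm.example.com/v1",  # your gateway's external URL
    api_key="EMPTY",                         # or the key your router enforces
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

The same script works unchanged against the EKS deployment from the companion post, which is the point of keeping both behind one OpenAI-compatible surface.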

vLLM Production Stack on Amazon EKS with Terraform🧑🏼‍🚀

Intro Deploying vLLM manually is fine for a lab, but running it in production means dealing with Kubernetes, autoscaling, GPU orchestration, and observability. That’s where the vLLM Production Stack comes in – a Terraform-based blueprint that delivers production-ready LLM serving with enterprise-grade foundations. In this post, we’ll deploy it on Amazon EKS, covering everything from …

LLM Embeddings Explained Like I’m 5

Intro We often hear about RAG (Retrieval-Augmented Generation) and vector databases that store embeddings, but we rarely stop to remember what exactly embeddings are used for and how they work. In this post, we’ll break down how embeddings work – in the simplest way possible (yes, like you’re 5 🧠📎). I. What is an Embedding? Embeddings …
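As a taste of what the post covers: an embedding is just a vector of numbers, and texts with similar meanings get nearby vectors. A minimal sketch using sentence-transformers (the model choice here is an assumption — any embedding model works the same way):

```python
# Minimal sketch: turn sentences into embeddings and compare them.
# Assumes `pip install sentence-transformers numpy`.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, popular choice
vecs = model.encode(["The cat sat on the mat.",
                     "A kitten rests on a rug.",
                     "Quarterly revenue grew 12%."])

def cosine(a, b):
    # Cosine similarity: close to 1.0 = same direction, near 0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vecs[0], vecs[1]))  # high: similar meaning
print(cosine(vecs[0], vecs[2]))  # low: unrelated topics
```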

vLLM production-stack: LLM Inference for Enterprises (Part 1)

Intro If you’ve played with vLLM locally, you already know how fast it can crank out tokens. But the minute you try to serve real traffic – multiple models, thousands of chats – you hit the same pain points the community kept reporting: ⚠️ Pain point → What you really want: High GPU bill → Smarter routing + …