vLLM production-stack: LLM inference for Enterprises (Part 1)

Intro If you’ve played with vLLM locally, you already know how fast it can crank out tokens. But the minute you try to serve real traffic, with multiple models and thousands of chats, you hit the same pain points the community kept reporting. ⚠️ Pain point → What you really want: High GPU bill → Smarter routing + …
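As a rough illustration of what “smarter routing” can mean, here is a minimal Python sketch (the model names and backend URLs are hypothetical, and this is not the production-stack’s actual router) that forwards each chat request to the vLLM server already hosting the requested model:

```python
# Minimal model-aware routing sketch (hypothetical endpoints, not the
# production-stack router): send each request to the backend that already
# serves the requested model, so no GPU has to load every model.
import requests  # assumes each backend exposes an OpenAI-compatible API

# hypothetical mapping of model name -> vLLM server URL
BACKENDS = {
    "llama-3-8b": "http://vllm-llama:8000/v1/chat/completions",
    "mistral-7b": "http://vllm-mistral:8000/v1/chat/completions",
}

def route_chat(model: str, messages: list[dict]) -> dict:
    url = BACKENDS.get(model)
    if url is None:
        raise ValueError(f"no backend serves model {model!r}")
    resp = requests.post(url, json={"model": model, "messages": messages}, timeout=60)
    resp.raise_for_status()
    return resp.json()

# example call:
# route_chat("llama-3-8b", [{"role": "user", "content": "hello"}])
```

The real router in the production-stack is of course more sophisticated; this only shows the basic idea of sending a request to wherever the requested model is already loaded.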

vLLM for beginners: Key Features & Performance Optimization (Part II)

Intro In Part 1 of our vLLM for beginners series, we covered the fundamentals: the core concepts and terminology behind vLLM’s architecture. In Part 2, we go deeper into what makes vLLM excel at performance: features like PagedAttention, attention backends, prefill & decode management, and more. 💡 This series is about building a strong foundation in vLLM, understanding how …
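To make the PagedAttention idea a bit more concrete before diving into the post, here is a simplified Python sketch (my own illustration, not vLLM’s internals) of the block-table bookkeeping it is built on: the KV cache is split into fixed-size blocks, and each sequence maps logical block indices to physical blocks drawn from a shared pool:

```python
# Conceptual sketch of PagedAttention-style block tables (a simplification,
# not vLLM code): KV memory is handed out block by block, on demand, instead
# of reserving space for the maximum sequence length up front.
BLOCK_SIZE = 16  # tokens per KV block (illustrative value)

class BlockPool:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # ids of unused physical blocks

    def allocate(self) -> int:
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)

class SequenceKV:
    def __init__(self, pool: BlockPool):
        self.pool = pool
        self.block_table = []   # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # grab a new physical block only when the current block is full
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.pool.allocate())
        self.num_tokens += 1

pool = BlockPool(num_blocks=64)
seq = SequenceKV(pool)
for _ in range(40):            # 40 tokens -> ceil(40 / 16) = 3 blocks
    seq.append_token()
print(seq.block_table)         # three physical block ids, e.g. [63, 62, 61]
```

Allocating blocks on demand like this is what lets many sequences share the same GPU memory without each one reserving worst-case space, which the post covers in more detail.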

kv_cache Explained: How It Enhances vLLM Inference

Intro Too often, machine learning concepts are explained like a mathematician talking to other mathematicians, leaving the rest of us scratching our heads. One of those concepts is kv_cache, a key technique that makes large language models run faster and more efficiently. This blog is my attempt to break it down simply, without drowning in dark math :). …
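For readers who prefer code to math, here is a tiny NumPy sketch (purely illustrative, not vLLM code) of the idea behind the kv_cache: during decoding, each new token’s key and value are appended to a cache, so attention at every step only needs the newest query instead of recomputing keys and values for the whole prefix:

```python
# Toy single-head KV cache (illustrative only): cached keys/values are reused
# at every decode step, and only the newest token's K/V is added.
import numpy as np

d = 8  # head dimension (illustrative)

def attend(q, K, V):
    # single-query attention over all cached keys/values
    scores = q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for step in range(4):
    # pretend these come from projecting the newest token's hidden state
    q_new, k_new, v_new = rng.normal(size=(3, d))
    # append only the new token's key/value; earlier entries are reused as-is
    K_cache = np.vstack([K_cache, k_new])
    V_cache = np.vstack([V_cache, v_new])
    out = attend(q_new, K_cache, V_cache)
    print(f"step {step}: attended over {len(K_cache)} cached tokens")
```

With the cache, each decode step does work proportional to the current sequence length; without it, the whole prefix would have to be re-encoded from scratch at every step.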