vLLM for beginners: Deployment Options (Part III)

Intro In Part 2 of our vLLM for beginners series, we explored performance features like PagedAttention, attention backends, and prefill/decode optimization. In this final part, we'll shift from theory to practice, covering how to deploy vLLM across different environments, from source builds to Docker containers (Kubernetes deployment will be covered separately). 💡 In this series, we aim to provide …
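
As a quick taste of the container route covered there, here is a minimal sketch of serving a model with the official vllm/vllm-openai image; the model id, port, and cache path below are illustrative placeholders, and the flags assume an NVIDIA GPU host:

  # Run the OpenAI-compatible vLLM server in Docker; everything after the image name
  # is passed to the server, and mounting the local Hugging Face cache avoids re-downloads
  docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model mistralai/Mistral-7B-Instruct-v0.2

Once the container is up, any OpenAI-style client can point at http://localhost:8000/v1.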

vLLM for beginners: Key Features & Performance Optimization (Part II)

Intro In Part 1 of our vLLM for beginners series, we covered the fundamentals: the core concepts and terminology behind vLLM's architecture. In Part 2, we go deeper into what makes vLLM excel at performance: features like PagedAttention, attention backends, prefill and decode management, and more. 💡 This series is about building a strong foundation in vLLM, understanding how …

vLLM for beginners: The Fundamentals

Intro Last year, I dived deep into Ollama inference, where I ended up building and speaking about Ollama Kubernetes deployments, along with rich documentation in my ollama_lab repo and a quantization article. This year's Cloudthrill focus is vLLM inference, which is a next-level beast from a model serving standpoint. Exploring multiple inference options is time-intensive …

How to Quantize AI Models with Ollama CLI

Intro You’ve probably fired up ollama run some-cool-model tons of times, effortlessly pulling models from Ollama’s repo or even directly from Hugging Face. But have you ever wondered how those CPU-friendly GGUF quantized models actually land on places like Hugging Face in the first place? What if I told you that you could contribute back with tools you might already be …
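
For context on what the post builds toward, here is a rough sketch of both sides of that workflow; the repository and model names are hypothetical, and q4_K_M is just one common quantization level:

  # Pull and run a GGUF build straight from Hugging Face (hypothetical repo name)
  ollama run hf.co/someuser/some-cool-model-GGUF

  # Quantize a full-precision model described by a local Modelfile while creating it
  ollama create some-cool-model-q4 -f Modelfile --quantize q4_K_M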

LLM Quantization: All You Need to Know!

Intro Over the past year, I was drowning in GitHub PRs, half-baked Reddit discussions, videos, and scattered docs trying to decode the chaos of quantization for Large Language Models (LLMs). Everyone was talking about running Llama models on a laptop, but no one was explaining how it actually worked, and forget about finding proper research papers …