CloudDude, Author at Cloudthrill

Ollama deployment on Civo K8s Cluster with terraform

by CloudDude | AI, kubernetes, LLM, Ollama, terraform | June 3, 2025

Intro Tired of sharing your IP and sensitive data with OpenAI? What if you could run your own private AI chatbot powered by local inference and LLMs, with 100% data privacy, all inside a Kubernetes cluster? Today we’ll show you how to deploy an end-to-end LLM inference setup on a Civo Cloud Talos K8s cluster with …

kv_cache Explained: How It Enhances vLLM Inference

by CloudDude | AI, LLM, Vllm | May 27, 2025 / May 23, 2025

Intro Too often, machine learning concepts are explained like a mathematician talking to other mathematicians, leaving the rest of us scratching our heads. One of those is kv_cache, a key technique that makes large language models run faster and more efficiently. This blog is my attempt to break it down simply, without drowning in dark math :). …

HashiCorp Vault for Dummies: Setup your 1st Vault with TLS (WSL)

by CloudDude | Auth, Devops, DevSecOps, HashiCorp, Vault | May 20, 2025

Intro Vault by HashiCorp is a powerful tool for managing secrets, credentials, and encrypted data. In this guide, you’ll learn how to set up a local Vault server using Raft storage and TLS in a WSL (Windows Subsystem for Linux) environment. Whether you’re just starting with secrets management, prepping for the Vault Associate exam, or …

CloudThrill Joins NVIDIA Inception

by CloudDude | AI, Company News | May 12, 2025 / May 20, 2025

Intro CloudThrill has joined NVIDIA Inception, a program that nurtures startups revolutionizing industries with technological advancements. What we do: We are focused on helping organizations deploy privacy-first, cost-efficient AI infrastructure with open-source LLMs and container-native technologies. Our services blend deep expertise in cloud-native architecture, MLOps, and scalable inference to empower businesses to innovate securely and …

How to Quantize AI Models with Ollama CLI

by CloudDude | AI, LLM | April 29, 2025 / May 20, 2025

Intro You’ve probably fired up ollama run some-cool-model tons of times, effortlessly pulling models from Ollama’s Repo or even directly from Hugging Face. But have you ever wondered how those CPU-friendly GGUF quantized models actually land on places like Hugging Face in the first place? What if I told you that you could contribute back with tools you might already be …
