vLLM Production Stack on CoreWeave CKS with Terraform 🧑🏼‍🚀

Intro The vLLM Production Stack is designed to run on any Kubernetes-based infrastructure. After covering the AWS, Azure, Google Cloud and Nebius MK8s implementations, today we’re deploying the vLLM production-stack on CoreWeave Kubernetes Service (CKS) with the same Terraform framework. CoreWeave is one of the hottest neoclouds, built on the idea that GenAI workloads don’t need virtualization; they need direct access to …

CloudThrill Officially on Canada’s AI Supplier Source List

Intro CloudThrill has been accepted into Canada’s AI Supplier Source List under Band 1, becoming eligible to deliver Artificial Intelligence services, solutions, and products to the federal government. What is Canada’s AI Supplier Source List? The AI Supplier Source List is maintained by Public Services and Procurement Canada. It is a pre-qualified roster of …

Diffusion Models explained: From Noise to Pixels

Intro Today, most of us have used Nano Banana, Midjourney, Kling AI, Luma, or Sora to generate silly videos or catchy images on socials. But what do they have in common? They all rely on Diffusion Models as their core engine, even the brand new Seedance. While many of these are proprietary, the open-source world …

NVIDIA’s Context Memory Storage (CMX): The KV Cache War

Intro A few weeks ago I wrote about why Intel Optane Persistent Memory was the ideal technology for LLM KV-cache offloading: it offered near-DRAM latency and was natively non-volatile. In other words, it behaved like memory but survived reboots. I also explained why CXL wasn’t quite the performance equivalent, due to higher latency and non-persistence. But recently …