Multimodal Archives - Cloudthrill

vLLM-Omni on Nebius H100: Serving Z-Image, Wan2.2, Qwen3-TTS & NVIDIA Cosmos 3

by CloudDude | AI, LLM, MultiCloud, Multimodal, NeoClouds | July 7, 2026 | June 21, 2026

Intro: Conference demos always run out of clock. You show one or two outputs and the rest stay on the cutting-room floor. That’s what happened at my last Conf42 talk, "How vLLM-Omni Unifies Multimodal Inference", so to make up for it I promised to share the whole code and demo videos in a proper blog post. …

⚡Diffusion model caching: TeaCache

by CloudDude | AI, LLM, Multimodal | May 20, 2026 | June 21, 2026

Intro: If you’ve been following along, we’ve already covered vLLM-Omni and how diffusion models work. But here’s the dirty secret of diffusion models: they don’t run one expensive computation once; they run it many times per generation. 50 steps means 50 full forward passes through a multi-billion-parameter transformer. That’s a lot of GPU hours burned …

Diffusion Models explained: From Noise to Pixels

by CloudDude | AI, LLM, Multimodal, Vllm | April 14, 2026 | June 26, 2026

Intro: Today, most of us have used Nano Banana, Midjourney, Kling AI, Luma, or Sora to generate silly videos or catchy images on socials. But what do they have in common? They all rely on diffusion models as their core engine, even the brand-new Seedance. While many of these are proprietary, the open-source world …

What is vLLM-Omni? Beginners Intro

by CloudDude | AI, LLM, Multimodal, Vllm | March 24, 2026 | June 21, 2026

Intro: Any-to-any multimodal models that combine text, images, video, and audio are advancing AI, but their complex architectures, which mix autoregressive LLMs and diffusion transformers, make efficient serving very difficult. Current systems like OpenAI’s ChatGPT (text) and Sora (video) run as separate engines, lacking unified any-to-any pipelines. vLLM-Omni solves just that with a fully disaggregated serving system …