Diffusion model caching: TeaCache
Intro
If you’ve been following along, we’ve already covered vLLM-Omni and how diffusion models work. But here’s the dirty secret of diffusion models: they don’t run an expensive computation once; they run it many times per generation. 50 steps means 50 full forward passes through a multi-billion-parameter transformer. That’s a lot of GPU hours burned …