NVIDIA’s Context Memory Storage (CMX): The KV Cache War
Intro

A few weeks ago I wrote about why Intel Optane Persistent Memory was the ideal technology for LLM KV-cache offloading: it offered near-DRAM latency and was natively non-volatile. In other words, it behaved like memory but survived reboots. I also explained why CXL-attached memory wasn’t quite its performance equivalent, due to higher latency and lack of persistence. But recently …