🔴 TechBeats Live: LLM Quantization “vLLM vs. Llama.cpp”

๐Ÿ‘‹๐ŸผHey AI heads๐ŸŽ™๏ธ ๐‰๐จ๐ข๐ง ๐ฎ๐ฌ for the very first ๐“๐ž๐œ๐ก ๐๐ž๐š๐ญ๐ฌ ๐‹๐ข๐ฏ๐ž๐Ÿ”ด, hosted by Kosseilaโ€”aka @CloudDude , From @CloudThrill.
๐ŸŽฏ This chill & laid back livestream will unpack ๐‹๐‹๐Œ ๐ช๐ฎ๐š๐ง๐ญ๐ข๐ณ๐š๐ญ๐ข๐จ๐ง๐Ÿ”ฅ:
โœ…๐–๐‡๐˜ it matters
โœ…๐‡๐Ž๐– it works
โœ… Enterprise (vllm) vs Consumer (@Ollama) tradeoffs
โœ… and ๐–๐‡๐„๐‘๐„ itโ€™s going next.

We’ll be joined by two incredible guest stars to talk about 𝐄𝐧𝐭𝐞𝐫𝐩𝐫𝐢𝐬𝐞 𝐯𝐬 𝐂𝐨𝐧𝐬𝐮𝐦𝐞𝐫 quants 🗣️:
🔷 𝐄𝐥𝐝𝐚𝐫 𝐊𝐮𝐫𝐭𝐢𝐜́, bringing the enterprise perspective with vLLM.
🔷 𝐂𝐨𝐥𝐢𝐧 𝐊𝐞𝐚𝐥𝐭𝐲, aka Bartowski, creator of the most-downloaded GGUF-quantized LLMs on Hugging Face.
🫵🏼 Come learn, and have some fun 😎.
๐‚๐ก๐š๐ฉ๐ญ๐ž๐ซ๐ฌ :(00:00) Host Introduction
(04:07) Eldar Intro
(07:33) Bartowski Intro
(13:04) What’s Quantization?
(16:19) Why LLM Quantization Matters
(20:39) Training vs. Inference “The New Deal”
(27:46) Biggest misconception about quantization
(33:22) Enterprise Quantization in production (vLLM)
(48:48) Consumer LLMs and quantization (Ollama, llama.cpp, GGUF) “LLMs for the people”
(01:06:45) Bitnet 1Bit Quantization from Microsoft
(01:28:14) How Long It Takes to Quantize a Model (Llama 3 70B) with GGUF or llm-compressor
(01:34:23) What Is I-Matrix, and Why Do People Confuse It with IQ Quantization?
(01:39:36) What’s LoRA and QLoRA?
(01:42:36) What Is Sparsity?
(01:47:42) What Is Distillation?
(01:52:34) Extreme Quantization (Unsloth) of Big Models (DeepSeek) at 2 Bits with a 70% Size Cut
(01:57:27) Will Future Models (Llama 5) Be Trained on FP4 Tensor Cores? If So, Why Quantize?
(02:02:15) The future of LLMs on edge Devices (Google AI edge)
(02:08:00) How to Evaluate the Quality of a Quantized Model?
(02:26:09) Hugging Face’s Role in the World of LLMs and Quantization
(02:36:41) LocalLLaMA Subreddit Down (Moderator Goes Bananas)
(02:40:11) Guests’ Hopes for the Future of LLMs and AI in General

Check out the quantization blog: https://cloudthrill.ca/llm-quantizati
#AI #LLM #Quantization #TechBeatsLive #Locallama #VLLM #Ollama