Quantization - AI News

Compress LLMs with FP8, GPTQ & SmoothQuant

2026-05-18 tutorial

New tutorial demonstrates compressing instruction-tuned LLMs using llmcompressor. Compare FP8, GPTQ, and SmoothQuant for…
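
The smoothing step behind SmoothQuant can be sketched in a few lines. This is a minimal illustration in plain Python, not llmcompressor's API. It assumes one scale per input channel, s_j = max|X_j|^α / max|W_j|^(1−α) with α = 0.5, and the helper names (`matmul`, `smooth_scales`, `apply_smoothing`) are hypothetical. Dividing each activation channel by s_j and multiplying the matching weight row by s_j leaves X·W unchanged while shrinking activation outliers, which makes the activations easier to quantize.

```python
# SmoothQuant-style smoothing sketch (plain Python, no llmcompressor).
# Assumptions: X is (tokens x in_features), W is (in_features x out_features),
# one scale per input channel, alpha = 0.5. Helper names are hypothetical.

EPS = 1e-8  # guards against division by zero for all-zero channels


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def smooth_scales(x, w, alpha=0.5):
    """s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha) per input channel j."""
    scales = []
    for j in range(len(w)):
        act_max = max(max(abs(row[j]) for row in x), EPS)
        w_max = max(max(abs(v) for v in w[j]), EPS)
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    return scales


def apply_smoothing(x, w, scales):
    """Divide activation column j by s_j and multiply weight row j by s_j.

    Since X @ W == (X diag(s)^-1) @ (diag(s) W), the layer output is unchanged.
    """
    x_s = [[v / scales[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * scales[j] for v in w[j]] for j in range(len(w))]
    return x_s, w_s
```

For example, with `x = [[1.0, 100.0], [2.0, -80.0]]` and `w = [[0.5, -0.2], [0.01, 0.03]]`, the peak magnitude of the second activation channel falls from 100 to about 1.73, and `matmul(xs, ws)` matches `matmul(x, w)` up to floating-point rounding.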

2026-05-06 research

Apple's ML team reveals techniques to compress large language models below 1-bit precision, enabling powerful AI on iPho…

2026-05-05 tutorial

A practical guide to reducing LLM inference costs by up to 80% using quantization and distillation techniques without sa…

2026-05-05 research

Microsoft Research introduces BitNet b2, pushing extreme quantization to slash LLM memory and compute costs while preser…

2026-05-03 tutorial

A practical guide to calculating exact GPU memory needs before deploying large language models locally.
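
A first-order estimate of that memory budget adds the weights (parameter count × bytes per parameter) to the KV cache (2 × layers × KV heads × head dimension × sequence length × batch size × bytes per element). The sketch below is a back-of-the-envelope calculator under stated assumptions, not the guide's own method: it ignores activations, CUDA context, and framework overhead, and the function names and dtype table are hypothetical.

```python
# Back-of-the-envelope GPU memory estimate for LLM inference (sketch).
# Assumptions: dense model, weights + KV cache only; activations, CUDA context,
# and framework overhead are ignored. Names below are hypothetical.

# Storage cost per parameter for common formats (int4 packs two values per byte).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "fp8": 1.0,
    "int8": 1.0,
    "int4": 0.5,
}

GIB = 2**30


def weight_bytes(n_params: float, dtype: str) -> float:
    """Memory for the model weights alone."""
    return n_params * BYTES_PER_PARAM[dtype]


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: float = 2.0) -> float:
    """Memory for cached keys and values; the leading 2 counts K and V."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def estimate_gib(n_params: float, dtype: str, n_layers: int, n_kv_heads: int,
                 head_dim: int, seq_len: int, batch_size: int,
                 kv_bytes_per_elem: float = 2.0) -> float:
    """Weights plus KV cache, in GiB."""
    total = weight_bytes(n_params, dtype) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, seq_len, batch_size, kv_bytes_per_elem)
    return total / GIB


if __name__ == "__main__":
    # Hypothetical 7B-parameter configuration, one 4,096-token sequence.
    for dtype in ("fp16", "int8", "int4"):
        gib = estimate_gib(7e9, dtype, n_layers=32, n_kv_heads=8,
                           head_dim=128, seq_len=4096, batch_size=1)
        print(f"{dtype}: {gib:.2f} GiB")
```

For the hypothetical configuration in the demo, FP16 weights take 14e9 bytes (about 13.0 GiB) and the KV cache 0.5 GiB, while INT4 weights drop to about 3.3 GiB. The calculation shows why quantization, by shrinking the dominant weights term, decides whether a model fits on a given GPU.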