Compress LLMs with FP8, GPTQ & SmoothQuant
New tutorial demonstrates compressing instruction-tuned LLMs using llmcompressor. Compare FP8, GPTQ, and SmoothQuant for…
5 articles about 'quantization'
New tutorial demonstrates compressing instruction-tuned LLMs using llmcompressor. Compare FP8, GPTQ, and SmoothQuant for…
Apple's ML team reveals techniques to compress large language models below 1-bit precision, enabling powerful AI on iPho…
A practical guide to reducing LLM inference costs by up to 80% using quantization and distillation techniques without sa…
Microsoft Research introduces BitNet b2, pushing extreme quantization to slash LLM memory and compute costs while preser…
A practical guide to calculating exact GPU memory needs before deploying large language models locally.