Model Compression - AI News

18k AI Model Beats Top Gamers via Distillation

2026-05-20 research

Li Er's Agentank uses 18k-parameter distillation to rival top Lux AI players, proving small models can compete with LLMs…

2026-05-10 research

NVIDIA releases Star Elastic, a post-training method embedding 30B, 23B, and 12B reasoning models in a single checkpoint…

2026-05-08 tutorial

NVIDIA Model Optimizer streamlines post-training quantization, cutting VRAM usage by up to 75% while preserving model ac…

2026-05-07 research

Seoul National University researchers develop a compact Vision Transformer that runs medical imaging diagnostics on smar…

2026-05-06 research

South Korea's KAIST develops a novel pruning method that cuts Transformer model size by up to 60% while preserving over …

2026-05-05 tutorial

A practical guide to reducing LLM inference costs by up to 80% using quantization and distillation techniques without sa…

2026-04-29 research

A research team has proposed the AutoCompress method, discovering that Layer 0 in small Transformers carries over 60 tim…