18k-Parameter AI Model Rivals Top Lux AI Players via Distillation
7 articles about 'Model Compression'
Li Er's Agentank uses 18k-parameter distillation to rival top Lux AI players, proving small models can compete with LLMs…
NVIDIA releases Star Elastic, a post-training method embedding 30B, 23B, and 12B reasoning models in a single checkpoint…
NVIDIA Model Optimizer streamlines post-training quantization, cutting VRAM usage by up to 75% while preserving model ac…
Seoul National University researchers develop a compact Vision Transformer that runs medical imaging diagnostics on smar…
South Korea's KAIST develops a novel pruning method that cuts Transformer model size by up to 60% while preserving over …
A practical guide to reducing LLM inference costs by up to 80% using quantization and distillation techniques without sa…
A research team has proposed the AutoCompress method, discovering that Layer 0 in small Transformers carries over 60 tim…