Tool Use: The New LLM Benchmark
Evaluating LLMs now requires testing their ability to use external tools, shifting the focus away from pure text generation.
Latest articles in LLM News
Developers are adopting Chain-of-Thought prompting to significantly enhance the logical reasoning capabilities of large …
Chinese AI firm FaceWall releases 5 key edge models and tools in a week-long open-source event, challenging global stand…
A Microsoft slide leak reveals Anthropic's Claude Mythos used 6.1x10^27 FLOPs, matching Gemini 3.1 Pro and signaling a n…
Nemotron 3 Ultra introduces an open MoE hybrid Mamba-Transformer architecture designed to enhance agentic reasoning capa…
Struggling with GPT-5.5 skipping details? Learn how to stop model laziness and ensure accurate Excel data processing wit…
Google releases Gemma 4 12B under Apache 2.0, enabling local enterprise AI workflows and shifting the focus from cloud-o…
Users report GPT-5.5 suffers from verbose, redundant outputs compared to Claude Opus's concise precision.
Developers report GPT-5.5 suffers from verbose, repetitive outputs compared to Claude Opus's concise coding answers.
Google's Gemma 4 12B claims laptop readiness. We analyze how BF16 weights fit in 16GB VRAM via quantization and memory m…
Yunxuanzhang22 launches a stable AI API proxy with free trial credits. It offers competitive pricing for GPT and Claude …
Perplexity launches Search as Code, enabling AI agents to write Python search pipelines. This boosts benchmarks and slas…