New Benchmark BTF-2: Evaluating Strategic Reasoning Capabilities of AI Forecasting Agents
A new arXiv paper introduces "Bench to the Future 2," a benchmark that systematically evaluates reasoning strategy diffe…