ProgramBench Tests If LLMs Can Rebuild Code
A new benchmark called ProgramBench challenges language models to reconstruct entire programs from specifications, revea…
Meta, Stanford, and Harvard launch ProgramBench, a brutal new benchmark that asks AI to build software from scratch. GPT…