5 months ago

Wed Aug 20, 2025 3:09pm PST

Show HN: Randomly switching between LMs at every step boosts SWE-bench score

What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately.

GPT-5 by itself gets 65.0%, Sonnet 4 64.8%, but randomly switching at every step gets us 67.2%

This result came pretty surprising to us. There's a few more experiments in the blog post.

read article

comments:

add comment

loading comments...