Thurs Jan 29, 2026 4:17pm PST

Ask HN: LLM and Human Coding Benchmarks?

I don't think LLM-only benchmarks give me the full story of how well a given model will perform in my daily coding tasks.

I rarely have a problem where all of the requirements are laid out and only the implementation is missing.

Are there any LLM coding benchmarks that have a human in the loop? That would be more helpful for me. Maybe with a large enough sample of humans you could average the results, so that individual human skill is not the main differentiator.
