4 months ago
Thurs Sep 18, 2025 5:16am PST
Evals in 2025: going beyond simple benchmarks to build models people can use