swe-bench-sonnet

July 19, 2025 · About 1 min read

SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.

Each solution is graded against the real unit tests from the pull request that closed the original GitHub issue.
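The grading step can be sketched as a simple pass/fail check. This is a minimal sketch, not SWE-bench's actual evaluation harness. It assumes the test outcomes have already been collected into a dictionary after applying the model's patch. The function name `is_resolved` is hypothetical, and so is the split into tests the fix should make pass (`fail_to_pass`) and tests that must keep passing (`pass_to_pass`), which is only loosely modeled on how SWE-bench groups a pull request's tests.

```python
from typing import Dict, List


def is_resolved(results: Dict[str, str],
                fail_to_pass: List[str],
                pass_to_pass: List[str]) -> bool:
    """Hypothetical grader: True only if every required test passed.

    results maps a test identifier to its outcome string ("PASSED", "FAILED", ...)
    after the candidate patch was applied. A required test that is missing from
    results counts as not passing.
    """
    required = fail_to_pass + pass_to_pass
    return all(results.get(test) == "PASSED" for test in required)


if __name__ == "__main__":
    # Illustrative outcomes; the test names are made up for this example.
    results = {
        "tests/test_parser.py::test_issue_fix": "PASSED",
        "tests/test_parser.py::test_existing_behavior": "PASSED",
    }
    print(is_resolved(results,
                      ["tests/test_parser.py::test_issue_fix"],
                      ["tests/test_parser.py::test_existing_behavior"]))  # True
```

Treating a missing result as a failure keeps the check conservative: a patch that breaks test collection or crashes the run cannot be counted as a success.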