swe-bench-sonnet
SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.
Each solution is graded against the real unit tests from the pull request that closed the original GitHub issue.
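The grading idea can be sketched in a few lines of Python. This is a minimal illustration, not the official SWE-bench harness: the function names `is_resolved` and `resolve_rate`, the task IDs, and the test names are all hypothetical, and the sketch assumes the unit tests have already been run and their outcomes collected as pass/fail booleans.

```python
from typing import Mapping


def is_resolved(test_results: Mapping[str, bool]) -> bool:
    """Treat a task as resolved only if it has tests and every one passed.

    Assumption: `test_results` maps a test name to True (passed) or
    False (failed) for the unit tests taken from the closing pull request.
    """
    return bool(test_results) and all(test_results.values())


def resolve_rate(per_task: Mapping[str, Mapping[str, bool]]) -> float:
    """Return the fraction of tasks whose tests all pass."""
    if not per_task:
        return 0.0
    resolved = sum(is_resolved(results) for results in per_task.values())
    return resolved / len(per_task)


# Hypothetical outcomes for two tasks (IDs and test names are made up).
example_results = {
    "task-a": {"test_parse": True, "test_roundtrip": True},
    "task-b": {"test_parse": True, "test_roundtrip": False},
}

print(resolve_rate(example_results))  # 0.5
```

Here a single failing test marks the whole task as unresolved, which reflects the all-or-nothing nature of grading a fix against a pull request's tests.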