
swe-bench-sonnet


SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.

Each solution is graded against the real unit tests from the pull request that closed the original GitHub issue.
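As a rough illustration of that grading rule, the sketch below assumes a task counts as resolved only when the tests introduced by the fixing pull request now pass and the tests that already passed still pass. SWE-bench's dataset lists these as `FAIL_TO_PASS` and `PASS_TO_PASS`. `TaskTests`, `is_resolved`, and `resolve_rate` are hypothetical names. This is not the official evaluation harness, which also applies the model's patch and runs the repository's test suite.

```python
from dataclasses import dataclass


# Hypothetical, simplified record of one task's grading inputs, loosely
# modeled on SWE-bench's FAIL_TO_PASS / PASS_TO_PASS test lists.
@dataclass
class TaskTests:
    fail_to_pass: list[str]  # tests from the fixing PR; must pass after the patch
    pass_to_pass: list[str]  # tests that passed before; must still pass


def is_resolved(tests: TaskTests, results: dict[str, bool]) -> bool:
    """Return True only if every required test passed after applying the patch.

    `results` maps test name -> passed. A test missing from `results`
    is treated as failed.
    """
    required = tests.fail_to_pass + tests.pass_to_pass
    return all(results.get(name, False) for name in required)


def resolve_rate(outcomes: list[bool]) -> float:
    """Fraction of tasks resolved (0.0 for an empty list)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


if __name__ == "__main__":
    tests = TaskTests(fail_to_pass=["test_issue_fix"], pass_to_pass=["test_existing"])
    print(is_resolved(tests, {"test_issue_fix": True, "test_existing": True}))   # True
    print(is_resolved(tests, {"test_issue_fix": True, "test_existing": False}))  # False
    print(resolve_rate([True, False, True, True]))  # 0.75
```

Treating a missing test result as a failure keeps the check conservative: a patch that breaks test collection cannot be counted as a success.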

https://www.swebench.com/