ANTHROPIC
Never doing a half job
Claude Opus 4.1 is here, and it’s a solid upgrade over Claude Opus 4, especially if you’re working with real-world code, reasoning tasks, or agentic workflows.
Bigger improvements are already in the pipeline, but this version marks a clear step up.
It’s available now to paid Claude users and in Claude Code.
You’ll also find it on the API, Amazon Bedrock, and Google Cloud’s Vertex AI. Pricing stays the same.
Key improvements:
Now scores 74.5% on SWE-bench Verified, showing stronger performance in code understanding and refactoring.
Praised for pinpointing precise fixes in large codebases without introducing new bugs.
Better at tracking details and handling complex research and data-analysis tasks.
From mid to elite
Opus 4.1 now hits 74.5% on the SWE-bench Verified benchmark and performs better on research and data-analysis tasks, particularly when it comes to keeping track of details and navigating complex code.
GitHub reports gains across most capabilities, with especially notable improvements in multi-file code refactoring. Rakuten’s team praised its ability to pinpoint exact corrections in large codebases without over-editing or introducing bugs.
Windsurf also measured a one-standard-deviation improvement over Opus 4 on its junior developer benchmark.
If Claude were a person, he’d be the guy who finishes the group project and colour-codes the slides.