The Numbers

+13%
improvement on coding benchmarks vs Opus 4.6
98.5%
visual acuity — up from 54.5% in Opus 4.6
−21%
document reasoning errors vs Opus 4.6
3×
more production tasks resolved vs predecessor

The vision jump is the headline number. Going from 54.5% to 98.5% visual acuity isn't incremental — it's a generation shift. Claude can now reliably read and interpret chemical structures, technical diagrams, dense tables, handwritten text, and low-contrast images that previous versions consistently failed on. If you've been avoiding Claude for any image-related workflow, it's worth retesting.

The coding improvements are similarly significant. 3× more production tasks resolved in real-world benchmarks — not toy examples — is the kind of gain that actually shows up in day-to-day use. The Finance Agent evaluation result (state-of-the-art) also points to strong performance on structured, multi-step reasoning with real data.

What's New in 4.7

xhigh Effort Level
New setting between high and max. For tasks that needed more than high but didn't justify max compute — now there's a middle ground with better cost efficiency.
Task Budgets (Beta)
A new API parameter for token spend guidance. Tell the model how much compute to use before starting a task — useful for cost predictability in production pipelines.
/ultrareview in Claude Code
A dedicated code review command that triggers Opus 4.7's enhanced reasoning for thorough multi-file reviews. Particularly strong on architecture and security issues.
File-Based Memory
Enhanced memory via file system notes. The model maintains coherence across long agentic sessions by writing and reading structured notes — less "forgetting" mid-task.

The Tokenizer Change: What It Means for You

This is the detail most people will miss, and it will affect your token counts and costs in production. Opus 4.7 uses a new tokenizer that produces 1.0–1.35× as many tokens as 4.6 for the same input, depending on content type.

Heads up if you're using the API: The same text that used X tokens on Opus 4.6 may use up to 35% more tokens on Opus 4.7. For chat-heavy workloads, expect roughly a 1.0–1.1× increase. For code, technical content, or non-English text, the increase could reach 1.35×. Audit your token budgets before upgrading.

The new tokenizer enables better multilingual performance and more efficient handling of code syntax — but the tradeoff is higher token counts. Anthropic's pricing hasn't changed ($5/1M input, $25/1M output), so the cost impact comes from token volume, not price per token.
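Since the per-token prices are unchanged, the cost impact is simple arithmetic on token volume. A rough model, assuming the multiplier applies to both input and output tokens (the example volumes are made up):

```python
# Rough cost-impact model for the tokenizer change. The 1.0-1.35x range
# comes from the article; your real multiplier depends on content mix.

PRICE_IN = 5.00 / 1_000_000    # $ per input token (unchanged from 4.6)
PRICE_OUT = 25.00 / 1_000_000  # $ per output token (unchanged from 4.6)

def monthly_cost(in_tokens: int, out_tokens: int,
                 multiplier: float = 1.0) -> float:
    """Estimate monthly spend after scaling token counts by the multiplier."""
    return (in_tokens * multiplier * PRICE_IN
            + out_tokens * multiplier * PRICE_OUT)

# Example: 100M input / 20M output tokens per month
base = monthly_cost(100_000_000, 20_000_000)         # 4.6 baseline
worst = monthly_cost(100_000_000, 20_000_000, 1.35)  # code-heavy worst case
```

At these example volumes the baseline is $1,000/month and the worst case $1,350/month, which is why the audit step matters for high-volume pipelines.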

Instruction Following: More Literal, Needs Tuning

One behavioral shift worth flagging: Opus 4.7 follows instructions more literally than 4.6. This sounds positive — and mostly is — but it means prompts that relied on the model inferring intent or filling in gaps may behave differently.

Common scenarios where you may need to update prompts: responses that relied on implicit length or format expectations, prompts where 4.6 would infer missing details, and tasks where 4.7 now asks a clarifying question instead of proceeding.

Quick audit: Run your 5 most-used prompts through Opus 4.7 before fully migrating. Pay attention to response length, format adherence, and whether the model asks clarifying questions where 4.6 would have proceeded. These are the most common regression points.
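The audit above is easy to script. A minimal sketch of a regression check that compares a 4.6 response against a 4.7 response for the three flagged points (the thresholds and sample strings are illustrative assumptions, not Anthropic guidance):

```python
# Minimal prompt-regression check: flag length drift, format breaks, and
# new clarifying questions. Thresholds are illustrative assumptions.

def check_regression(old: str, new: str,
                     required_prefix: str = "",
                     max_length_ratio: float = 1.5) -> list[str]:
    """Return human-readable regression flags (empty list = looks fine)."""
    flags = []
    if required_prefix and not new.startswith(required_prefix):
        flags.append("format: expected prefix missing")
    ratio = len(new) / max(len(old), 1)
    if ratio > max_length_ratio or ratio < 1 / max_length_ratio:
        flags.append(f"length drifted by {ratio:.2f}x")
    # Heuristic: a trailing question mark where 4.6 had none often means
    # the model is asking for clarification instead of proceeding.
    if "?" in new.splitlines()[-1] and "?" not in old.splitlines()[-1]:
        flags.append("asks a clarifying question where 4.6 proceeded")
    return flags

flags = check_regression("Summary: OK.", "Summary: OK!",
                         required_prefix="Summary:")
flags2 = check_regression("Short.",
                          "Did you want the long or short version?")
```

Run this over stored (old, new) response pairs for your top prompts and treat any non-empty result as a prompt that needs explicit length or format instructions.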

Vision: What You Can Do Now That You Couldn't Before

The vision upgrade deserves its own section. Maximum image size jumped to 2,576px on the long edge (~3.75 megapixels), and the quality jump to 98.5% visual acuity opens up use cases that were simply unreliable on 4.6:

| Use case | Opus 4.6 | Opus 4.7 |
| --- | --- | --- |
| Standard screenshots, photos | Good | Excellent |
| Technical diagrams, flowcharts | Unreliable | Reliable |
| Chemical / molecular structures | Poor | Strong |
| Dense tables & spreadsheet exports | Misses data | Accurate |
| Handwritten text | Misreads frequently | Solid |
| Low-contrast or dark images | Struggles | Handles well |
| High-res product photos (3MP+) | Degrades | Supported |
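The 2,576px long-edge limit can be enforced client-side before upload. A small helper that computes target dimensions (the limit value comes from this article; the function itself is just an illustration):

```python
# Scale image dimensions so the longer side fits the 2,576px long-edge
# limit described above, preserving aspect ratio.

def fit_long_edge(width: int, height: int,
                  max_edge: int = 2576) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side <= max_edge."""
    long_side = max(width, height)
    if long_side <= max_edge:
        return width, height  # already within the limit
    scale = max_edge / long_side
    return round(width * scale), round(height * scale)
```

Pair this with whatever image library you already use for the actual resize; doing it client-side avoids server-side downscaling that you don't control.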

For anyone building document processing pipelines, research tools, or medical/scientific applications — the vision upgrade alone may justify the migration even with the tokenizer overhead.

Long-Horizon Tasks: Stays on Track Longer

Anthropic specifically calls out improved long-horizon autonomy: the model's ability to work coherently on extended tasks without abandoning difficult problems or losing context. Combined with file-based memory for maintaining state, this makes Opus 4.7 significantly more capable for long agentic sessions: multi-file refactors and reviews, extended research tasks, and document processing pipelines.
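The file-based memory pattern is easy to picture: the model writes structured notes to disk as it works and re-reads them when context runs thin. The toy class below illustrates the idea only; it is not Anthropic's actual mechanism.

```python
import json
import tempfile
from pathlib import Path

# Toy illustration of file-based memory: persist structured notes between
# steps of a long task so state survives context loss. Conceptual sketch
# only, not Anthropic's implementation.

class TaskNotes:
    def __init__(self, path: Path):
        self.path = path

    def write(self, key: str, value) -> None:
        """Record or update one note, persisting the whole note file."""
        notes = self.read_all()
        notes[key] = value
        self.path.write_text(json.dumps(notes, indent=2))

    def read_all(self) -> dict:
        """Load every note, or an empty dict if none exist yet."""
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

# Usage: record progress at each step, re-read after a context reset
notes = TaskNotes(Path(tempfile.mkdtemp()) / "notes.json")
notes.write("current_step", "refactor module B")
notes.write("blocked_on", None)
```

The point of the pattern is that the notes outlive any single context window, which is what makes long agentic runs recoverable mid-task.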

Safety: Intentionally Reduced Cyber Capabilities

Anthropic made an unusual choice that's worth noting: Opus 4.7 has intentionally reduced cyber capabilities compared to 4.6. This is a deliberate safety decision, not a regression. For the vast majority of use cases, this has zero impact. For security researchers or red-team workflows, it's worth being aware of.

On honesty and prompt injection resistance, 4.7 scores better than 4.6. Deception rates and cooperation with misuse remain low. The one area that's modestly weaker: guidance around controlled substances, where the safety filters are slightly more conservative.

Availability & Pricing

| Detail | Claude Opus 4.7 |
| --- | --- |
| Model ID (API) | claude-opus-4-7 |
| Input price | $5 / 1M tokens |
| Output price | $25 / 1M tokens |
| Claude.ai | Available now (Pro & Team) |
| Anthropic API | Generally available |
| Amazon Bedrock | Available |
| Google Vertex AI | Available |
| Microsoft Foundry | Available |

Pricing is unchanged from Opus 4.6. The tokenizer increase means your effective per-request cost may be slightly higher depending on your content mix — but the price per token hasn't moved.

Should You Upgrade?

For most workloads: yes, and soon. The vision improvements and coding gains are substantial enough that staying on 4.6 for active production use is leaving performance on the table.

For cost-sensitive, high-volume text-only pipelines: audit first. The tokenizer change could increase costs meaningfully at scale. Run your typical prompts through the API, compare token counts, and model the cost impact before flipping the switch.

The literal instruction following is the only area requiring active attention. Build in a quick regression test with your most-used prompts — specifically checking format consistency and response length — before fully migrating.

Migration checklist:
☐ Update model ID to claude-opus-4-7 in API calls
☐ Audit token counts on representative prompts (expect 0–35% increase)
☐ Test top 5 prompts for format and length regressions
☐ Add explicit length/format specs where you relied on implicit behavior
☐ Explore xhigh effort level for complex tasks that previously needed max

Keep your Claude prompts organized as the model evolves

PromptChief lets you save prompt versions per model — so when Opus 4.8 drops, you're not starting from scratch again. Store, test, and iterate on your best prompts.

Try PromptChief Free →