The Numbers
The vision jump is the headline number. Going from 54.5% to 98.5% visual acuity isn't incremental — it's a generational shift. Claude can now reliably read and interpret chemical structures, technical diagrams, dense tables, handwritten text, and low-contrast images that previous versions consistently failed on. If you've been avoiding Claude for any image-related workflow, it's worth retesting.
The coding improvements are similarly significant. 3× more production tasks resolved in real-world benchmarks — not toy examples — is the kind of gain that actually shows up in day-to-day use. The Finance Agent evaluation result (state-of-the-art) also points to strong performance on structured, multi-step reasoning with real data.
What's New in 4.7
Opus 4.7 adds a new xhigh effort level that sits between high and max. For tasks that needed more than high but didn't justify max compute, there's now a middle ground with better cost efficiency.
The Tokenizer Change: What It Means for You
This is the detail most people will miss — and it will affect your token counts and costs in production. Opus 4.7 uses a new tokenizer that produces 1.0–1.35× more tokens than 4.6 for the same input, depending on content type.
The new tokenizer enables better multilingual performance and more efficient handling of code syntax — but the tradeoff is higher token counts. Anthropic's pricing hasn't changed ($5/1M input, $25/1M output), so the cost impact comes from token volume, not price per token.
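To see how the 1.0–1.35× multiplier translates to spend, here's a minimal back-of-the-envelope cost model. The prices and multiplier range are the ones quoted above; the traffic volumes in the example are made up:

```python
# Sketch: estimate the cost impact of the tokenizer change.
# Prices ($5/$25 per 1M tokens) and the 1.0-1.35x multiplier range
# come from this article; the monthly volumes below are hypothetical.

INPUT_PRICE_PER_M = 5.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 25.00  # USD per 1M output tokens

def monthly_cost(input_tokens, output_tokens, tokenizer_multiplier=1.0):
    """Estimate monthly spend after scaling raw token counts by the
    tokenizer multiplier (1.0-1.35x depending on content mix)."""
    scaled_in = input_tokens * tokenizer_multiplier
    scaled_out = output_tokens * tokenizer_multiplier
    return (scaled_in / 1e6) * INPUT_PRICE_PER_M + (scaled_out / 1e6) * OUTPUT_PRICE_PER_M

# Example: 100M input / 20M output tokens per month at 4.6-era counts.
baseline = monthly_cost(100e6, 20e6)          # no multiplier
worst_case = monthly_cost(100e6, 20e6, 1.35)  # upper bound on 4.7
```

At those example volumes the worst case is a 35% bump in spend with no change in price per token, which is why auditing your own content mix matters.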
Instruction Following: More Literal, Needs Tuning
One behavioral shift worth flagging: Opus 4.7 follows instructions more literally than 4.6. This sounds positive — and mostly is — but it means prompts that relied on the model inferring intent or filling in gaps may behave differently.
Common scenarios where you may need to update prompts:
- Format instructions: If you said "be concise," 4.6 would infer a reasonable length. 4.7 takes it more literally — you may need to specify "under 150 words" if you want consistency.
- Role instructions: Vague personas ("be helpful and professional") may produce more restrained output. More specific tone guidance produces better results.
- Multi-step tasks: If your prompt described an outcome rather than steps, 4.7 may ask for clarification more often. Consider adding explicit steps or using the new task budget parameter.
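As a concrete sketch of the first scenario, here is a before/after pair showing how to make implicit constraints explicit. Both prompts are hypothetical examples, not from Anthropic's documentation:

```python
# Hypothetical before/after prompts illustrating the "more literal"
# instruction-following shift described above.

# 4.6-era prompt that relied on the model inferring a sensible length.
implicit_prompt = "Summarize this incident report. Be concise."

# 4.7-friendly version: spell out the constraints you actually want.
explicit_prompt = (
    "Summarize this incident report.\n"
    "- Length: under 150 words\n"
    "- Format: 3-5 bullet points\n"
    "- Tone: neutral, factual"
)
```

The same principle applies to role and multi-step instructions: replace adjectives with measurable constraints wherever output consistency matters.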
Vision: What You Can Do Now That You Couldn't Before
The vision upgrade deserves its own section. Maximum image size jumped to 2,576px on the long edge (~3.75 megapixels), and the quality jump to 98.5% visual acuity opens up use cases that were simply unreliable on 4.6:
| Use Case | Opus 4.6 | Opus 4.7 |
|---|---|---|
| Standard screenshots, photos | Good | Excellent |
| Technical diagrams, flowcharts | Unreliable | Reliable |
| Chemical / molecular structures | Poor | Strong |
| Dense tables & spreadsheet exports | Misses data | Accurate |
| Handwritten text | Misreads frequently | Solid |
| Low-contrast or dark images | Struggles | Handles well |
| High-res product photos (3MP+) | Degrades | Supported |
For anyone building document processing pipelines, research tools, or medical/scientific applications — the vision upgrade alone may justify the migration even with the tokenizer overhead.
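For pipelines like these, an image request pairs a base64-encoded image block with a text question. The helper below builds a Messages-API-style request body; the content-block shape follows Anthropic's documented base64 image format, but verify field names against the current API reference before relying on it:

```python
import base64

def build_vision_request(image_path, question, media_type="image/png"):
    """Sketch: build a Messages-API-style request body pairing an image
    with a question. Block shape assumed from Anthropic's base64 image
    format; check the current API reference before use."""
    with open(image_path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": data}},
                {"type": "text", "text": question},
            ],
        }],
    }
```

With the new 2,576px long-edge limit, you can send ~3MP images without downscaling first, which is where the dense-table and diagram gains show up.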
Long-Horizon Tasks: Stays on Track Longer
Anthropic specifically calls out improved long-horizon autonomy — the model's ability to work coherently on extended tasks without abandoning difficult problems or losing context. Combined with file-based memory for maintaining state, this makes Opus 4.7 significantly more capable for:
- Multi-step research and synthesis tasks
- Long agentic coding sessions (debugging across files, refactoring large codebases)
- Complex document analysis that requires cross-referencing
- Financial analysis workflows (state-of-the-art on Finance Agent eval)
Safety: Intentionally Reduced Cyber Capabilities
Anthropic made an unusual choice that's worth noting: Opus 4.7 has intentionally reduced cyber capabilities compared to 4.6. This is a deliberate safety decision, not a regression. For the vast majority of use cases, this has zero impact. For security researchers or red-team workflows, it's worth being aware of.
On honesty and prompt injection resistance, 4.7 scores better than 4.6. Deception rates and cooperation with misuse remain low. The one area that's modestly weaker: guidance around controlled substances, where the safety filters are slightly more conservative.
Availability & Pricing
| Detail | Claude Opus 4.7 |
|---|---|
| Model ID (API) | claude-opus-4-7 |
| Input price | $5 / 1M tokens |
| Output price | $25 / 1M tokens |
| Claude.ai | Available now (Pro & Team) |
| Anthropic API | Generally available |
| Amazon Bedrock | Available |
| Google Vertex AI | Available |
| Microsoft Foundry | Available |
Pricing is unchanged from Opus 4.6. The tokenizer increase means your effective per-request cost may be slightly higher depending on your content mix — but the price per token hasn't moved.
Should You Upgrade?
For most workloads: yes, and soon. The vision improvements and coding gains are substantial enough that staying on 4.6 for active production use is leaving performance on the table.
For cost-sensitive, high-volume text-only pipelines: audit first. The tokenizer change could increase costs meaningfully at scale. Run your typical prompts through the API, compare token counts, and model the cost impact before flipping the switch.
The literal instruction following is the only area requiring active attention. Build in a quick regression test with your most-used prompts — specifically checking format consistency and response length — before fully migrating.
☐ Update model ID to claude-opus-4-7 in API calls
☐ Audit token counts on representative prompts (expect 0–35% increase)
☐ Test top 5 prompts for format and length regressions
☐ Add explicit length/format specs where you relied on implicit behavior
☐ Explore the xhigh effort level for complex tasks that previously needed max
Keep your Claude prompts organized as the model evolves
PromptChief lets you save prompt versions per model — so when Opus 4.8 drops, you're not starting from scratch again. Store, test, and iterate on your best prompts.
Try PromptChief Free →