The Numbers
The vision jump is the headline number. Going from 54.5% to 98.5% visual acuity isn't incremental — it's a generational shift. Claude can now reliably read and interpret chemical structures, technical diagrams, dense tables, handwritten text, and low-contrast images that previous versions consistently failed on. If you've been avoiding Claude for any image-related workflow, it's worth retesting.
The coding improvements are similarly significant. 3× more production tasks resolved in real-world benchmarks — not toy examples — is the kind of gain that actually shows up in day-to-day use. The Finance Agent evaluation result (state-of-the-art) also points to strong performance on structured, multi-step reasoning with real data.
What's New in 4.7
Opus 4.7 adds a new xhigh effort level that sits between high and max. For tasks that needed more than high but didn't justify max compute, there's now a middle ground with better cost efficiency.
The Tokenizer Change: What It Means for You
This is the detail most people will miss — and it will affect your token counts and costs in production. Opus 4.7 uses a new tokenizer that produces 1.0–1.35× more tokens than 4.6 for the same input, depending on content type.
The new tokenizer enables better multilingual performance and more efficient handling of code syntax — but the tradeoff is higher token counts. Anthropic's pricing hasn't changed ($5/1M input, $25/1M output), so the cost impact comes from token volume, not price per token.
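To see how the 1.0–1.35× multiplier translates to spend, here's a minimal back-of-the-envelope cost model. The prices and multiplier range are the ones quoted above; the traffic volumes in the example are made up:

```python
# Sketch: estimate the cost impact of the tokenizer change.
# Prices ($5/$25 per 1M tokens) and the 1.0-1.35x multiplier range
# come from this article; the monthly volumes below are hypothetical.

INPUT_PRICE_PER_M = 5.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 25.00  # USD per 1M output tokens

def monthly_cost(input_tokens, output_tokens, tokenizer_multiplier=1.0):
    """Estimate monthly spend after scaling raw token counts by the
    tokenizer multiplier (1.0-1.35x depending on content mix)."""
    scaled_in = input_tokens * tokenizer_multiplier
    scaled_out = output_tokens * tokenizer_multiplier
    return (scaled_in / 1e6) * INPUT_PRICE_PER_M + (scaled_out / 1e6) * OUTPUT_PRICE_PER_M

# Example: 100M input / 20M output tokens per month at 4.6-era counts.
baseline = monthly_cost(100e6, 20e6)          # no multiplier
worst_case = monthly_cost(100e6, 20e6, 1.35)  # upper bound on 4.7
```

At those example volumes the worst case is a 35% bump in spend with no change in price per token, which is why auditing your own content mix matters.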
Instruction Following: More Literal, Needs Tuning
One behavioral shift worth flagging: Opus 4.7 follows instructions more literally than 4.6. This sounds positive — and mostly is — but it means prompts that relied on the model inferring intent or filling in gaps may behave differently.
Common scenarios where you may need to update prompts:
- Format instructions: If you said "be concise," 4.6 would infer a reasonable length. 4.7 takes it more literally — you may need to specify "under 150 words" if you want consistency.
- Role instructions: Vague personas ("be helpful and professional") may produce more restrained output. More specific tone guidance produces better results.
- Multi-step tasks: If your prompt described an outcome rather than steps, 4.7 may ask for clarification more often. Consider adding explicit steps or using the new task budget parameter.
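As a concrete sketch of the first scenario, here is a before/after pair showing how to make implicit constraints explicit. Both prompts are hypothetical examples, not from Anthropic's documentation:

```python
# Hypothetical before/after prompts illustrating the "more literal"
# instruction-following shift described above.

# 4.6-era prompt that relied on the model inferring a sensible length.
implicit_prompt = "Summarize this incident report. Be concise."

# 4.7-friendly version: spell out the constraints you actually want.
explicit_prompt = (
    "Summarize this incident report.\n"
    "- Length: under 150 words\n"
    "- Format: 3-5 bullet points\n"
    "- Tone: neutral, factual"
)
```

The same principle applies to role and multi-step instructions: replace adjectives with measurable constraints wherever output consistency matters.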
Vision: What You Can Do Now That You Couldn't Before
The vision upgrade deserves its own section. Maximum image size jumped to 2,576px on the long edge (~3.75 megapixels), and the quality jump to 98.5% visual acuity opens up use cases that were simply unreliable on 4.6:
| Use Case | Opus 4.6 | Opus 4.7 |
|---|---|---|
| Standard screenshots, photos | Good | Excellent |
| Technical diagrams, flowcharts | Unreliable | Reliable |
| Chemical / molecular structures | Poor | Strong |
| Dense tables & spreadsheet exports | Misses data | Accurate |
| Handwritten text | Misreads frequently | Solid |
| Low-contrast or dark images | Struggles | Handles well |
| High-res product photos (3MP+) | Degrades | Supported |
For anyone building document processing pipelines, research tools, or medical/scientific applications — the vision upgrade alone may justify the migration even with the tokenizer overhead.
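For pipelines like these, an image request pairs a base64-encoded image block with a text question. The helper below builds a Messages-API-style request body; the content-block shape follows Anthropic's documented base64 image format, but verify field names against the current API reference before relying on it:

```python
import base64

def build_vision_request(image_path, question, media_type="image/png"):
    """Sketch: build a Messages-API-style request body pairing an image
    with a question. Block shape assumed from Anthropic's base64 image
    format; check the current API reference before use."""
    with open(image_path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": data}},
                {"type": "text", "text": question},
            ],
        }],
    }
```

With the new 2,576px long-edge limit, you can send ~3MP images without downscaling first, which is where the dense-table and diagram gains show up.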
Long-Horizon Tasks: Stays on Track Longer
Anthropic specifically calls out improved long-horizon autonomy — the model's ability to work coherently on extended tasks without abandoning difficult problems or losing context. Combined with file-based memory for maintaining state, this makes Opus 4.7 significantly more capable for:
- Multi-step research and synthesis tasks
- Long agentic coding sessions (debugging across files, refactoring large codebases)
- Complex document analysis that requires cross-referencing
- Financial analysis workflows (state-of-the-art on Finance Agent eval)
Safety: Intentionally Reduced Cyber Capabilities
Anthropic made an unusual choice that's worth noting: Opus 4.7 has intentionally reduced cyber capabilities compared to 4.6. This is a deliberate safety decision, not a regression. For the vast majority of use cases, this has zero impact. For security researchers or red-team workflows, it's worth being aware of.
On honesty and prompt injection resistance, 4.7 scores better than 4.6. Deception rates and cooperation with misuse remain low. The one area that's modestly weaker: guidance around controlled substances, where the safety filters are slightly more conservative.
Availability & Pricing
| Detail | Claude Opus 4.7 |
|---|---|
| Model ID (API) | claude-opus-4-7 |
| Input price | $5 / 1M tokens |
| Output price | $25 / 1M tokens |
| Claude.ai | Available now (Pro & Team) |
| Anthropic API | Generally available |
| Amazon Bedrock | Available |
| Google Vertex AI | Available |
| Microsoft Foundry | Available |
Pricing is unchanged from Opus 4.6. The tokenizer increase means your effective per-request cost may be slightly higher depending on your content mix — but the price per token hasn't moved.
Should You Upgrade?
For most workloads: yes, and soon. The vision improvements and coding gains are substantial enough that staying on 4.6 for active production use is leaving performance on the table.
For cost-sensitive, high-volume text-only pipelines: audit first. The tokenizer change could increase costs meaningfully at scale. Run your typical prompts through the API, compare token counts, and model the cost impact before flipping the switch.
The literal instruction following is the only area requiring active attention. Build in a quick regression test with your most-used prompts — specifically checking format consistency and response length — before fully migrating.
☐ Update model ID to claude-opus-4-7 in API calls
☐ Audit token counts on representative prompts (expect 0–35% increase)
☐ Test top 5 prompts for format and length regressions
☐ Add explicit length/format specs where you relied on implicit behavior
☐ Explore the xhigh effort level for complex tasks that previously needed max
Keep your Claude prompts organized as the model evolves
PromptChief lets you save prompt versions per model — so when Opus 4.8 drops, you're not starting from scratch again. Store, test, and iterate on your best prompts.
Try PromptChief Free →