Claude Mythos vs Opus: What Changed Between Tiers?

Claude Mythos isn’t just the next version of Opus — it’s the first model in an entirely new tier. Anthropic’s leaked internal documents describe Mythos as “larger and more intelligent than our Opus models,” with “dramatically higher scores” across coding, reasoning, and cybersecurity benchmarks. The Capybara tier sits above Opus in Anthropic’s four-level model hierarchy, and the performance gap between the two is not incremental — Dario Amodei calls it “a step change.”

Understanding Anthropic’s Model Hierarchy

Before March 2026, Anthropic operated a three-tier system: Haiku for fast lightweight tasks, Sonnet for the best balance of speed and intelligence, and Opus for maximum reasoning capability. The accidental leak of Claude Mythos revealed a fourth tier — Capybara — designed to sit above Opus as the highest-capability offering in Anthropic’s lineup.

The naming convention matters. Capybara is a tier name, not a model name. Just as “Opus” is a tier containing models like Claude Opus 4.5 and Claude Opus 4.6, Capybara is the tier and Claude Mythos is the first specific model within it. Future Capybara-tier models will likely follow the same pattern, with version numbers or new names released under the same tier classification.

Each tier represents a distinct tradeoff between speed, intelligence, and cost. Haiku processes requests fastest at the lowest price. Sonnet occupies the middle ground that Anthropic considers the best overall value. Opus delivers maximum reasoning at higher cost and lower speed. Capybara pushes the intelligence ceiling even higher, with the slowest response times and the highest price point in the lineup.

Tier	Speed	Reasoning	Input Cost	Output Cost	Status
Haiku 4.5	Fastest	Good	$0.80/M	$4.00/M	Public
Sonnet 4.6	Fast	Strong	$3.00/M	$15.00/M	Public
Opus 4.6	Moderate	Excellent	$15.00/M	$75.00/M	Public
Capybara (Mythos)	Slowest	Unprecedented	~$30-75/M est.	~$150-375/M est.	Invite-only

Benchmark Comparison: Mythos vs Opus 4.6

Coding Performance

Opus 4.6 scores 65.4% on Terminal-Bench 2.0, a benchmark that evaluates real-world terminal-based coding tasks across complex multi-step workflows. Leaked documents confirm that Mythos achieves “dramatically higher scores” on this same evaluation, though exact numbers haven’t surfaced publicly. For reference, Claude Sonnet 4 scores 72.7% on SWE-Bench Verified and Opus 4 reaches 72.5% — meaning the current Opus generation is already strong, and Mythos reportedly surpasses it by a meaningful margin.

The coding improvements go beyond raw benchmark numbers. Anthropic’s internal documents describe Mythos as capable of handling large-scale codebase refactoring, architectural planning, and multi-file modifications that trip up current Opus models. Where Opus 4.6 handles complex coding tasks well, Mythos reportedly handles tasks that Opus cannot complete at all.

Academic Reasoning

Opus 4.6 already performs at the top of the field on graduate-level reasoning benchmarks, scoring 83.3% on GPQA Diamond and 90.0% on AIME 2025 mathematical reasoning. Mythos’s leaked performance description uses the phrase “significantly improved” for academic reasoning, suggesting scores in ranges that would set new records on these evaluations.

The “deep connective tissue between ideas and knowledge” that Anthropic’s draft blog post describes points to improvements in cross-domain reasoning — the ability to apply insights from one field to solve problems in another. This represents a qualitative shift rather than just higher numbers on existing tests.

Cybersecurity Gap

The most dramatic difference between Mythos and Opus lies in cybersecurity. Opus has no specialized cybersecurity training or evaluation — it handles security tasks as well as any general-purpose model. Mythos, by contrast, was specifically designed and evaluated for vulnerability discovery, exploit chain construction, and attack surface analysis.

Leaked documents describe Mythos’s cybersecurity capabilities as “far ahead of any other AI model currently in use.” The model can reportedly identify zero-day vulnerabilities, construct working exploit chains, and analyze complex enterprise security architectures. This isn’t a marginal improvement over Opus — it’s an entirely new capability that Opus doesn’t possess.

The cybersecurity gap explains Anthropic’s release strategy. While Opus is available to anyone with an API key, Mythos is restricted to vetted cybersecurity defense organizations precisely because its offensive capabilities could cause real damage in the wrong hands.

Pricing: What Capybara Tier Will Cost

Current Opus Pricing

Claude Opus 4.6 costs $15 per million input tokens and $75 per million output tokens through Anthropic’s API. This makes it roughly 5x more expensive than Sonnet ($3/$15) and nearly 19x more expensive than Haiku ($0.80/$4) on input tokens. For a typical heavy-usage month generating 10 million output tokens, Opus costs approximately $750 — already positioning it as a premium offering.

Opus also requires more compute time per request, which means higher latency. A complex reasoning task that Sonnet processes in 5-10 seconds might take Opus 15-30 seconds. Users pay both the dollar cost and the time cost.

Expected Mythos Pricing

Anthropic’s leaked documents explicitly acknowledge that Mythos is “very expensive for us to serve, and will be very expensive for our customers to use.” The company is actively working to make the model more efficient before any general release. Capybara-tier pricing is expected to run 2-5x above Opus rates, putting the range at approximately $30-75 per million input tokens and $150-375 per million output tokens.

At the high end of that range, the same 10 million output tokens that cost $750 on Opus would cost $3,750 on Mythos. This pricing structure naturally limits the user base to organizations where the capability difference justifies a 2-5x cost increase — enterprise security teams, major research institutions, and organizations tackling problems that Opus genuinely cannot solve.

The efficiency improvements Anthropic is working on before general release could bring those costs down. Every previous tier in Anthropic’s lineup has seen price reductions over time as inference optimization improves. Expect Capybara pricing to start high and decrease within 6-12 months of public launch, following the same pattern as Opus.

API and Integration

Same API, Different Model Parameter

One of the practical advantages of staying within Anthropic’s ecosystem is API consistency. Switching from Opus to Mythos will require changing only the model parameter in your API calls — the endpoint, authentication, message format, and tool-use interface remain identical. If your application currently uses claude-opus-4-6, migrating to the Capybara-tier model will be as simple as updating that string.

This design is intentional. Anthropic structures its API so that model upgrades don’t require code refactoring. The same message array, system prompt format, and response structure work across all tiers. Token counting may differ (larger models sometimes tokenize differently), but the interface contract stays the same.

Migration Path

For teams currently running production workloads on Opus, the migration path involves three practical steps. First, audit your current Opus usage to understand which tasks would genuinely benefit from Capybara-tier capabilities — not every API call needs the most powerful model. Second, estimate the cost impact by multiplying your current Opus spend by 2-5x for the requests you plan to route to Mythos. Third, implement model routing logic that sends complex tasks to Mythos and routine tasks to Sonnet or Haiku.

The smartest strategy isn’t replacing Opus with Mythos everywhere — it’s using Mythos selectively for tasks where the capability gap matters. A typical production system might route 80% of requests to Sonnet, 15% to Opus, and 5% to Mythos, optimizing for both cost and quality.

When to Use Opus vs When to Wait for Mythos

Opus Still Makes Sense For

Opus 4.6 remains an excellent choice for complex coding tasks, research analysis, long-form writing, and multi-step reasoning that doesn’t require specialized cybersecurity capabilities. At $15/$75 per million tokens, Opus delivers frontier-level performance at a known, manageable price point. The model is publicly available, well-documented, and battle-tested across millions of production API calls.

For most development teams, Opus handles everything they need. Code review, architecture planning, debugging complex systems, generating technical documentation — these tasks don’t require Capybara-tier capabilities. Opus’s 83.3% on GPQA Diamond and 90.0% on AIME 2025 represent world-class reasoning that few tasks actually exhaust.

Mythos Is Worth Waiting For

If your work involves cybersecurity vulnerability assessment, large-scale penetration testing, or threat modeling against sophisticated adversaries, Mythos offers capabilities that no other model — including Opus — can match. The “step change” in cybersecurity capabilities is the primary reason to wait for Capybara-tier access.

Organizations tackling problems at the absolute frontier of AI capability — tasks where Opus produces partially correct or incomplete results — will benefit from Mythos. This includes extremely complex multi-step reasoning chains, novel research problems that require connecting insights across disparate domains, and enterprise-scale code refactoring projects involving millions of lines of code.

The expected public release window is Q3-Q4 2026, potentially coinciding with Anthropic’s anticipated IPO. Teams that need Mythos-level capabilities now should apply for early access through Anthropic’s cybersecurity partnership program.

Questions About Claude Mythos vs Opus

Is Claude Mythos better than Opus in every way?

In raw capability, yes — leaked benchmarks show Mythos scoring “dramatically higher” across coding, reasoning, and cybersecurity. The tradeoffs are price (2-5x more expensive), speed (slower inference), and availability (not publicly accessible as of March 2026).

What tier is Claude Mythos?

Mythos is the first model in the Capybara tier, Anthropic’s new fourth tier above Opus. The hierarchy from lowest to highest is: Haiku, Sonnet, Opus, Capybara.

Will Mythos replace Opus?

No. Opus will continue as a separate tier, just as Sonnet didn’t replace Haiku. Each tier serves a different cost-performance tradeoff. Opus remains the best option for users who need strong reasoning at a more accessible price point.

Can I use the same API for Mythos?

Yes. Anthropic’s API design means switching from Opus to Mythos requires only changing the model parameter. All endpoints, authentication, message formats, and tool-use interfaces remain identical.

How much more expensive is Mythos than Opus?

Expected pricing is 2-5x above Opus rates. Opus costs $15/M input and $75/M output tokens. Mythos is estimated at $30-75/M input and $150-375/M output tokens.

When can I access Claude Mythos?

As of March 2026, only invite-only cybersecurity organizations have access. Public release is expected in Q3-Q4 2026, pending ASL-4 safety evaluations and efficiency improvements.

Should I wait for Mythos or use Opus now?

Use Opus now for all general-purpose tasks — it’s publicly available and performs at frontier level. Wait for Mythos only if you specifically need its cybersecurity capabilities or face problems that genuinely exceed Opus’s reasoning limits.

What makes Mythos’s cybersecurity so much better?

Leaked documents describe Mythos as “far ahead of any other AI model in cyber capabilities,” with abilities to identify zero-day vulnerabilities, construct exploit chains, and analyze complex attack surfaces. Opus has no comparable specialization.