Can AI Find Zero-Day Vulnerabilities? Yes — And It’s Already Happening
AI systems are already finding zero-day vulnerabilities at a pace and scale that human security researchers cannot match. In the DARPA AI Cyber Challenge (AIxCC) final round in 2025, seven AI teams processed 54 million lines of code, successfully patched 43 of 54 synthetic vulnerabilities, and uncovered 18 previously unknown real-world flaws. Meanwhile, Anthropic’s leaked Claude Mythos model — the first in the new Capybara tier — is described as “currently far ahead of any other AI model in cyber capabilities,” with internal testing showing it finds vulnerabilities faster than human teams.

How AI Discovers Zero-Day Vulnerabilities
Behavioral Analytics and Anomaly Detection
Traditional vulnerability scanners rely on known signatures — they match code patterns against databases of previously identified flaws. AI takes a fundamentally different approach. Machine learning models monitor application and network behavior to identify deviations from normal activity, enabling detection of brand-new exploits without any predefined signatures. Supervised and unsupervised models continuously refine their understanding of what “normal” looks like, flagging anomalies that could indicate a zero-day exploit in progress.
Deep learning neural networks enhance endpoint detection and response (EDR) by recognizing subtle signs of malicious activity that signature-based tools miss entirely. Natural language processing scans threat intelligence reports and security advisories to identify emerging vulnerability patterns before they become widely known.
AI-Powered Fuzzing and Code Analysis
The most productive approach combines AI with automated fuzzing — systematically feeding malformed inputs to software to trigger unexpected behavior. AI-driven fuzzers don’t just generate random inputs; they learn which input patterns are most likely to expose vulnerabilities based on code structure, past discoveries, and program execution paths.
Static code analysis powered by large language models reads source code the way a human researcher would. The AI examines past fixes to find similar bugs that weren’t addressed, spots patterns that historically cause problems, and understands program logic well enough to predict exactly what input would break it. This is how systems like AISLE and Trend Micro’s AESIR achieve their results.
Machine Learning Pattern Recognition
AI vulnerability discovery also leverages transfer learning across codebases. A model trained on vulnerabilities in one library can recognize structurally similar flaws in completely different software. This cross-pollination effect means that every vulnerability found makes the AI better at finding the next one, creating a compounding advantage over time that human researchers cannot replicate at the same scale.
Real-World Results: AI Finding Zero-Days in 2025-2026
DARPA AIxCC Competition Results
The DARPA Artificial Intelligence Cyber Challenge (AIxCC) provided the most rigorous public test of AI vulnerability discovery to date. Seven finalist teams built cyber reasoning systems (CRSs) incorporating cutting-edge large language models to automatically find and patch vulnerabilities in open-source software.
The results were striking. In the final competition, AI systems processed 54 million lines of code and found 18 previously unknown real-world vulnerabilities. Team Atlanta’s CRS won the competition and the $4 million grand prize. The average cost per competition task was just $152, with remediation completed in an average of 45 minutes. Compared to the industry standard of weeks and thousands of dollars per vulnerability assessment, this represents orders-of-magnitude improvement in both speed and cost.
All seven finalist teams released their CRSs as open-source software, meaning these AI vulnerability discovery tools are now available for the broader cybersecurity community to use and improve.
AISLE Discovers 12 OpenSSL Zero-Days
Perhaps the most impressive single demonstration came from AISLE, an AI security research system. In early 2026, AISLE discovered all 12 of the zero-day vulnerabilities that OpenSSL announced — a perfect 12-for-12 score. This followed AISLE’s earlier work in October 2025, when it found three new OpenSSL vulnerabilities that were previously unknown to human researchers.
The timing is notable: curl, the widely-used data transfer library, cancelled its bug bounty program around the same period, partly because AI-generated vulnerability reports were overwhelming human reviewers. AISLE’s results suggest that the quality gap between AI and human vulnerability discovery is closing rapidly — and in some domains, AI has already pulled ahead.
Trend Micro AESIR Platform
Trend Micro’s AESIR (AI-Enhanced Security Intelligence and Research) platform has been systematically discovering critical vulnerabilities since mid-2025. The platform has uncovered 21 critical CVEs across industry-leading platforms including NVIDIA, Tencent, MLflow, and MCP tooling. AESIR combines multiple AI techniques — static analysis, dynamic testing, and pattern recognition — to find vulnerabilities that escaped years of manual code review.
Claude Mythos and Cybersecurity
Anthropic’s Capybara Tier Model
Claude Mythos represents a new class of AI capability in cybersecurity. Leaked on March 27, 2026, through a CMS misconfiguration that exposed approximately 3,000 unpublished assets, Mythos is the first model in Anthropic’s new Capybara tier — a level above Opus in the hierarchy of Haiku, Sonnet, Opus, and now Capybara.
Anthropic describes Mythos as achieving “dramatically higher scores” than Claude Opus 4.6 on tests of software coding, academic reasoning, and cybersecurity. Internal documents characterize it as a “step change” in capability rather than an incremental improvement.
Vulnerability Discovery Capabilities
What makes Claude Mythos particularly significant for zero-day discovery is its reported superiority in cybersecurity tasks. According to Fortune’s reporting on the leaked materials, Anthropic’s own testing showed Mythos finding vulnerabilities faster than human security teams. The model is described as “currently far ahead of any other AI model in cyber capabilities.”
Leaked internal documents warn that the model could “significantly heighten cybersecurity risks by rapidly finding and exploiting software vulnerabilities, potentially accelerating a cyber arms race.” This dual-use concern — the same capabilities that help defenders find and patch vulnerabilities also help attackers discover and exploit them — is why Anthropic is proceeding with extreme caution.
Restricted Testing with Cyber Defense Organizations
Mythos is not publicly available. Anthropic is conducting a cautious rollout, starting with select cyber defense organizations tasked with evaluating the model’s security applications. The goal is to help organizations “improve the robustness of their codebases” before broader access could enable malicious use.
This approach mirrors the responsible disclosure practices common in traditional security research, applied to AI model deployment. Defenders get a head start before the model — or models with similar capabilities from other companies — become widely accessible.
AI vs Human Vulnerability Researchers
Speed and Scale Advantages
The numbers tell the story. In the DARPA AIxCC, AI systems found vulnerabilities across 54 million lines of code within a competition timeframe. A human security researcher typically reviews 100-200 lines of code per hour during manual auditing. At that rate, reviewing the same codebase would take one researcher roughly 135 to 270 years of full-time work.
AI doesn’t just work faster — it works differently. It can hold entire codebases in context, track data flow across thousands of functions simultaneously, and test millions of input combinations in the time a human researcher tests dozens.
Cost Comparison
The DARPA AIxCC average of $152 per vulnerability task stands in stark contrast to industry norms. A manual penetration test typically costs $10,000 to $100,000 depending on scope, and a dedicated vulnerability research engagement can run into the hundreds of thousands. Even accounting for the computational cost of running large AI models, the economics are shifting dramatically in favor of AI-assisted discovery.
Where Humans Still Lead
AI vulnerability discovery is not without limitations. False positives remain a significant challenge — overly sensitive AI systems can flag benign code patterns as potential vulnerabilities, creating noise that wastes human review time. AI systems also struggle with novel attack vectors that differ fundamentally from their training data, and they lack the intuitive understanding of business logic that experienced security researchers bring.
The most effective approach in 2026 combines AI speed with human judgment. AI handles the broad scanning and pattern matching, while human researchers focus on validating findings, understanding exploit chains, and assessing real-world impact.
The Risks of AI-Powered Vulnerability Discovery
Offensive vs Defensive Use
Every AI capability that helps defenders find and patch vulnerabilities also helps attackers discover and exploit them. Google’s Threat Intelligence Group assessed that AI will “accelerate the ongoing race between attackers and defenders” through 2026 and beyond. The question is not whether AI will be used offensively — it already is — but whether defenders can maintain enough of an advantage to protect critical systems.
The DARPA AIxCC open-source releases deliberately aimed to shift the balance toward defenders by making powerful vulnerability discovery tools freely available. But attackers also benefit from these same tools and techniques.
The Cyber Arms Race
Anthropic’s own leaked documents acknowledge this tension explicitly. Claude Mythos “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.” The window between vulnerability discovery and patch deployment — traditionally measured in days or weeks — may shrink to hours or minutes when AI systems on both sides are operating at machine speed.
This compression of the vulnerability lifecycle fundamentally changes the economics of cybersecurity. Organizations that cannot deploy patches at AI speed will face rapidly escalating risk from AI-discovered zero-days.
Responsible Disclosure Challenges
The traditional responsible disclosure process — where researchers privately notify vendors and give them time to patch before publishing — faces new pressure from AI. When an AI system can find a vulnerability, there’s no guarantee that only one AI system will find it. Multiple AI systems, some operated by attackers, may independently discover the same flaw within days or hours of each other.
The DARPA AIxCC addressed this by following the Linux Foundation’s vulnerability disclosure best practices for any real zero-days discovered during the competition. But as AI vulnerability discovery becomes more widespread, the security community needs new frameworks for handling the increased volume and speed of discoveries.
Questions About AI and Zero-Day Vulnerabilities
Can AI detect zero-day vulnerabilities before they are exploited?
Yes. AI systems use behavioral analytics, machine learning anomaly detection, and AI-powered code analysis to identify previously unknown vulnerabilities. The DARPA AIxCC demonstrated that AI can find real zero-days in production code, discovering 18 previously unknown flaws across 54 million lines of code.
How does Claude Mythos compare to other AI models at finding vulnerabilities?
Anthropic describes Claude Mythos as “currently far ahead of any other AI model in cyber capabilities.” Internal testing reportedly showed Mythos finding vulnerabilities faster than human security teams, with dramatically higher scores on cybersecurity benchmarks compared to Claude Opus 4.6.
What was the DARPA AIxCC and what did it prove?
The DARPA AI Cyber Challenge was a two-year competition where seven teams built AI cyber reasoning systems to find and patch vulnerabilities in open-source software. The final round in 2025 proved AI can find real zero-days at an average cost of $152 per task and 45 minutes per remediation — orders of magnitude cheaper and faster than human methods.
How did AISLE find 12 OpenSSL zero-days?
AISLE, an AI security research system, discovered all 12 zero-day vulnerabilities that OpenSSL announced in early 2026. The system uses AI-powered code analysis and pattern recognition to identify flaws that human researchers missed. AISLE first demonstrated its capabilities in October 2025 by finding three new OpenSSL vulnerabilities.
Is AI better than humans at finding zero-day vulnerabilities?
AI excels at speed and scale — processing millions of lines of code and testing millions of inputs in timeframes impossible for humans. However, humans still lead in understanding business logic, novel attack vectors, and assessing real-world impact. The most effective approach combines both: AI for broad scanning and humans for validation.
What is the risk of AI being used to exploit zero-days?
The same AI capabilities that help defenders also help attackers. Google’s Threat Intelligence Group warns that AI will accelerate both offense and defense through 2026 and beyond. Anthropic’s leaked documents acknowledge that Claude Mythos “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”
What AI tools can find zero-day vulnerabilities?
Notable tools include the open-source cyber reasoning systems from DARPA AIxCC (including Team Atlanta’s winning CRS), AISLE for library vulnerability discovery, Trend Micro’s AESIR platform (21 critical CVEs found), and Microsoft Security Copilot for threat detection. Claude Mythos is being tested in restricted settings for vulnerability discovery.
How much does AI vulnerability scanning cost compared to manual testing?
AI-powered vulnerability discovery averaged $152 per task in the DARPA AIxCC, with 45-minute remediation times. Manual penetration tests typically cost $10,000-$100,000 and take days to weeks. Even accounting for AI compute costs, automated discovery is roughly 65x to 650x cheaper than traditional methods, depending on engagement scope.
