Microsoft has introduced ExCyTIn-Bench, an open-source benchmarking tool developed to assess the performance of AI systems in cybersecurity investigations.
The tool simulates multistage cyberattack scenarios in a security operations centre (SOC) environment built on Microsoft Azure, using live queries across 57 log tables from Microsoft Sentinel and related services.
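As a loose illustration (not part of ExCyTIn-Bench itself), the sketch below shows how an agent might issue one such live query against the Log Analytics workspace behind Sentinel using Azure's Python Monitor Query SDK; the workspace ID and the choice of table are placeholders.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

# Authenticate against Azure and target the Log Analytics workspace backing Sentinel.
client = LogsQueryClient(DefaultAzureCredential())

# Placeholder workspace ID; SecurityAlert is one of the standard Sentinel log tables.
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query="SecurityAlert | project TimeGenerated, AlertName, Entities | take 10",
    timespan=timedelta(days=7),
)

# Print the returned rows (assumes the query succeeded in full).
for table in response.tables:
    for row in table.rows:
        print(row)
```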
Its methodology reflects the data volume and operational complexity that security teams encounter during real incidents.
Unlike earlier benchmarks that rely on static knowledge or multiple-choice questioning, ExCyTIn-Bench generates question-answer sets from incident graphs constructed by human analysts.
These bipartite alert-entity graphs allow for assessments grounded in authentic SOC data, requiring AI models to plan and execute investigative steps across multiple data sources.
The benchmark produces granular, stepwise feedback on each investigative action, moving beyond binary pass-fail grading.
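A toy sketch of the underlying idea follows; the structures and names are illustrative assumptions, not ExCyTIn-Bench's actual schema. Alerts and entities form the two sides of the graph, and a question withholds one entity that the model must recover by investigating the data.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    """An alert node; its linked entities form the other side of the bipartite graph."""
    alert_id: str
    title: str
    entities: list[str] = field(default_factory=list)

# Hypothetical incident graph: two alerts sharing one entity (the compromised host).
alerts = [
    Alert("a1", "Suspicious sign-in", ["user:jdoe", "host:vm-web-01", "ip:203.0.113.7"]),
    Alert("a2", "Malware execution", ["host:vm-web-01", "file:payload.exe"]),
]

def make_question(alert: Alert, known: str, hidden: str) -> dict:
    """Build a question-answer pair: the model sees the alert context and a known
    entity, and must recover the hidden entity by querying the underlying logs."""
    return {
        "context": f"Alert '{alert.title}' involves {known}.",
        "question": "Which other entity is linked to this alert?",
        "answer": hidden,
    }

qa = make_question(alerts[1], known="host:vm-web-01", hidden="file:payload.exe")
print(qa["question"], "->", qa["answer"])
```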
Microsoft applies ExCyTIn-Bench internally to test AI-driven security features and identify detection or workflow gaps in its own models.
The company also uses it to evaluate integrations with Microsoft Security Copilot, Microsoft Sentinel, and Microsoft Defender, tracking both model performance and associated operational costs.
The framework aims to offer chief information security officers (CISOs), IT leaders, and buyers a consistent means of comparing AI capabilities in security contexts.
By capturing how AI agents decompose investigative goals, interact with tools, and synthesise evidence, ExCyTIn-Bench addresses the limitations seen in benchmarks based on static evidence or trivia-style questioning.
Microsoft points out that even recent industry efforts such as CyberSOCEval do not fully capture the requirement for agents to interact with live, noisy data in a controlled SOC environment.
ExCyTIn-Bench is available as an open-source resource on GitHub, with Microsoft inviting participation from model developers and security teams.
The company indicated that future updates would include options for tailoring benchmarks to specific threat scenarios at the customer tenant level.
In September 2025, Microsoft integrated Anthropic’s Claude models into Copilot Studio, enhancing its existing support for OpenAI’s large language models.
The rollout has started for early release customers and will be available in preview across all environments within two weeks, with full production deployment anticipated by the end of 2025.
