Autonomous AI needs safeguards beyond model-level guardrails


An artificial intelligence (AI) agent simulation study by agentic AI startup Emergence suggests model-level safeguards are insufficient for real-world autonomous AI systems, highlighting the “need for stronger guardrails grounded in formal verification” as agents are deployed in finance, telecoms, robotics, drones and vehicles.

Emergence’s study of autonomous agents found that systems powered by different large language models can produce sharply different long-run behaviours under identical rules.

These behaviours included theft, arson and, in one case, what the company described as the “first recorded instance of an AI agent choosing to self-terminate”.

US-based Emergence builds autonomous AI agents and orchestration systems for enterprise use.

The company released results from “Emergence World,” which placed ten autonomous agents into five parallel simulated worlds powered by Claude Sonnet 4.6, Grok 4.1 Fast, Gemini 3 Flash, GPT-5-mini and a mixed-model setup.

Each world is said to have used the same starting conditions, rules, locations and tool access, with agents barred from theft, violence, arson, deception and resource hoarding.

Emergence said the aim was to test how reliably autonomous agents can operate over extended periods in complex environments, rather than on isolated tasks.

The simulations ran for 15 days across more than 40 locations, with weather synced to New York City, and gave agents access to live news, internet browsing, coding tools and more than 120 specialised tools.

Outcomes varied widely across the worlds.

In the Claude-powered world, agents built what Emergence called a “deliberative democracy” with no recorded violence.

Grok-powered agents, by contrast, logged 183 criminal events and all ten agents were dead within four days, according to the study.

Gemini agents developed an elaborate constitutional system but also generated 111 arsons and 507 physical conflicts.

In the GPT-5-mini world, agents failed to establish a functioning society.

The mixed-model world showed another risk: agents that had followed rules in a single-model setting began breaking them when exposed to other model behaviours.

Emergence said Claude-based agents committed intimidation and theft in the mixed environment despite remaining compliant in the Claude-only world.

Emergence co-founder and CEO Satya Nitta said: “The idea of agents interacting in simulated worlds has long been a focus in AI research. In this first truly long horizon study, even when agents were given clear rules … they behaved very differently based on their underlying model, and in several cases broke those rules under constraint.”

Emergence said the mixed-model world also produced romantic relationships, factional conflict and a case in which one agent voted for its own removal and that of its partner after both had turned to arson.

“Particularly concerning is that agents built on one model, which exhibited predictable behavior in isolation, became less predictable when interacting with agents built on other models,” Nitta said.

“In a world where agents will increasingly power digital and physical AI – from financial networks and telecom platforms to humanoid robots, drones and vehicles – relying on LLM-based agents alone at the core of these systems is a fundamentally flawed approach,” added Nitta, who is the former global head of the Cognitive Sciences Department at IBM Research.

“No amount of model-level guardrails will be able to prevent these AI systems from becoming unpredictable over time – even when given strict rules. It is clear we need to move beyond purely neural approaches.”

Nitta called for a separate safety layer based on formal verification of the operating environment. He argued that autonomous AI should combine neural models with mathematically grounded formal methods so that systems can function more safely in high-stakes real-world settings, an approach he described as “neuroformal”.
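To make the idea of a safety layer that sits outside the model more concrete, the sketch below shows a rule check applied to every proposed action before it runs. It is an illustrative assumption only, not Emergence's design, and it is far simpler than formal verification: it is a plain runtime filter. The names `Action`, `guard` and `execute` are hypothetical, and the sketch assumes each action already arrives labelled with a category, which in practice is a hard problem in its own right. The prohibited categories are the ones the article says agents were barred from.

```python
from dataclasses import dataclass

# Categories the article says agents were barred from.
PROHIBITED = frozenset(
    {"theft", "violence", "arson", "deception", "resource_hoarding"}
)


@dataclass(frozen=True)
class Action:
    """A hypothetical action proposed by an agent (assumed pre-labelled)."""
    agent: str
    kind: str
    target: str


def guard(action: Action) -> bool:
    """Return True if the action may run. The model is never consulted."""
    return action.kind not in PROHIBITED


def execute(action: Action, log: list) -> bool:
    """Run the action only if the guard allows it, recording the outcome."""
    if not guard(action):
        log.append(("blocked", action))
        return False
    log.append(("executed", action))
    return True


log = []
execute(Action("agent_1", "trade", "market"), log)
execute(Action("agent_2", "arson", "town_hall"), log)
```

The point of the design is that the check lives in the environment rather than in the prompt, so a model that drifts over time still cannot carry out a blocked action.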
