Research from Anthropic reveals vulnerability in LLMs

The AI startup found a concerning vulnerability across large language models in development.

New research from AI startup Anthropic has found weaknesses large language models (LLMs), that can override AI safety training.

A technique, known as many-shot jailbreaking, exploits a fault in inputting prompts containing faux dialogues (also known as shots) where an AI assistant complies with harmful requests.

A many-shot machine learning system requires a large dataset of examples to predict categories, as opposed to a one-shot system, which requires only one example to predict a category.

This jailbreaking technique (removing manufacturer-imposed restrictions on software, such as AI) relies upon the inherent weakness of machine learning: the vast training data the model requires.

The effectiveness of many-shot jailbreaking follows simple scaling laws as a function of the number of shots (faux dialogues). Learning from demonstrations, harmful or not, often follows the same power law scaling.

Increasing the context window of LLMs, that is, all the information AI can take in to process a request, enhances their usefulness but also makes them more vulnerable to adversarial attacks.

Anthropic’s research found that fine-tuning the model to refuse harmful queries could be a possible solution, but this only delays the jailbreak.

More successful methods involve prompt modification and classification to reduce the effectiveness of many-shot jailbreaking.

In a post on X (formerly Twitter), Anthropic said that it hopes sharing research on many-shot jailbreaking will accelerate progress towards mitigation strategies.

In August 2023, Britain’s National Cyber Security Centre identified two potential weak spots in LLMs that could be exploited by attackers: data poisoning attacks and prompt injection attacks.

How well do you really know your competitors?

Access the most comprehensive Company Profiles on the market, powered by GlobalData. Save hours of research. Gain competitive edge.

Company Profile – free sample

Thank you!

Your download email will arrive shortly

Not ready to buy yet? Download a free sample

We are confident about the unique quality of our Company Profiles. However, we want you to make the most beneficial decision for your business, so we offer a free sample that you can download by submitting the below form

By GlobalData

Tick here to opt out of curated industry news, reports, and event updates from Verdict.

Visit our Privacy Policy for more information about our services, how we may use, process and share your personal data, including information of your rights in respect of your personal data and how you can unsubscribe from future marketing communications. Our services are intended for corporate subscribers and you warrant that the email address submitted is your corporate email address.

Prompt injection attacks involve an input designed to cause the model to ‘generate offensive content, reveal confidential information, or trigger unintended consequences in a system.’

Research from Anthropic reveals vulnerability in LLMs

Go deeper with GlobalData

ChatGPT Trailblazers - How Startups Democratize Generative Artificial Intelligence (AI)

Enterprise Security Software Sector Scorecard - Thematic Intelligence

Data Insights

How well do you really know your competitors?

Thank you!

Not ready to buy yet? Download a free sample

ChatGPT Trailblazers - How Startups Democratize Generative Artificial Intelligence (AI)

Enterprise Security Software Sector Scorecard - Thematic Intelligence

Data Insights

OpenAI safety leader switches duties to work on "very important research project"

Who’s winning in the LLM wars?

GlobalData reveals top M&A advisers in tech sector for H1 2024

Anthropic helps Menlo Ventures launch $100m AI fund

Sign up for our daily news round-up!

Sign up to the newsletter: In Brief

Go deeper with GlobalData

Data Insights

How well do you really know your competitors?

Thank you!

Not ready to buy yet? Download a free sample

Sign up for our daily news round-up!

Go deeper with GlobalData

Data Insights

Sign up for our daily news round-up!

Sign up to the newsletter: In Brief

I would also like to subscribe to:

Thank you for subscribing