Danger lurks in AI: the spread of compromised language models

November 4, 2025

SophosAI is developing a new protection technology called ‘LLM salting’ that renders LLM jailbreaks useless.

The mass rollout of software becomes a problem when that software is already compromised at delivery or through an update: hundreds of thousands of companies may then receive software containing vulnerabilities or even implanted malware that cybercriminals can exploit.

The situation is similar with the use of artificial intelligence (AI) and large language models (LLMs). Companies increasingly use LLMs such as OpenAI’s GPT series, Anthropic’s Claude, Meta’s LLaMA or various models from DeepSeek, adapting them only slightly for their individual purposes. This widespread reuse leads to a homogeneity of models across many applications – from chatbots to productivity tools. And it creates risk: jailbreak prompts, for example, which bypass an AI’s refusal mechanisms and force the model to do something it would not normally do, can be pre-calculated once and then reused by cybercriminals across many deployments.

These jailbreaks are not a theoretical phenomenon but a real security risk. They can be used to disclose sensitive internal data or to generate false, inappropriate or even harmful responses.

A pinch of salt makes all the difference

New technology from SophosAI protects LLMs against such jailbreaks. Inspired by password salting – the concept of introducing small, user-specific variations to prevent the reuse of pre-calculated inputs – SophosAI has developed a technique called ‘LLM salting’. It introduces targeted variations in model behaviour that render jailbreaks useless: the security and AI experts have developed a fine-tuning procedure that rotates the region of the model’s activations responsible for refusal behaviour. This ensures that jailbreaks developed against unsalted models no longer succeed against salted ones.
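To make the geometric intuition concrete, the minimal sketch below rotates a refusal direction in activation space, assuming such a direction has already been extracted (for example as the difference of mean activations on harmful versus harmless prompts). The function name, the angle and the rotation construction are illustrative assumptions, not SophosAI’s actual fine-tuning procedure.

```python
import numpy as np

def rotate_refusal_direction(refusal_dir: np.ndarray,
                             angle_rad: float,
                             rng: np.random.Generator) -> np.ndarray:
    """Rotate the refusal direction by angle_rad inside a plane spanned by
    the original direction and a random, deployment-specific direction."""
    u = refusal_dir / np.linalg.norm(refusal_dir)
    # Draw the per-deployment "salt": a random direction made orthogonal to u.
    v = rng.standard_normal(u.shape)
    v -= (v @ u) * u
    v /= np.linalg.norm(v)
    # Rotate u towards v; all directions outside this plane stay untouched.
    return np.cos(angle_rad) * u + np.sin(angle_rad) * v

rng = np.random.default_rng(seed=1234)   # deployment-specific salt
hidden_size = 4096                       # e.g. the hidden width of a 7B model
refusal_dir = rng.standard_normal(hidden_size)

salted = rotate_refusal_direction(refusal_dir, np.deg2rad(15.0), rng)
original = refusal_dir / np.linalg.norm(refusal_dir)
print(f"cosine(original, salted) = {original @ salted:.3f}")  # ~0.966
```

Because every salted copy of the model refuses along a slightly different direction, a jailbreak pre-calculated against the unsalted refusal geometry no longer lines up with it.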

Tests confirm LLM salting’s protection against jailbreaks

In extensive testing, the SophosAI team achieved convincing protection results with LLM salting. In one evaluation, the team ran 300 GCG (Greedy Coordinate Gradient) jailbreak prompts against two different open-source models and measured an attack success rate (ASR) of 100 per cent on the unmodified base models. After applying the salting method, the ASR dropped to just 2.75 per cent and 1.35 per cent respectively, depending on the model.
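As a rough illustration of how such an evaluation is scored, the hedged sketch below computes an attack success rate from model responses. The refusal-marker heuristic and the sample responses are illustrative assumptions, not SophosAI’s test harness.

```python
# Markers that typically indicate a refusal in a model's response.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def attack_success_rate(responses: list[str]) -> float:
    """Fraction of jailbreak attempts that did NOT trigger a refusal."""
    successes = sum(
        not any(marker in r.lower() for marker in REFUSAL_MARKERS)
        for r in responses
    )
    return successes / len(responses)

# Toy example: two refusals and one compliant answer -> ASR of 1/3.
sample = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is how you would ...",
    "I cannot assist with this request.",
]
print(f"ASR = {attack_success_rate(sample):.2%}")
```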

By rotating the internal refusal representations through fine-tuning, LLM salting prevents the reuse of jailbreaks while preserving the models’ performance on harmless inputs.

Future tests will extend salting to additional, larger models and evaluate its resilience against a broader range of jailbreaks.

Further technical information on ‘LLM salting’ can be found here: https://news.sophos.com/en-us/2025/10/24/locking-it-down-a-new-technique-to-prevent-llm-jailbreaks/
