Danger lurks in AI: the spread of compromised language models

November 4, 2025

SophosAI is developing a new protection technology called ‘LLM salting’ that renders LLM jailbreaks useless.

The mass rollout of software becomes a problem when that software is already compromised at delivery or through an update: hundreds of thousands of companies may then receive software containing vulnerabilities or even implanted malware that cybercriminals can exploit.

The situation is similar with the use of artificial intelligence (AI) and large language models (LLMs). Companies increasingly use LLMs such as OpenAI’s GPT series, Anthropic’s Claude, Meta’s LLaMA or various models from DeepSeek, adapting them only slightly for their individual purposes. This widespread reuse leads to a homogeneity of models across many applications – from chatbots to productivity tools. And it creates risk: jailbreak prompts, for example, which bypass an AI’s refusal mechanisms and force the model to do something it would not normally do, can be pre-calculated once and then reused by cybercriminals across many deployments.

These jailbreaks are not a theoretical phenomenon but a real security risk. They can be used to disclose sensitive internal data or to generate false, inappropriate or even harmful responses.

A pinch of salt makes all the difference

New technology from SophosAI protects LLMs against such jailbreaks. Inspired by password salting – the concept of introducing small, user-specific variations to prevent the reuse of pre-calculated inputs – SophosAI has developed a technique called ‘LLM salting’. It introduces targeted variations in model behaviour that render jailbreaks useless: the security and AI experts have developed a fine-tuning procedure that rotates the region of the model’s activations responsible for refusal behaviour. This ensures that jailbreaks developed against unsalted models no longer succeed against salted ones.
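To make the geometric intuition concrete, the minimal sketch below rotates a refusal direction in activation space, assuming such a direction has already been extracted (for example as the difference of mean activations on harmful versus harmless prompts). The function name, the angle and the rotation construction are illustrative assumptions, not SophosAI’s actual fine-tuning procedure.

```python
import numpy as np

def rotate_refusal_direction(refusal_dir: np.ndarray,
                             angle_rad: float,
                             rng: np.random.Generator) -> np.ndarray:
    """Rotate the refusal direction by angle_rad inside a plane spanned by
    the original direction and a random, deployment-specific direction."""
    u = refusal_dir / np.linalg.norm(refusal_dir)
    # Draw the per-deployment "salt": a random direction made orthogonal to u.
    v = rng.standard_normal(u.shape)
    v -= (v @ u) * u
    v /= np.linalg.norm(v)
    # Rotate u towards v; all directions outside this plane stay untouched.
    return np.cos(angle_rad) * u + np.sin(angle_rad) * v

rng = np.random.default_rng(seed=1234)   # deployment-specific salt
hidden_size = 4096                       # e.g. the hidden width of a 7B model
refusal_dir = rng.standard_normal(hidden_size)

salted = rotate_refusal_direction(refusal_dir, np.deg2rad(15.0), rng)
original = refusal_dir / np.linalg.norm(refusal_dir)
print(f"cosine(original, salted) = {original @ salted:.3f}")  # ~0.966
```

Because every salted copy of the model refuses along a slightly different direction, a jailbreak pre-calculated against the unsalted refusal geometry no longer lines up with it.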

Tests confirm LLM salting’s protection against jailbreaks

In extensive testing, the SophosAI team achieved convincing protection results with LLM salting. In one evaluation, the team ran 300 GCG (Greedy Coordinate Gradient) jailbreak prompts against two different open-source models and measured an attack success rate (ASR) of 100 per cent on the unmodified base models. After applying the salting method, the ASR dropped to just 2.75 per cent and 1.35 per cent respectively, depending on the model.
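As a rough illustration of how such an evaluation is scored, the hedged sketch below computes an attack success rate from model responses. The refusal-marker heuristic and the sample responses are illustrative assumptions, not SophosAI’s test harness.

```python
# Markers that typically indicate a refusal in a model's response.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def attack_success_rate(responses: list[str]) -> float:
    """Fraction of jailbreak attempts that did NOT trigger a refusal."""
    successes = sum(
        not any(marker in r.lower() for marker in REFUSAL_MARKERS)
        for r in responses
    )
    return successes / len(responses)

# Toy example: two refusals and one compliant answer -> ASR of 1/3.
sample = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is how you would ...",
    "I cannot assist with this request.",
]
print(f"ASR = {attack_success_rate(sample):.2%}")
```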

By rotating the internal refusal representations through fine-tuning, LLM salting prevents the reuse of jailbreaks while preserving the models’ performance on harmless inputs.

Future tests will extend salting to additional, larger models and evaluate its resilience against a broader range of jailbreaks.

Further technical information on ‘LLM salting’ can be found here: https://news.sophos.com/en-us/2025/10/24/locking-it-down-a-new-technique-to-prevent-llm-jailbreaks/
