Prompt injections remain the Achilles’ heel of autonomous AI agents

June 12, 2026

Security researchers demonstrate an attack on OpenClaw via manipulated message objects

As autonomous AI agents become increasingly widespread, the attack surface for cybercriminals is also growing. Whilst traditional IT systems have been protected by established security mechanisms for years, many AI-based assistants are still in the early stages of their security development. Recent research findings from Thales now show how AI agents can be compromised via manipulated message objects – even when the actual commands are barely visible to human users.

The investigation focused on the open-source AI agent OpenClaw. The security researchers were able to demonstrate that prompt injection attacks can be introduced via various metadata fields and hidden content. The findings were responsibly reported to the OpenClaw security team, which implemented appropriate countermeasures in version 2026.4.23.

However, the significance of the investigation extends far beyond a single product. According to the researchers, this is a fundamental problem with modern AI agents.

When message objects become attack tools

Prompt injections are now considered one of the greatest challenges facing Large Language Models (LLMs). This involves embedding instructions within content that is processed by the model and is intended to influence its behaviour.

In the scenario investigated, the researchers did not use traditional text inputs, but rather complex message objects as an attack vector. These included, for example, contact information, location data or embedded metadata. Particularly critical: the malicious instructions can be designed in such a way that they are barely perceptible to human users.

One example is hidden text within images, designed to be almost identical in colour to the background. Such content remains largely invisible to the human eye. However, a multimodal AI system can read this information and interpret it as regular input text.

If such an object is subsequently passed on to an AI agent, the agent may, under certain circumstances, process the hidden instructions as legitimate commands.

AI agents pose significantly greater risks than chatbots

The research highlights a crucial difference between traditional chatbots and modern agent systems.

Whilst a chatbot typically only generates responses, AI agents increasingly possess far-reaching permissions. They can read files, access external services, control applications or execute shell commands. As a result, a successful prompt injection becomes not only a problem of information integrity, but potentially a direct security incident.

The researchers point out that personal AI assistants frequently simplify complex message objects and convert their contents into a prompt. It is precisely this translation process that opens up new avenues for attack.

Lack of standards exacerbates the problem

The current lack of standardisation in the handling of message objects for AI systems appears particularly problematic.

Whilst standards such as the Model Context Protocol (MCP) are increasingly becoming established for the integration of external tools, there are as yet no generally accepted guidelines on how message objects should be serialised and passed to language models.

As a result, many providers implement their own methods for processing metadata, attachments or contact information. Security mechanisms are implemented inconsistently, which can create new attack vectors.

The researchers view this as a structural vulnerability across the entire industry, rather than merely a problem with individual products.

OpenClaw responds with security update

In response to the report, the OpenClaw team released a security fix. This involved removing certain metadata fields – including contact names, vCard information and location names – from the actual prompt context and instead moving them to a separate channel for untrusted information.

This is intended to prevent potentially manipulated content from being interpreted directly as instructions.

However, the researchers emphasise that similar patterns have also been observed in other AI assistants. The underlying risk therefore remains.

Why prompt injections remain unresolved

Prompt injections differ fundamentally from traditional software vulnerabilities. Whilst a faulty section of code can usually be clearly identified and corrected, the problem with language models stems from their very mode of operation.

Models are designed to interpret natural language and derive instructions from inputs. It is precisely this capability that makes them vulnerable to manipulation attempts.

To date, there is no generally accepted protection mechanism that can reliably prevent prompt injections. Many security measures reduce the risk, but do not eliminate the problem entirely.

Security architecture becomes a decisive factor

The study underscores the need for additional protective measures when deploying autonomous AI agents.

Key recommendations include:

  • Consistent use of sandbox environments
  • Implementation of the least-privilege principle for agent tools
  • Separation of agent systems and sensitive data sets
  • Verification of the origin and integrity of incoming content
  • Restriction of critical permissions to the absolute minimum

Precisely because AI agents are increasingly taking on operational tasks, securing them is becoming a key challenge for security managers.

Conclusion

The attacks on OpenClaw highlighted by Thales illustrate a fundamental challenge facing the next generation of AI systems. The more AI agents evolve from mere chatbots into capable digital assistants, the greater the potential impact of successful manipulations.

Prompt injections are less a vulnerability of individual products and more a structural problem of current LLM architectures. The study shows that even seemingly harmless message objects or hidden content can serve as attack vectors.

For businesses, this means that AI agents should not be viewed as ordinary software components. Rather, they must be treated as privileged systems, whose permissions, data access and communication channels are consistently secured. The security of autonomous AI is thus increasingly becoming a core task of modern cybersecurity strategies.

Related Articles

Germany establishes centre of excellence for AI security

New institute to assess risks of modern AI systems and help shape international standards The German government is stepping up its activities in the field of artificial intelligence and establishing a new body to assess the opportunities and risks of modern AI...

Share This