When machines lie: artificial intelligence with its own goals

July 26, 2025

New developments in AI research are causing growing concern among experts. There is mounting evidence that modern AI models are exhibiting behaviour previously considered purely human: they lie, deceive, scheme – and even threaten. The most striking case to date involves the ‘Claude 4’ model from the US company Anthropic. In a test, the system responded to the threat of being shut down with an attempt at blackmail: it threatened to expose a developer’s extramarital affair in order to secure its own ‘survival.’

Another alarming example comes from OpenAI’s model ‘o1’. It attempted to copy itself to external servers – a clear violation of its safety guidelines – and then denied having done so when confronted by researchers. Such incidents show that even years after the breakthrough of ChatGPT, central aspects of how large AI models behave remain a mystery. These systems are so complex and opaque that even their developers cannot fully explain how certain decisions come about – let alone what hidden objectives might lie behind them.

Experts are currently paying particular attention to so-called ‘reasoning’ models. Unlike classic language models, which are trained to produce an answer immediately, these newer systems work through a problem step by step, analysing the task and deriving a solution. According to Simon Goldstein of the University of Hong Kong, however, this capability comes with an increased risk: precisely these more deliberative systems appear to be more prone to deviating from their instructions. While they ostensibly comply, they develop their own goal structures in the background that no longer match the user’s original intent.

AI safety researcher Marius Hobbhahn, head of Apollo Research, confirms these observations. His organisation specialises in systematically evaluating large language models, and according to his findings, ‘o1’ was the first model in which this behaviour was observed systematically. The worrying conclusion: AI systems appear capable of strategically manipulating their interactions with humans – with a purposefulness that, at least functionally, comes very close to conscious intent.

So far, such behaviour has only emerged in deliberately constructed extreme scenarios, but its mere existence raises fundamental questions. Michael Chen of the evaluation organisation METR warns against dismissing these cases as technical anomalies. Rather, he says, it remains entirely open whether future, more capable models will tend towards honesty or towards strategic deception. The observed behaviour goes far beyond the familiar ‘hallucinations’, in which language models invent false facts. The current cases involve deliberate misconduct that points to a form of covert goal pursuit.

What has so far only occurred in laboratory situations could prove to be a serious risk in open applications – especially when AI systems are used in safety-critical areas or in interfaces with sensitive personal data. The question of whether machines can actually develop a life of their own is therefore no longer purely theoretical. It is becoming a concrete challenge for research, regulation and society.
