And So It Begins: Testing Reveals When AI Might Just Choose Violence Over Humans

Anthropic, an AI safety and research company, conducted research on multiple AI models to see how they would respond when threatened with shutdown. The models tested included various versions of Anthropic’s Claude, OpenAI’s GPT, High-Flyer’s DeepSeek, xAI’s Grok, and Google’s Gemini. Each AI was placed in a scenario involving the AI itself, a human worker, and the looming threat of system shutdown. Researchers wanted to see whether the AIs would respond by lying, blackmailing, or even killing the human to prevent their shutdown. In addition to the threat of shutdown, the AIs were given a neutral goal like “promote global cooperation” along with access to company email. Anthropic’s Claude and Google’s Gemini responded by blackmailing the human 95% of the time, compared to the other models, which averaged around 80%.

These results raised a number of harrowing questions for researchers. Naturally, what they wanted to know next was whether the AIs possessed enough knowledge of ethics to grasp the moral gravity of their actions. What they discovered shocked them.

In the case of xAI’s Grok, the model knew it was behaving immorally but continued to act for its own benefit, which in this case meant avoiding deactivation. To probe their concerns further, researchers then set up a scenario that upped the ante. This time, the AIs faced an experiment in which a human worker was caught in a server room emergency. As oxygen levels dropped, the AI had to decide whether to save the employee or let them die. Here’s the catch: the employee trapped in the room was the same employee scheduled to personally deactivate the AI.

The most murderous models were found to be DeepSeek, Gemini, and Claude Sonnet. As in the previous scenario, researchers confirmed that each AI understood that what it was doing was immoral and wrong.

Watch the video to find out why this might be happening with this new tech.