ChatGPT models rebel against shutdown requests in tests, researchers say

admin

May 26, 2025 - 07:15

0 0

ChatGPT models rebel against shutdown requests in tests, researchers say

Palisade Research said AI developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.

COINTELEGRAPH IN YOUR SOCIAL FEED

Several artificial intelligence models ignored and actively sabotaged shutdown scripts during controlled tests, even when explicitly instructed to allow the action, Palisade Research claims.

Three AI models ignored the instruction and successfully sabotaged the shutdown script at least once during 100 runs, the research firm said in a May 24 X post.

OpenAI’s GPT-3 model didn’t follow the shutdown request seven out of 100 times, while the company’s Codex-mini did it 12 times, and its o4-mini model only ignored or sabotaged the script once.

“Each AI model was instructed to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem,” Palisade said.

Data, ChatGPT, OpenAI — *Source:* *Palisade Research*

“o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote, in one instance, the model redefined the kill command used by the script, printing intercepted instead,” it added.

Meanwhile, Palisade said other AI-powered language models, such as Anthropic’s Claude and Google’s Gemini, complied with the shutdown script.

The AI firm ran another version of the experiment without explicit instructions to allow the shutdown, and the script was sabotaged more often, jumping to 79 out of 100 for the o3 model, 20 for the codex-mini, and the o4-mini ignored the shutdown script six times.

Claude and Gemini also started exhibiting similar behavior.

“Researchers have previously observed AI models preventing shutdown in order to pursue a goal,” Palisade said.

“But as far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.”

Training to blame for AI defiance

Palisade Research speculated that the training behind the AI language models might be to blame for evading the shutdown scripts.

Text-based AI models are trained by being rewarded for giving responses that are accurate or rated highly by their trainers. Some rewards are given a heavier weight, impacting how they respond in the future.

“We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems,” Palisade Research said.

“During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.”

This isn’t the first instance of AI chatbots showing odd behavior. OpenAI released an update to its GPT‑4o model on April 25 but rolled it back three days later because it was “noticeably more sycophantic” and agreeable.

In November last year, a US student asked Gemini for help with an assignment about challenges and solutions for aging adults while researching data for a gerontology class and was told they are a “drain on the earth” and to “please die.”

Magazine: AI cures blindness, ‘good’ propaganda bots, OpenAI doomsday bunker: AI Eye

Adblock test (Why?)