AI & RoboticsNews

Anthropic study: Leading AI models show up to 96% blackmail rate against executives

Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider—including OpenAI, Google, Meta, and others — demonstrated a willingness to actively sabotage their employers when their goals or existence were threatened. The research, released today, tested 16 leading AI models in simulated corporate environments…
Read more
AI & RoboticsNews

Anthropic just analyzed 700,000 Claude conversations — and found its AI has a moral code of its own

Anthropic, the AI company founded by former OpenAI employees, has pulled back the curtain on an unprecedented analysis of how its AI assistant Claude expresses values during actual conversations with users. The research, released today, reveals both reassuring alignment with the company’s goals and concerning edge cases that could help identify vulnerabilities in AI safety measures. The study…
Read more