Ai Safety News

Ai Safety

Anthropic Launches Institute to Tackle AI's Societal Disruption

Anthropic unveils The Anthropic Institute, a new research body led by co-founder Jack Clark to study AI's impact on jobs, cybersecurity, and governance.

by Ted Hisokawa
Mar 11, 2026

Ai Safety

OpenAI Finds AI Reasoning Models Cant Hide Their Thinking - A Win for Safety

OpenAI's new CoT-Control benchmark reveals frontier AI models struggle to obscure their reasoning chains, reinforcing monitoring as a viable safety layer.

by Caroline Bishop
Mar 06, 2026

Ai Safety

OpenAI Launches €500K Grant for Youth AI Safety Research in EMEA

OpenAI's EMEA Youth & Wellbeing Grant offers €25K-€100K awards to NGOs and researchers studying AI's impact on minors. Applications close February 27, 2026.

by Peter Zhang
Mar 05, 2026

Ai Safety

OpenAI Expands Mental Health Safeguards Amid Consolidated California Lawsuits

OpenAI announces trusted contact feature and improved distress detection as mental health lawsuits consolidate in California court. New cases expected.

by Felix Pinkston
Mar 04, 2026

Ai Safety

Anthropic Unveils RSP Version 3 with Major AI Safety Overhaul

Anthropic releases third version of Responsible Scaling Policy, separating company commitments from industry-wide recommendations after 2.5 years of testing.

by Tony Kim
Feb 25, 2026

Ai Safety

Anthropic Study Reveals Users Skip Critical Checks on AI-Generated Code

New research from $380B-valued Anthropic shows users are 5.2% less likely to verify AI outputs when creating artifacts, raising questions about automation risks.

by Terrill Dicki
Feb 23, 2026

Ai Safety

Stability AI Joins Tech Coalition to Combat Child Exploitation

Stability AI becomes full member of Tech Coalition after completing 2025 Pathways program, strengthening AI safety measures against online child abuse.

by Tony Kim
Feb 12, 2026

Ai Safety

NVIDIA DRIVE AV Powers Mercedes-Benz CLA to Top Euro NCAP Safety Rating

Mercedes-Benz CLA earns Euro NCAP's Best Performer of 2025 award using NVIDIA DRIVE AV software, marking a shift toward AI-driven safety standards in vehicles.

by Alvin Lang
Jan 23, 2026

Ai Safety

Anthropic Releases Full AI Constitution for Claude Under Open License

Anthropic publishes Claude's complete training constitution under CC0 license, detailing AI safety priorities and ethical guidelines as company eyes $350B valuation.

by Joerg Hiller
Jan 22, 2026

Ai Safety

Anthropic Discovers 'Assistant Axis' to Prevent AI Jailbreaks and Persona Drift

Anthropic researchers map neural 'persona space' in LLMs, finding a key axis that controls AI character stability and blocks harmful behavior patterns.

by Caroline Bishop
Jan 20, 2026

Ai Safety

OpenAI Updates Model Spec with U18 Teen Safety Principles for ChatGPT

OpenAI introduces new U18 Principles to its Model Specification, establishing age-appropriate AI safety guidelines for teenage ChatGPT users ages 13-17.

by Terrill Dicki
Jan 17, 2026

Ai Safety

Anthropic Enhances AI Safeguards for Sensitive Conversations

Anthropic has implemented advanced safeguards for its AI, Claude, to better handle sensitive topics such as suicide and self-harm, ensuring user safety and well-being.

by Iris Coleman
Dec 19, 2025

Ai Safety

AI Development Framework Aims for Greater Transparency and Safety

Anthropic proposes a framework for AI transparency, focusing on safety and accountability. This initiative aims to enhance public safety and responsible AI development.

by James Ding
Nov 05, 2025

Ai Safety

Anthropic Strengthens AI Safeguards for Claude

Anthropic enhances its AI model Claude's safety and reliability with robust safeguards, ensuring beneficial outcomes while preventing misuse and harmful impacts.

by Peter Zhang
Oct 30, 2025

Ai Safety

Character.AI Implements New Safety Measures for Teen Users

Character.AI announces significant changes to enhance the safety of its platform for users under 18, including removing open-ended chat and introducing age assurance tools.

by Tony Kim
Oct 30, 2025

AI SAFETY