Search Results for "ai safety"
OpenAI Updates Model Spec with U18 Teen Safety Principles for ChatGPT
OpenAI introduces new U18 Principles to its Model Specification, establishing age-appropriate AI safety guidelines for teenage ChatGPT users ages 13-17.
Anthropic Discovers 'Assistant Axis' to Prevent AI Jailbreaks and Persona Drift
Anthropic researchers map neural 'persona space' in LLMs, finding a key axis that controls AI character stability and blocks harmful behavior patterns.
Anthropic Releases Full AI Constitution for Claude Under Open License
Anthropic publishes Claude's complete training constitution under CC0 license, detailing AI safety priorities and ethical guidelines as company eyes $350B valuation.
NVIDIA DRIVE AV Powers Mercedes-Benz CLA to Top Euro NCAP Safety Rating
Mercedes-Benz CLA earns Euro NCAP's Best Performer of 2025 award using NVIDIA DRIVE AV software, marking a shift toward AI-driven safety standards in vehicles.
Stability AI Joins Tech Coalition to Combat Child Exploitation
Stability AI becomes a full member of the Tech Coalition after completing the 2025 Pathways program, strengthening AI safety measures against online child abuse.
Anthropic Study Reveals Users Skip Critical Checks on AI-Generated Code
New research from $380B-valued Anthropic shows users are 5.2% less likely to verify AI outputs when creating artifacts, raising questions about automation risks.
Anthropic Unveils RSP Version 3 with Major AI Safety Overhaul
Anthropic releases third version of Responsible Scaling Policy, separating company commitments from industry-wide recommendations after 2.5 years of testing.
OpenAI Expands Mental Health Safeguards Amid Consolidated California Lawsuits
OpenAI announces a trusted contact feature and improved distress detection as mental health lawsuits are consolidated in California court, with new cases expected.
OpenAI Launches €500K Grant for Youth AI Safety Research in EMEA
OpenAI's EMEA Youth & Wellbeing Grant offers €25K-€100K awards to NGOs and researchers studying AI's impact on minors. Applications close February 27, 2026.
OpenAI Finds AI Reasoning Models Can't Hide Their Thinking - A Win for Safety
OpenAI's new CoT-Control benchmark reveals frontier AI models struggle to obscure their reasoning chains, reinforcing monitoring as a viable safety layer.
Anthropic Launches Institute to Tackle AI's Societal Disruption
Anthropic unveils The Anthropic Institute, a new research body led by co-founder Jack Clark to study AI's impact on jobs, cybersecurity, and governance.
OpenAI Drops IH-Challenge Dataset to Harden AI Against Prompt Injection Attacks
OpenAI's new IH-Challenge training dataset improves LLM instruction hierarchy by up to 15%, strengthening defenses against prompt injection and jailbreak attempts.
