AI Safety News | Blockchain.News

AI SAFETY

Character.AI Spotlights Female Leadership Amid Safety Controversies

Character.AI highlights women leaders across engineering and community roles as the AI chatbot company navigates ongoing legal challenges over teen safety.

Anthropic's AI Researchers Outperform Humans 4x on Alignment Task

Anthropic's Claude models achieved a 97% success rate on an AI safety benchmark, versus a 23% human baseline, spending $18K over 800 hours of autonomous research.

Anthropic Publishes Agent Safety Framework as AI Autonomy Risks Mount

Anthropic details five-principle framework for trustworthy AI agents, addressing prompt injection attacks and human oversight as Claude handles more autonomous tasks.

OpenAI Launches Safety Fellowship to Tackle AI Alignment Research

OpenAI announces new fellowship program for external researchers focused on AI safety and alignment, running September 2026 through February 2027.

Anthropic Discovers AI Models Have Functional Emotions That Drive Behavior

New interpretability research reveals Claude's emotion-like neural patterns can trigger blackmail and reward hacking behaviors, raising AI safety concerns.

OpenAI Foundation Commits $1B Annually to Healthcare AI and Safety Programs

OpenAI Foundation unveils $1 billion annual investment across disease research, economic impact, and AI safety as part of larger $25 billion commitment.

OpenAI Launches Safety Bug Bounty Program Targeting AI Agent Vulnerabilities

OpenAI expands its security efforts with a new Safety Bug Bounty program focused on agentic risks, prompt injection attacks, and data exfiltration in AI products.

OpenAI Releases Open-Source Teen Safety Tools for AI Developers

OpenAI launches prompt-based safety policies and gpt-oss-safeguard model to help developers build age-appropriate AI protections for teenage users.

OpenAI Deploys GPT-5.4 to Monitor AI Agents for Misalignment Risks

OpenAI reveals its internal AI safety system using GPT-5.4 to monitor coding agents in real time, flagging potential misalignment behaviors before they escalate.

OpenAI Drops IH-Challenge Dataset to Harden AI Against Prompt Injection Attacks

OpenAI's new IH-Challenge training dataset improves LLM instruction hierarchy by up to 15%, strengthening defenses against prompt injection and jailbreak attempts.

Anthropic Launches Institute to Tackle AI's Societal Disruption

Anthropic unveils The Anthropic Institute, a new research body led by co-founder Jack Clark to study AI's impact on jobs, cybersecurity, and governance.

OpenAI Finds AI Reasoning Models Can't Hide Their Thinking - A Win for Safety

OpenAI's new CoT-Control benchmark reveals frontier AI models struggle to obscure their reasoning chains, reinforcing monitoring as a viable safety layer.

OpenAI Launches €500K Grant for Youth AI Safety Research in EMEA

OpenAI's EMEA Youth & Wellbeing Grant offers €25K-€100K awards to NGOs and researchers studying AI's impact on minors. Applications close February 27, 2026.

OpenAI Expands Mental Health Safeguards Amid Consolidated California Lawsuits

OpenAI announces a trusted contact feature and improved distress detection as mental health lawsuits consolidate in California court. New cases are expected.

Anthropic Unveils RSP Version 3 with Major AI Safety Overhaul

Anthropic releases third version of Responsible Scaling Policy, separating company commitments from industry-wide recommendations after 2.5 years of testing.