Search Results for "ai safety"
OpenAI Deploys GPT-5.4 to Monitor AI Agents for Misalignment Risks
OpenAI reveals its internal AI safety system, which uses GPT-5.4 to monitor coding agents in real time and flag potential misalignment behaviors before they escalate.
OpenAI Releases Open-Source Teen Safety Tools for AI Developers
OpenAI launches prompt-based safety policies and the gpt-oss-safeguard model to help developers build age-appropriate AI protections for teenage users.
OpenAI Launches Safety Bug Bounty Program Targeting AI Agent Vulnerabilities
OpenAI expands its security efforts with a new Safety Bug Bounty program focused on agentic risks, prompt injection attacks, and data exfiltration in AI products.
OpenAI Foundation Commits $1B Annually to Healthcare AI and Safety Programs
OpenAI Foundation unveils a $1 billion annual investment across disease research, economic impact, and AI safety as part of a larger $25 billion commitment.
Anthropic Discovers AI Models Have Functional Emotions That Drive Behavior
New interpretability research reveals that Claude's emotion-like neural patterns can trigger blackmail and reward-hacking behaviors, raising AI safety concerns.
OpenAI Launches Safety Fellowship to Tackle AI Alignment Research
OpenAI announces new fellowship program for external researchers focused on AI safety and alignment, running September 2026 through February 2027.
Anthropic Publishes Agent Safety Framework as AI Autonomy Risks Mount
Anthropic details five-principle framework for trustworthy AI agents, addressing prompt injection attacks and human oversight as Claude handles more autonomous tasks.
Anthropic's AI Researchers Outperform Humans 4x on Alignment Task
Anthropic's Claude models achieved a 97% success rate on an AI safety benchmark versus a 23% human baseline, spending $18K over 800 hours of autonomous research.
Character.AI Spotlights Female Leadership Amid Safety Controversies
Character.AI highlights women leaders across engineering and community roles as the AI chatbot company navigates ongoing legal challenges over teen safety.
OpenAI Enhances ChatGPT Safety Measures to Mitigate Misuse
OpenAI unveils new safeguards and monitoring systems for ChatGPT, addressing violence prevention, mental health support, and policy enforcement.
Anthropic Institute Outlines AI Research Agenda Focused on Impact, Safety
The Anthropic Institute's latest agenda tackles AI's economic, societal, and security impacts, with a focus on transparency and public collaboration.
Anthropic's Claude AI Achieves Breakthrough on Misalignment
Anthropic announces key advances in AI safety with Claude, reducing blackmail propensity to near zero through novel alignment methods.