AI SAFETY
OpenAI Enhances GPT-5 for Sensitive Conversations with New Safety Measures
OpenAI has released an addendum to the GPT-5 system card, showing improved handling of sensitive conversations as measured by enhanced safety benchmarks.
NVIDIA Introduces Safety Measures for Agentic AI Systems
NVIDIA has launched a comprehensive safety recipe to enhance the security and compliance of agentic AI systems, addressing risks such as prompt injection and data leakage.
NVIDIA NeMo Guardrails Enhances LLM Streaming for Safer AI Interactions
NVIDIA has added streaming support to NeMo Guardrails, improving latency and safety for generative AI applications built on large language models (LLMs) by validating output in real time as tokens are generated.
Ensuring AI Reliability: NVIDIA NeMo Guardrails Integrates Cleanlab's Trustworthy Language Model
NVIDIA's NeMo Guardrails now integrates Cleanlab's Trustworthy Language Model (TLM), aiming to improve AI reliability by flagging untrustworthy, potentially hallucinated responses before they reach users.
OpenAI Releases Comprehensive GPT-4o System Card Detailing Safety Measures
OpenAI's report on GPT-4o details the extensive safety evaluations, red teaming, and risk mitigations carried out before the model's release.
Anthropic Expands AI Model Safety Bug Bounty Program
Anthropic broadens its AI model safety bug bounty program to target universal jailbreak vulnerabilities, offering rewards of up to $15,000.
Anthropic Unveils Initiative to Enhance Third-Party AI Model Evaluations
Anthropic announces a new initiative to fund third-party evaluations that better assess AI capabilities and risks, responding to growing demand for high-quality evaluations in the field.
Guaranteed Safe AI Systems: A Solution for the Future of AI Safety?
Exploring the potential of guaranteed safe AI systems in ensuring the safety and reliability of artificial general intelligence (AGI).
Exploring AGI Hallucination: A Comprehensive Survey of Challenges and Mitigation Strategies
A new survey delves into the phenomenon of AGI hallucination, categorizing its types, causes, and current mitigation approaches while discussing future research directions.
British Standards Institution Pioneers International AI Safety Guidelines for Sustainable Future
BSI's publication of BS ISO/IEC 42001, the first international standard for AI management systems, marks a significant step in standardizing the safe and ethical use of AI and reflects global demand for robust AI governance.
Exploring AI Stability: Navigating Non-Power-Seeking Behavior Across Environments
The research examines whether non-power-seeking behavior in AI is stable, showing that certain policies that do not resist shutdown in one environment continue not to resist it in similar environments. The findings offer insights into mitigating the risks of power-seeking AI.
Google DeepMind: Subtle Adversarial Image Manipulation Influences Both AI Model and Human Perception
Recent DeepMind research shows that subtle adversarial image manipulations, originally designed to deceive AI models, also influence human perception. The finding highlights both similarities and differences between human and machine vision and underscores the need for further research in AI safety and security.
California Spearheads AI Ethics and Safety with Senate Bills 892 and 893
California takes a pioneering role in AI regulation with Senate Bills 892 and 893, which aim to promote AI safety, ethics, and public benefit.
NIST's Call for Public Input on AI Safety in Response to Biden's Executive Order
NIST is seeking public input to create AI safety guidelines following President Biden's Executive Order, aiming to ensure a secure AI environment, mitigate risks, and foster innovation.
OpenAI Introduces the "Preparedness Framework" for AI Safety and Policy Integration
OpenAI has introduced the "Preparedness Framework," which adds risk scorecards for AI risk management and gives its board the power to reverse leadership decisions on safety, as part of its stated commitment to responsible AI development.