AI Safety News | Blockchain.News

AI SAFETY

OpenAI Enhances GPT-5 for Sensitive Conversations with New Safety Measures

OpenAI has released an addendum to the GPT-5 system card describing improvements in how the model handles sensitive conversations, along with updated safety benchmark results for these cases.

NVIDIA Introduces Safety Measures for Agentic AI Systems

NVIDIA has launched a comprehensive safety recipe to enhance the security and compliance of agentic AI systems, addressing risks such as prompt injection and data leakage.

NVIDIA NeMo Guardrails Enhance LLM Streaming for Safer AI Interactions

NVIDIA has added streaming support to NeMo Guardrails, validating large language model (LLM) output in real time as it is generated, so that generative AI applications can keep response latency low without giving up safety checks.
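As a rough illustration of validating output while it streams, the Python sketch below buffers streamed tokens into small chunks and releases each chunk only after a safety check passes. It is a generic sketch, not NeMo Guardrails' actual API: `guarded_stream`, `is_safe`, `chunk_size`, `context_size`, and `BLOCKED` are hypothetical names, and `is_safe` stands in for whatever output check an application would run.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical placeholder emitted when a chunk fails the check (not a NeMo value).
BLOCKED = "[response blocked]"


def guarded_stream(
    tokens: Iterable[str],
    is_safe: Callable[[str], bool],
    chunk_size: int = 4,
    context_size: int = 2,
) -> Iterator[str]:
    """Hypothetical chunked output validation for a token stream.

    Tokens are held back until `chunk_size` of them accumulate. The check sees
    the pending chunk plus up to `context_size` already-released tokens, so a
    violation split across a chunk boundary can still be detected.
    """
    if chunk_size < 1 or context_size < 0:
        raise ValueError("chunk_size must be >= 1 and context_size >= 0")

    released: List[str] = []
    pending: List[str] = []

    def window() -> str:
        # Trailing context from released text (none if context_size is 0).
        context = released[-context_size:] if context_size > 0 else []
        return "".join(context + pending)

    for token in tokens:
        pending.append(token)
        if len(pending) == chunk_size:
            if not is_safe(window()):
                yield BLOCKED  # stop the stream; the failing chunk is never sent
                return
            yield from pending
            released.extend(pending)
            pending.clear()

    # Validate whatever partial chunk remains when the model stops generating.
    if pending:
        if is_safe(window()):
            yield from pending
        else:
            yield BLOCKED


if __name__ == "__main__":
    stream = ["The ", "answer ", "is ", "42", ". ", "Also ", "secret", "-key", " here"]
    print(list(guarded_stream(stream, lambda text: "secret-key" not in text)))
```

The design trades latency for coverage: a larger `chunk_size` or `context_size` gives the check more text to judge, but the user waits longer for each chunk. Because earlier chunks have already been sent, a violation that straddles a chunk boundary can be detected but not fully withheld.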

Ensuring AI Reliability: NVIDIA NeMo Guardrails Integrates Cleanlab's Trustworthy Language Model

NVIDIA NeMo Guardrails now integrates Cleanlab's Trustworthy Language Model (TLM), which scores the trustworthiness of AI-generated responses so that likely hallucinations can be flagged or blocked, improving the reliability of AI applications.

OpenAI Releases Comprehensive GPT-4o System Card Detailing Safety Measures

OpenAI's GPT-4o system card details the safety evaluations, red teaming, and risk mitigations carried out before the model's release.

Anthropic Expands AI Model Safety Bug Bounty Program

Anthropic broadens its AI model safety bug bounty program to address universal jailbreak vulnerabilities, offering rewards up to $15,000.

Anthropic Unveils Initiative to Enhance Third-Party AI Model Evaluations

Anthropic announces a new initiative to fund third-party evaluations that can better assess AI capabilities and risks, in response to growing demand for high-quality evaluations in the field.

Guaranteed Safe AI Systems: A Solution for the Future of AI Safety?

This article explores whether guaranteed safe AI systems could help ensure the safety and reliability of artificial general intelligence (AGI).

Exploring AGI Hallucination: A Comprehensive Survey of Challenges and Mitigation Strategies

A new survey delves into the phenomenon of AGI hallucination, categorizing its types, causes, and current mitigation approaches while discussing future research directions.

British Standards Institution Pioneers International AI Safety Guidelines for Sustainable Future

BSI's publication of BS ISO/IEC 42001, the first international standard for AI management systems, marks a significant step toward standardizing the safe and ethical use of AI and reflects global demand for robust AI governance.

Exploring AI Stability: Navigating Non-Power-Seeking Behavior Across Environments

New research examines whether non-power-seeking behavior in AI is stable, finding that certain policies that do not resist shutdown in one environment continue not to resist it in similar environments. The result offers insight into mitigating the risks associated with power-seeking AI.

Google DeepMind: Subtle Adversarial Image Manipulation Influences Both AI Model and Human Perception

Recent DeepMind research shows that subtle adversarial image manipulations, originally designed to deceive AI models, can also influence human perception. The finding highlights both similarities and differences between human and machine vision and underscores the need for further research in AI safety and security.

California Spearheads AI Ethics and Safety with Senate Bills 892 and 893

California takes a pioneering role in AI regulation with Senate Bills 892 and 893, aiming to ensure AI safety, ethics, and public benefits.

NIST's Call for Public Input on AI Safety in Response to Biden's Executive Order

NIST is seeking public input to create AI safety guidelines following President Biden's Executive Order, aiming to ensure a secure AI environment, mitigate risks, and foster innovation.

OpenAI Introduces the "Preparedness Framework" for AI Safety and Policy Integration

OpenAI has introduced the "Preparedness Framework," which introduces risk scorecards for AI risk management and gives its board the authority to reverse leadership decisions, as part of its approach to responsible AI development.