multimodal AI News List | Blockchain.News

List of AI News about multimodal

Time Details
2026-05-22
17:22
Gemini Omni Redefines video editing with multimodal power

According to Ethan Mollick, Gemini Omni natively edits video via full multimodality, transforming the 1896 train film into multiple styled variants.

Source
2026-05-22
11:50
SenseNova U1 Unifies multimodal reasoning

According to @godofprompt, SenseNova U1 unifies vision, language, and reasoning in one model, removing adapters and handoffs for higher fidelity.

Source
2026-05-20
20:07
Gemini 3.5 Flash Debuts with Speed Gains

According to GoogleDeepMind, Gemini 3.5 Flash has launched, signaling faster multimodal inference and lighter deployment for developers.

Source
2026-05-20
17:08
Google Cloud course builds AI agents for media

According to AndrewYNg, DeepLearning.AI launched a course on self-evaluating agents for image and video, combining similarity, LLM judges, and rubrics.

Source
2026-05-20
12:37
Google Gemini unveils agents, pricing, models

According to @godofprompt, Google I/O 2026 reveals new Gemini models, personal agents, compute-based pricing, and background web monitoring for operators.

Source
2026-05-20
01:05
Gemini 3.5 Flash debuts with multimodal speed

According to @demishassabis, Google details Gemini 3.5 Flash’s fast multimodal performance and developer features on its official blog.

Source
2026-05-20
00:25
Gemini Omni Powers Storytelling Breakthrough

According to GoogleDeepMind, Gemini Omni enables multimodal story creation with text, images, and audio for faster prototyping and richer narratives.

Source
2026-05-19
23:53
ByteDance Lance Beats 7B Models in Benchmarks

According to KyeGomezB, ByteDance’s 3B Lance unifies vision tasks and outperforms 7B models via multi-task synergy and MoE pathways.

Source
2026-05-19
21:36
Multimodal Models Test Gym-ID Skills

According to DeepLearning.AI, a new poll challenges multimodal models to identify two gym machines, highlighting progress in visual reasoning.

Source
2026-05-19
21:27
ChatGPT Images 2.0 Drives 1.5B Weekly Creations

According to OpenAI, ChatGPT users now create 1.5B images weekly, revealing fresh commercial design, prototyping, and marketing workflows.

Source
2026-05-19
20:16
Gemini Omni Debuts multimodal editing power

According to DemisHassabis, Gemini Omni builds new scenes from photos, video, and audio, starting with video outputs and expanding to any input or output.

Source
2026-05-19
18:33
Gemini 3.5 Flash earns insane evals

According to sundarpichai, Gemini 3.5 Flash shows strong evals as a workhorse model, signaling efficient multimodal performance for real-world apps.

Source
2026-05-19
17:53
Gemini 3.5 Flash Breakthrough beats 3.1 Pro

According to @OriolVinyalsML, Gemini 3.5 Flash launches with frontier-level intelligence and faster speed, outperforming 3.1 Pro on most benchmarks.

Source
2026-05-19
17:28
Gemini Omni Debuts, powers 'Nano Banana' video

According to The Rundown AI, Demis Hassabis unveiled Gemini Omni at Google I/O, a multimodal model touted to create content from any input.

Source
2026-05-19
17:20
Gemini Omni Debuts with create-anything power

According to TheRundownAI, Google unveiled Gemini Omni at I/O, a new multimodal model that can create from any input, signaling broad product upgrades.

Source
2026-05-19
17:00
Google Gemini unveils I/O 2026 AI roadmap

According to @GeminiApp, Google I/O 2026 is live, signaling new Gemini upgrades and AI features for developers and businesses.

Source
2026-05-19
16:00
Google Gemini reveals I/O Day One updates

According to @GeminiApp, Google I/O Day One will unveil new Gemini updates via livestream at 10 a.m. PT.

Source
2026-05-15
11:02
MiniMax Hub Unifies AI Video Workflow, Cuts Tool Sprawl

According to @godofprompt, MiniMax Hub centralizes scripting, images, audio, and editing like Claude Code for video creators, streamlining fragmented workflows.

Source
2026-05-14
23:00
Multimodal Pipelines Boost Enterprise Retrieval

According to DeepLearning.AI, most enterprise audio, image, and video data goes unused; its Building Multimodal Data Pipelines course covers processing and retrieval.

Source
2026-05-12
17:26
Gemini Pointer Demo Reveals Interface Breakthrough

According to TheRundownAI, Google DeepMind demoed Gemini in the mouse pointer, streamlining on-screen actions and context for faster AI assistance.

Source