Soul App's AI Portrait Animation Study Gains CVPR 2025 Acceptance
PR Newswire
SHANGHAI, March 27, 2025 /PRNewswire/ -- Soul App, a trailblazer in the "Social + AI" domain, has once again demonstrated its commitment to harnessing artificial intelligence for social networking. The company's latest achievement is the acceptance of its research paper on real-time, AI-driven portrait animation at the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), one of the most esteemed conferences in AI and computer vision.
CVPR is renowned for attracting top-tier research from industry leaders and leading academic institutions worldwide. This year, the conference received 13,008 submissions and accepted only 2,878, an acceptance rate of just 22.1%, underscoring the rigor of the selection process and the intensity of competition in the field. Recognition from CVPR is thus a significant milestone for Soul, adding to a growing list of accolades that includes recognition at the 2024 ACM International Conference on Multimedia (ACM MM) and the top position in the Multimodal Emotion Recognition Challenge (MER24).
The accepted paper, titled "Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation," presents a novel autoregressive framework designed to enhance the efficiency of generating "talking-head" animations. This research aims to meet the growing demand for AI models that can deliver human-like interactions in real time.
What sets "Teller" apart is its balance between performance and efficiency: the model generates motion autoregressively, sustaining real-time throughput without sacrificing the fluidity and authenticity of natural facial and body movements. The paper highlights two key components of this technology:
Facial Motion Latent Generation (FMLG): By leveraging large-scale training data, FMLG enhances the synchronization between audio and visual cues, resulting in more fluid and natural facial expressions in response to speech inputs.
Efficient Temporal Module (ETM): Using a diffusion-based approach, the model accurately captures body dynamics, adding realism to the movements of facial and body muscles, as well as accessories.
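The core streaming idea, generating each frame's motion latent from the previous latent plus the current audio frame, so rendering can begin before the full utterance arrives, can be sketched in highly simplified form. Everything below (the dimensions, the linear-plus-tanh predictor, and all names) is an illustrative assumption, not Soul's actual "Teller" implementation:

```python
import math
import random

random.seed(0)

LATENT_DIM = 8   # size of each facial-motion latent (assumed for illustration)
AUDIO_DIM = 4    # size of each per-frame audio feature (assumed for illustration)

# Toy random weights standing in for the trained parameters of a real model.
W_prev = [[random.gauss(0, 0.1) for _ in range(LATENT_DIM)] for _ in range(LATENT_DIM)]
W_audio = [[random.gauss(0, 0.1) for _ in range(AUDIO_DIM)] for _ in range(LATENT_DIM)]

def predict_motion(prev_latent, audio_feat):
    """One autoregressive step: the next motion latent is computed from the
    previous latent and the current audio frame (hypothetical stand-in)."""
    return [
        math.tanh(
            sum(W_prev[i][j] * prev_latent[j] for j in range((LATENT_DIM)))
            + sum(W_audio[i][k] * audio_feat[k] for k in range(AUDIO_DIM))
        )
        for i in range(LATENT_DIM)
    ]

def generate_stream(audio_frames):
    """Emit one motion latent per incoming audio frame; because each step
    depends only on past frames, animation starts immediately (streaming)."""
    latent = [0.0] * LATENT_DIM  # neutral starting pose
    for audio_feat in audio_frames:
        latent = predict_motion(latent, audio_feat)
        yield latent

# 10 frames of fake audio features -> 10 motion latents, one per frame.
audio = [[random.gauss(0, 1) for _ in range(AUDIO_DIM)] for _ in range(10)]
motions = list(generate_stream(audio))
```

The key property being illustrated is causality: each output depends only on audio already received, which is what distinguishes a streaming, real-time approach from methods that must see the whole clip before animating.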
During tests, Soul's engineers found that this dual-module system enables AI-generated avatars to exhibit expressions and gestures that feel human in real time. This level of realism significantly enhances user experience in virtual interactions.
Since its inception in 2016, Soul has consistently invested in technological resources to gain an AI-driven edge in social networking. The company's journey began with the self-developed Lingxi Engine, which facilitated user connections based on mutual interests. This was followed by rapid advancements in speech and text-based interactions, as well as 3D virtual human modeling. By 2020, Soul was already leveraging AI-generated content (AIGC) and focusing on intelligent dialogue systems and voice synthesis.
The launch of Soul's proprietary AI model, Soul X, in 2023 marked a major leap forward. This homegrown model introduced features such as multilingual voice calls, speech synthesis, and AI-generated music to the platform. The recent breakthrough with the "Teller" framework is another step toward Soul's goal of integrating speech, vision, and natural language processing (NLP) to create AI-powered digital entities that can interact seamlessly with users in real time. The ultimate aim is to provide not just functional but also emotionally fulfilling companionship.
Soul believes that artificial intelligence should go beyond merely facilitating conversations. The full potential of the technology should be utilized to create experiences that are emotionally enriching for users.
View original content:https://www.prnewswire.com/apac/news-releases/soul-apps-ai-portrait-animation-study-gains-cvpr-2025-acceptance-302412980.html
SOURCE Soul App
