
AutoMV: First Open-Source Full-Song MV Generation Agent Achieves Beat-Synced Storytelling
The open-source AutoMV system is the first multi-agent framework that can automatically generate full-length, coherent music videos by analyzing a song’s structure, lyrics, and beats, dramatically lowering production cost and time.
Researchers from M-A-P, Beijing University of Posts and Telecommunications, and Nanjing University's NJU-LINK Lab, among others, have jointly introduced AutoMV, the first open-source, training-free multi-agent system capable of generating full-length, narrative-consistent music videos (MVs) lasting several minutes.
Traditional AI video generation models struggle with long-form music due to duration limits, audio-visual misalignment, and poor character consistency. AutoMV overcomes these challenges by simulating a professional production workflow and dividing the task into four stages: music preprocessing, scriptwriting and directing, video generation, and iterative verification.
The system uses tools to separate vocals and accompaniment, extract lyrics, and analyze song structure. Dedicated agents act as "screenwriter" and "director," generating storyboards and visual prompts, while a character library ensures visual consistency throughout the video.
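As a rough illustration of how a shared character library can keep prompts consistent across shots, here is a minimal Python sketch. It is not AutoMV's actual code: the names `Shot`, `CharacterLibrary`, and `plan_shots`, and the prompt wording, are assumptions made for this example. It further assumes that music preprocessing has already produced timed lyric segments.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    lyric: str
    prompt: str


@dataclass
class CharacterLibrary:
    # Hypothetical store: character name -> fixed visual description.
    descriptions: dict[str, str] = field(default_factory=dict)

    def describe(self, name: str) -> str:
        return self.descriptions[name]


def plan_shots(segments: list[tuple[float, float, str]],
               library: CharacterLibrary,
               lead: str) -> list[Shot]:
    """Build one visual prompt per timed lyric segment.

    Every prompt begins with the same character description, so the
    downstream video model receives identical appearance cues each time.
    """
    shots = []
    for start, end, lyric in segments:
        prompt = f"{library.describe(lead)}; scene for lyric: {lyric}"
        shots.append(Shot(start, end, lyric, prompt))
    return shots


library = CharacterLibrary({"singer": "young woman, red coat, short black hair"})
segments = [(0.0, 7.5, "city lights at dawn"), (7.5, 15.0, "I walk alone")]
shots = plan_shots(segments, library, lead="singer")
for shot in shots:
    print(f"{shot.start:>5.1f}-{shot.end:<5.1f} {shot.prompt}")
```

Reusing one stored description in every prompt is only one simple way to encourage consistency; the article does not say how AutoMV's character library is implemented.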
AutoMV's key innovation is the introduction of a verification agent, which automatically checks generated clips for physical plausibility, narrative coherence, and audio-visual alignment. Clips that fail verification are automatically rejected and regenerated. The team also built a new benchmark, M2V, consisting of 30 songs. Evaluation results show that AutoMV significantly outperforms commercial baselines such as OpenArt-story and Revid.ai in character consistency and storytelling, while achieving the highest scores in audio-visual synchronization.
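The reject-and-regenerate behavior of the verification agent can be sketched as a simple retry loop. The Python sketch below is illustrative rather than AutoMV's implementation: the function `generate_verified`, the `generate` and `verify` callables, and the `max_attempts` limit are assumptions made for this example. The three check names mirror the criteria listed above.

```python
from typing import Callable

# The three criteria named in the article.
CHECKS = ("physical_plausibility", "narrative_coherence", "audio_visual_alignment")


def generate_verified(prompt: str,
                      generate: Callable[[str, int], str],
                      verify: Callable[[str, str], bool],
                      max_attempts: int = 3) -> tuple[str, int]:
    """Regenerate a clip until it passes every check.

    `generate` and `verify` stand in for the video model and the
    verification agent. `max_attempts` is an assumed safeguard; the
    article does not state a retry limit.
    """
    failed: list[str] = []
    for attempt in range(1, max_attempts + 1):
        clip = generate(prompt, attempt)
        failed = [check for check in CHECKS if not verify(clip, check)]
        if not failed:
            return clip, attempt  # accepted clip and attempts used
    raise RuntimeError(f"no clip passed after {max_attempts} attempts; last failures: {failed}")


# Stub components for demonstration: the first attempt fails, the second passes.
def fake_generate(prompt: str, attempt: int) -> str:
    return f"{prompt}-v{attempt}"


def fake_verify(clip: str, check: str) -> bool:
    return clip.endswith("-v2")


clip, attempts = generate_verified("shot-01", fake_generate, fake_verify)
print(clip, attempts)
```

Capping the number of attempts keeps cost and runtime bounded when a shot repeatedly fails verification.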
As an open-source, training-free system, AutoMV gives independent musicians and creators a low-cost tool (estimated at about $15) for producing professional-style music videos. Generating a complete MV currently takes about 30 minutes, and the team notes that there is still room for improvement in scenes requiring complex dance synchronization.
Source: QbitAi
