Shanghai AI Lab Open-Sources MemVerse: Giving Agents a "Hippocampus" for Multimodal Memory
Shanghai AI Lab open-sources MemVerse, a bionic multimodal memory framework that equips AI agents with hippocampus-like lifelong memory, slashing response times and boosting cross-modal performance.
Shanghai Artificial Intelligence Laboratory has open-sourced MemVerse, the first universal multimodal memory framework for AI agents, addressing the modality isolation and slow response times that limit traditional memory systems. For the first time, agents gain unified cross-modal memory over images, audio, and video: lifelong memory that can grow over time, be internalized into model parameters, and respond within seconds.
Conventional AI memory is mostly text-based and relies on mechanical retrieval, with no grasp of spatiotemporal logic or cross-modal semantics. MemVerse instead employs a three-layer bionic architecture: a central coordinator acts as the "prefrontal cortex", actively scheduling memory operations; short-term memory maintains a sliding window over recent turns for conversational coherence; and long-term memory builds a multimodal knowledge graph that separates core memory (user profiles), episodic memory (event timelines), and semantic memory (abstract concepts), fundamentally mitigating hallucinations.
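To make the three-layer division of labor concrete, here is a minimal Python sketch of the idea. Every class and method name below (`Coordinator`, `ShortTermMemory`, `LongTermMemory`, `observe`, `recall`) is a hypothetical illustration, not the actual MemVerse API, and the toy keyword match stands in for real cross-modal graph retrieval.

```python
from collections import deque
from dataclasses import dataclass

# Illustrative sketch only: names and logic are assumptions,
# not the actual MemVerse implementation.

@dataclass
class MemoryItem:
    modality: str   # e.g. "text", "image", "audio", "video"
    category: str   # "core" (user profile), "episodic" (events), "semantic" (concepts)
    content: str

class ShortTermMemory:
    """Sliding window over recent turns, for conversational coherence."""
    def __init__(self, window_size: int = 16):
        self.window = deque(maxlen=window_size)

    def add(self, item: MemoryItem) -> None:
        self.window.append(item)

class LongTermMemory:
    """Stands in for the multimodal knowledge graph, bucketed by memory type."""
    def __init__(self):
        self.store = {"core": [], "episodic": [], "semantic": []}

    def consolidate(self, item: MemoryItem) -> None:
        self.store[item.category].append(item)

    def retrieve(self, keyword: str) -> list:
        # Toy keyword match; the real system retrieves over a cross-modal graph.
        return [m for bucket in self.store.values()
                for m in bucket if keyword in m.content]

class Coordinator:
    """Plays the 'prefrontal cortex' role: schedules reads/writes across layers."""
    def __init__(self):
        self.stm = ShortTermMemory()
        self.ltm = LongTermMemory()

    def observe(self, item: MemoryItem) -> None:
        self.stm.add(item)          # everything enters the sliding window
        self.ltm.consolidate(item)  # and persists into long-term storage

    def recall(self, keyword: str) -> list:
        # Prefer fresh context; fall back to long-term retrieval.
        recent = [m for m in self.stm.window if keyword in m.content]
        return recent or self.ltm.retrieve(keyword)

agent = Coordinator()
agent.observe(MemoryItem("text", "core", "user prefers concise answers"))
print(agent.recall("concise"))
```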
A pioneering "parameterized distillation" technique periodically fine-tunes high-value long-term knowledge into dedicated small models, boosting retrieval speed more than tenfold.
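The announcement doesn't reproduce the pipeline details, but the control flow of such a periodic distillation job might look like the sketch below. Every function name and the value-scoring heuristic here are assumptions for illustration; in the real system, the fine-tuning step would run supervised training on a dedicated small model rather than printing a message.

```python
import time

# Hedged sketch of the "parameterized distillation" idea. The value-scoring
# heuristic and all function names are assumptions, not MemVerse's code.

def score_value(memory: dict) -> float:
    """Toy value heuristic: frequently recalled, recent memories score higher."""
    return memory["recall_count"] / (1 + time.time() - memory["created_at"])

def to_training_pair(memory: dict) -> tuple:
    """Turn a stored memory into a (prompt, target) pair for fine-tuning."""
    return (f"What do we know about {memory['topic']}?", memory["content"])

def finetune_small_model(pairs: list) -> None:
    """Placeholder: in practice this would run SFT/LoRA on a small LM,
    internalizing the knowledge as weights instead of retrieval entries."""
    print(f"fine-tuning on {len(pairs)} distilled pairs")

def distill(long_term: list, threshold: float = 0.5) -> None:
    """Periodic job: push high-value long-term knowledge into the small model,
    so future lookups hit fast parametric recall instead of graph retrieval."""
    high_value = [m for m in long_term if score_value(m) >= threshold]
    if high_value:
        finetune_small_model([to_training_pair(m) for m in high_value])

memories = [{"topic": "user timezone", "content": "UTC+8",
             "recall_count": 40, "created_at": time.time() - 3600}]
distill(memories, threshold=0.001)
```

The design intuition, per the announcement, is to move frequently used knowledge out of retrieval and into model weights, so answers come from a fast forward pass rather than a graph lookup at query time.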
Benchmarks show strong results. On ScienceQA, GPT-4o-mini's score jumped from 76.82 to 85.48 with MemVerse; on MSR-VTT text-to-video retrieval, R@1 recall reached 90.4%, far surpassing CLIP (29.7%) and the dedicated large model ExCae (67.7%). Memory compression and distillation cut token usage by 90%, balancing accuracy against cost. MemVerse is now fully open-source:
- Paper: https://arxiv.org/pdf/2512.03627
- Project Page: https://dw2283.github.io/memverse.ai
- GitHub: https://github.com/KnowledgeXLab/MemVerse
Source: Liangziwei